"Infographics and data visualization are two different things"
Your definition for “infographics”
Aaah, I still find that difficult. To me infographics and data visualization are two different things. I see myself as someone who creates data visualizations, where the entire visual is built up out of the data itself. There are no added graphical elements. In a very generalized kind of way: I create a CSV or JSON file of the data and that is all I use to create visual elements on the screen, often in an interactive way. Infographics on the other hand, to me, are static, more poster-like visualizations. They often combine graphical elements, such as a drawn portion of an animal, human, map, etc., with small mini data visualizations (a small bar chart for example) and annotations to tell a story. Infographics are something I would more likely expect in a printed magazine, where I can enjoy the details in good quality. Whereas I think a data visualization is more often found solely on the web.
Which are your obligatory references?
You mean my favorite resources? That have helped me in the past? In that case, the book by Scott Murray called “Interactive Data Visualization for the Web” has been the biggest help in getting me started with d3.js, which is the main “tool” I use to visualize data. Although I also do a lot of data preparation and preliminary simple charts in R. I also very much enjoyed reading “The Functional Art” by Alberto Cairo when I was just learning about data visualization (even before d3.js) and sometime later the books by Edward Tufte and the first half of Colin Ware’s “Information Visualization”. I know these books are on many “best dataviz lists”, but I guess they’re there for a reason 🙂 In terms of online resources I often browse d3’s “blocks” through http://blockbuilder.
What are we going to find in your talk at Malofiej?
I’ll be talking about a year-long collaboration that I did and the lessons that I learned from making an extensive interactive data visualization each month. Most lessons are aimed at giving the audience some ideas about how to go beyond the default in the visualization of data. For example, how to let your design be guided by the data itself, how a good grasp of math opens up worlds of possibilities to create data-based shapes, and how some seemingly trivial additions to your design can increase the engagement of the viewer. But in general, you’ll be seeing lots and lots of images of the design process, lots of interactive examples and very few words on my slides.
Your work process
I actually write about the design process of each of the (currently 11) data visualizations I’ve made for data sketches in a very detailed manner on the data sketches website (if you click on the “Read more” buttons) http://www.
I first need to know what the goal of my visualization is. What should people learn after having seen the dataviz? For this particular piece, my main goal was to show people what the most popular decade was in terms of release year.
I can then use that goal to figure out what data I need to hopefully answer the question (and what data might be interesting to add as context). In this case the Top 2000 website shares a file with the name, artist, and year of release for each song. However, I also wanted to know the highest rank ever reached in the weekly Top 40. I therefore wrote a small web scraper in R that saved the artist’s name, song title, and position for all weeks since 1965.
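To make that step concrete, here is a minimal sketch of how the scraped weekly rows could be reduced to one “highest rank ever reached” per song and joined onto the Top 2000 list. It is written in Python rather than the R that was actually used, it leaves out the scraping itself, and every row, field name and function name in it is a made-up assumption for illustration.

```python
# Hypothetical scraped rows: (week, artist, title, position).
# All values are invented for illustration; they are not real chart data.
weekly_rows = [
    ("1975-W01", "Artist A", "Song X", 12),
    ("1975-W02", "Artist A", "Song X", 3),
    ("1975-W03", "Artist A", "Song X", 7),
    ("1980-W10", "Artist B", "Song Y", 25),
    ("1980-W11", "Artist B", "Song Y", 18),
]

# Hypothetical rows from the Top 2000 file: name, artist, release year.
top2000 = [
    {"title": "Song X", "artist": "Artist A", "year": 1975},
    {"title": "Song Z", "artist": "Artist C", "year": 1991},
]


def song_key(artist, title):
    """Normalise artist and title so both sources can be matched (assumed rule)."""
    return (artist.strip().lower(), title.strip().lower())


def best_positions(rows):
    """Return the best (lowest-numbered) weekly position per song."""
    best = {}
    for _week, artist, title, position in rows:
        key = song_key(artist, title)
        if key not in best or position < best[key]:
            best[key] = position
    return best


def attach_best_position(songs, best):
    """Copy each song and add 'top40_best' (None if it never charted)."""
    return [
        {**song, "top40_best": best.get(song_key(song["artist"], song["title"]))}
        for song in songs
    ]


best = best_positions(weekly_rows)
enriched = attach_best_position(top2000, best)
```

In practice the matching of artist and title between two sources is usually the messy part; the lower-casing shown here is only the simplest possible rule.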
Having gathered the data (and having browsed it quickly to get a sense of what the values are) I start sketching. And I often do this on plain paper (I always carry a small Moleskine with me and a pen so I can make quick sketches wherever I am). I focus on the main abstract shape that I want to fit my data into. I don’t necessarily care about details, layout and colors yet. It’s more about “do I want to visualize the songs as circles, or something more complex? How are they positioned on the screen? By rank? By release year? Both?” The sketch below is much more detailed (and bigger) than I typically do, because I developed this during a workshop.
Once I have a rough sketch I’ll try to recreate it through code (d3.js) and with the actual data as soon as possible, to see if the data actually works with my idea (sometimes it doesn’t at all, e.g. due to outliers, or bunching up in one place). For example, following the original idea of using the release year to place the circles (songs), then color for the rank in the Top 2000, and size for the highest position in the weekly Top 40 resulted in this, which I wasn’t quite happy with. I therefore tried to switch the variables for size and color…
… and ended up with this:
Which I felt worked much better. Once I see that the visual on my screen is indeed showing potential, I start thinking about how to add in extra details, extra context, such as showing all the songs of singers who had recently passed away (David Bowie and Prince at that time) by giving their songs a certain colored stroke. Or by marking the top 10 songs by making them look like tiny vinyl records.
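The idea of switching which variable drives size and which drives color, and then layering on a stroke and a top 10 marker, can be sketched roughly as follows. This is plain Python, not the d3.js code behind the actual piece; the pixel ranges, the year domain, the field names and the choice that a better (lower) number gives a bigger or more intense circle are all assumptions made for the example.

```python
def linear_scale(domain, output_range):
    """Return a function mapping domain values linearly onto output_range."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


# Value ranges of the two ranking variables (lower number = better).
FIELD_DOMAINS = {"top2000_rank": (1, 2000), "top40_best": (1, 40)}


def encode(songs, size_field, color_field, highlighted_artists=frozenset()):
    """Turn song rows into circle attributes; size and color fields are swappable."""
    x = linear_scale((1950, 2020), (0.0, 1000.0))                   # assumed axis
    radius = linear_scale(FIELD_DOMAINS[size_field], (12.0, 2.0))   # best -> largest
    intensity = linear_scale(FIELD_DOMAINS[color_field], (1.0, 0.0))  # best -> 1.0
    marks = []
    for song in songs:
        marks.append({
            "title": song["title"],
            "cx": x(song["year"]),
            "r": radius(song[size_field]),
            "intensity": intensity(song[color_field]),
            # Extra context: a colored stroke for selected artists.
            "stroke": "highlight" if song["artist"] in highlighted_artists else None,
            # Extra context: top 10 songs get the vinyl-record treatment.
            "vinyl": song["top2000_rank"] <= 10,
        })
    return marks


# Invented example row, not real chart data.
songs = [{"title": "Song X", "artist": "Artist A", "year": 1985,
          "top2000_rank": 1, "top40_best": 40}]

first_try = encode(songs, size_field="top40_best", color_field="top2000_rank")
swapped = encode(songs, size_field="top2000_rank", color_field="top40_best",
                 highlighted_artists={"Artist A"})
```

Keeping the field names as parameters means that trying the swapped version is a one-line change, which is what makes it cheap to test an idea against the real data early on.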
Typically I end with the interactive online version. However, here I saved the SVG from my browser and continued with it in Illustrator to turn it into a static poster. I created a few histograms in R about the distribution across release years from previous editions…
and combined it all, with annotations, into the final poster.
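As a rough illustration of the kind of histogram mentioned above, and of the original goal of finding the most popular decade, release years can be binned per decade like this. The original histograms were made in R; this is a Python sketch with invented release years, and the function name is hypothetical.

```python
from collections import Counter


def songs_per_decade(release_years):
    """Count songs per decade, e.g. 1975 falls in the 1970 bin."""
    return Counter((year // 10) * 10 for year in release_years)


# Invented release years for illustration; not the real Top 2000 data.
years = [1967, 1969, 1971, 1975, 1977, 1979, 1984, 1991]

counts = songs_per_decade(years)
top_decade, top_count = counts.most_common(1)[0]

# A quick text histogram, one row per decade in chronological order.
for decade in sorted(counts):
    print(f"{decade}s {'#' * counts[decade]}")
```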