NICAR13 Day 4, Sunday 3/3

This conference isn’t over yet! Just one more session: Mapping best practices. On the panel were John Keefe, Matt Stiles, and Eric Gundersen. The very first question – is a map necessary? Don’t do maps just for the sake of doing maps. And beware of maps that are basically population maps.

Some people are color blind, and there are tools to help you select colors that work for them, like ColorBrewer 2 and Color Oracle.

In order to tell a story, there must be a balance between exploratory control and controlled narrative. Finding this balance is key. Too much exploratory control and people get lost. Too much controlled narrative and the story ends up as an ordinary article (I guess).

When serving maps: pregenerate the data at both high and low resolution, so the material can be served efficiently for both browser and server. Polygons can be simplified for the zoomed-out, low-resolution views. Use a tile server, like TileMill for Mapbox.
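
To illustrate what simplifying polygons for a low-resolution version could look like, here is a minimal sketch in Python of the Douglas-Peucker line simplification algorithm. This is my own illustration, not something shown by the panel; the function names, the sample outline, and the tolerance values are all made up for the example.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (all are (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: keep only points that deviate more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Everything in between is close enough to the straight line.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical outline; pregenerate one coarse and one detailed version.
outline = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
low_res = simplify(outline, tolerance=1.0)
high_res = simplify(outline, tolerance=0.05)
```

With a larger tolerance fewer points survive, so the low-resolution version is smaller to send to the browser. Real map tools do this for you; the sketch only shows the idea.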

Someone in the audience asked what the panel thought about making data public. They seemed to agree that making data public makes sense in most cases. For example, by making data public, people can help correct it.

It seems there’s no single tool that can solve all mapping problems. Instead there is a rather large box of tools that can help solve lots of different problems.

The slides.

NICAR13 Day 3, Saturday 2/3

It’s the last day of the NICAR13 conference. Today I’ve been watching Matt Waite tell the story of the Pulitzer Prize-winning site Politifact. Matt was very keen on structure, because everything has structure, especially stories. If you can find the structure and think of it on a higher level, you can build systems (like Politifact). Another aspect of building something that overlaps journalism and IT is cultural resistance: freaked-out reporters and reluctant developers, not to mention clueless management. “Build shit, don’t talk shit” – i.e. build a prototype to have conversations around. “Your mission might be a very small defined thing”, says Matt. If you can describe your thing with a single short declarative sentence, then you have a chance – you can pitch things. Guard your ONE THING zealously. Having a structure makes it possible to say NO to things. The core question: What is the atomic unit of this?

Another session I went to was on how to develop reusable visualization components using D3 and Backbone, with Alastair Dant, who works in development at the Guardian. The JavaScript library D3 really stands out as everyone’s favorite at this conference. Alastair is a fun guy, and it was a joy listening to him. I could feel part of the audience zoning out (this isn’t primarily a developer conference after all) during his walkthrough of the code. I truly enjoyed it though. The example code can be found on GitHub. Also check out R2D3 if you are required to support IE 7 and 8.

The “Swedish Contingent” at the conference had booked a lunch session. To be honest, this was not something I was particularly looking forward to. I think the organizers of this conference put in a 2 hour lunch break for a reason. But I was very happy to see Matt Waite again, this time flying around with a microscopic quadrotor drone. And I learned that there’s a drone journalism lab somewhere at the University of Nebraska. How satisfying for the nerd in me to hear Matt speak about hardware hacking, Arduino programming, drones and mesh networks. Where is journalism going? 😉

People have shown amazing stuff here, and we can all do amazing stuff back home – by crossbreeding ideas and competences. And a little bit of coding 😉

NICAR13 Day 2, Friday 1/3

Bringing Local Geodata to Journalism – Ate Poorthuis, Matt Zook
Ever since December 2011, Floatingsheep has consumed and indexed every geotagged tweet produced in the world (about 3% of all tweets are geotagged), using Elasticsearch and Twitter’s streaming API. Unfortunately Floatingsheep is not openly available to the public. Ate Poorthuis and Matt Zook from the University of Kentucky demoed some of Floatingsheep’s awesome capabilities, like locating the epicenter of an earthquake(!).

Data visualization on a shoestring – Sharon Machlis, Kevin Hirten
What can you do if you are on a small budget, or even, no budget at all?
Sharon and Kevin pelted the audience with free (as in free beer) tools:

Sharon’s chart:

Smarter interactive Web projects with Google Spreadsheets and Tabletop.js – Tasneem Raja
Tasneem Raja at Mother Jones sees everything that reporters produce as data – “Everything is data, and since everything is data it can have structure”. At Mother Jones they have built their own CMS on Drupal and Google spreadsheets. Reporters feed data into spreadsheets and information is extracted into the browser using Tabletop.js. Tasneem pointed out a few caveats. Google limits access, and the thresholds aren’t clear. And the solution depends on Google not changing its API. The solution manages to run on a single private Google account though.

D3? R? Tableau? What’s right for you? – Amanda Cox, Robert Kosara
Having no particular experience with any of the tools, my impression is that D3, R, and Tableau each solve different problems. What caught my interest the most was the D3 JavaScript library (here’s one example: the Waterman Butterfly Map). Because D3 uses SVG it will not work (out of the box) with Internet Explorer below version 9. Another JavaScript library mentioned was numeric.js, which can work with matrices and vectors.

How to serve mad traffic – Jeremy Bowers, Jacqui Maher
This session was hilarious! With a great sense of humor, Jeremy explained the three virtues of a great sysadmin: lazy, impatient, and proud. They use Ruby on Rails and nginx on Amazon S3. By putting their systems in the cloud they can tailor them to the traffic in a flexible way. But despite your best efforts your load balancer might melt. Things like Ajax polling might cause unexpected load.

A few pointers on serving mad traffic:

  • You need to know the path that each request travels.
  • And if each request requires an application server, it won’t scale.
  • There are only two hard things in computer science: cache invalidation and naming things – Phil Karlton
  • nginx! 100k+ req/sec
  • no db + dynamic == easy to scale
  • Scale up: more servers
  • Use consistent libs for live polling (JS).
  • Sanity check data entry/delivery points.
  • Plan to degrade gracefully at risky areas.
  • review, review, review
  • Don’t bypass caches.
  • Don’t request MBs of JSON every 30s!
  • Turn off keep-alive.
  • Turn off gzip.
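
To make the caching pointers above a bit more concrete, here is a minimal sketch in Python of a time-based cache placed in front of an expensive page render. It is my own illustration and not the speakers’ setup; the class name `TTLCache`, the `render_results_page` function, and the 30 second lifetime are all hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory cache where every entry expires after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._store = {}    # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the expensive work is skipped
        value = compute()    # cache miss or expired entry: do the work once
        self._store[key] = (now + self.ttl, value)
        return value


calls = []


def render_results_page():
    """Stand-in for a slow, database-backed page render (hypothetical)."""
    calls.append(1)
    return "<html>results</html>"


cache = TTLCache(ttl_seconds=30)
for _ in range(1000):
    page = cache.get_or_compute("/results", render_results_page)
```

A thousand requests hit the cache, but the page is only rendered once. In production this job is usually done by nginx or a CDN in front of the application servers rather than by code like this.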

I asked Jacqui and Jeremy how maps are served by the New York Times, and apparently they use a tool called TileMill. Gotta check that out…

Lightning talks
Five-minute, enlightening lightning talks for about an hour. Fun and intelligent. I’m truly awed and impressed by the performances of the people on stage. As a data nerd and hardware hacker I found Matt Waite’s Arduino and Nintendo Wii hack particularly inspiring. Using an accelerometer (harvested from a Wii remote control) connected to a programmable microcontroller, they built a data gathering device which they checked in with a bag at an airport to track TSA’s (mis)handling. I hope this example inspires more people to go hacking with hardware and programming – because it is true: with programming you can control robots!

NICAR13 Day 1, Thursday 28/2

I’m at the Computer-Assisted Reporting conference in Louisville, Kentucky. Here’s my summary of day one:

Information design and crossing the digital divide – Christopher Canipe, Helene Sears
What inspires me is hearing stories from people who took on the challenge to do something new – like Christopher Canipe, who moved from paper to web and had to learn about programming and JavaScript. Helene Sears told the story of how graphical work is done at the BBC. What I liked the most was the James Bond parallax scrolling infographics.

Prediction is very difficult, especially about the future – Andy Cox
Part of this session (like weather predictions and forecasts) went completely over my head, but I got inspired to try out the D3 JavaScript library.

Down and dirty with the DocumentCloud API – Ted Han
DocumentCloud is a service that turns documents into searchable and analyzable data. It seems pretty useful with its API and scripting abilities. I wonder if there are limitations with foreign languages like Swedish?

Dig deeper with social tools – Mandy Jenkins, Doug Haddix
Mandy and Doug went through an amazing array of useful social web tools. Go check them out for yourself.

Practical machine learning: Tips, tricks and real-world examples for using machine learning in the newsroom – Jeff Larson, Chase Davis
As a data nerd, this was the most exciting session of the day. Jeff and Chase showed different techniques to create decision trees and other machine learning stuff, and pointed out the tool Weka for exploring them. Jeff and Chase have been kind enough to put their code on GitHub.
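
To give a feel for how a decision tree picks its questions, here is a minimal sketch in Python of choosing a single split by Gini impurity, which is one common criterion. This is my own illustration, not Jeff and Chase’s code and not how Weka is used; the function names and the tiny donation dataset are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for a mixed one."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_split(rows, labels):
    """Try every feature/threshold pair and return the one with the
    lowest weighted impurity, as (feature_index, threshold, score)."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put something on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Made-up data: [donation amount, number of donations] per donor.
rows = [[100, 1], [250, 2], [5000, 1], [7500, 3]]
labels = ["individual", "individual", "committee", "committee"]
split = best_split(rows, labels)
```

Here the best question turns out to be “is the donation amount at most 250?”, which separates the two groups perfectly. A full decision tree repeats this step on each side of the split.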

Visualizing networks and connections – Irene Liu, Kevin Connor
One of the sites presented may perhaps be described as a crowdsourced Facebook of the power elite, where the dots are connected between the ultra rich and those in power, and how they connect to organizations. Another, similar website was also mentioned. Irene Liu made perhaps the boldest presentation so far – her site went live just half an hour before her presentation(!). A very impressive and thorough HTML5 app on China’s power elite. A thing I learned about China is that although they have only one political party, it is highly factionalized. Very interesting. My guess is that the site will probably be censored in China.

Goodnight and see you tomorrow!