NICAR13 Day 2, Friday 1/3

Bringing Local Geodata to Journalism – Ate Poorthuis, Matt Zook
Since December 2011, Floatingsheep has consumed and indexed every geotagged tweet produced in the world (about 3% of all tweets are geotagged) using Elasticsearch and Twitter’s streaming API. Unfortunately, Floatingsheep is not openly available to the public. Ate Poorthuis and Matt Zook from the University of Kentucky demoed some of Floatingsheep’s awesome capabilities, like locating the epicenter of an earthquake(!).
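
To make the epicenter idea concrete, here is a minimal sketch in Python. It is not the method shown in the session, which these notes do not describe. It simply assumes that the earliest tweets mentioning a quake cluster near its origin, and averages their coordinates. The sample tweets, the tuple layout, and the `estimate_epicenter` helper are all made up for illustration.

```python
from datetime import datetime

# Hypothetical geotagged tweets: (timestamp, latitude, longitude, text).
# These rows are invented sample data, not real tweets.
TWEETS = [
    (datetime(2013, 1, 1, 12, 0, 5), 38.00, -84.50, "whoa, earthquake?"),
    (datetime(2013, 1, 1, 12, 0, 9), 38.10, -84.40, "Earthquake! House is shaking"),
    (datetime(2013, 1, 1, 12, 0, 7), 37.90, -84.60, "was that an earthquake"),
    (datetime(2013, 1, 1, 12, 3, 0), 40.00, -80.00, "heard about the earthquake"),
    (datetime(2013, 1, 1, 12, 0, 1), 39.00, -85.00, "nice lunch"),
]


def estimate_epicenter(tweets, keyword="earthquake", first_n=3):
    """Average the coordinates of the first_n earliest tweets containing keyword.

    A plain arithmetic mean of latitude/longitude is only reasonable for
    points that are close together and away from the poles and date line.
    Returns (lat, lon), or None if no tweet matches.
    """
    matching = [t for t in tweets if keyword in t[3].lower()]
    earliest = sorted(matching, key=lambda t: t[0])[:first_n]
    if not earliest:
        return None
    lat = sum(t[1] for t in earliest) / len(earliest)
    lon = sum(t[2] for t in earliest) / len(earliest)
    return (lat, lon)


if __name__ == "__main__":
    print(estimate_epicenter(TWEETS))
```

A real system would need far more care (filtering noise, weighting by time, handling sparse areas), so treat this purely as a toy.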

Data visualization on a shoestring – Sharon Machlis, Kevin Hirten
What can you do if you are on a small budget, or even, no budget at all?
Sharon and Kevin pelted the audience with free (as in free beer) tools:

Sharon’s chart:

Smarter interactive Web projects with Google Spreadsheets and Tabletop.js – Tasneem Raja
Tasneem Raja at Mother Jones sees everything that reporters produce as data – “Everything is data, and since everything is data it can have structure”. At Mother Jones they have built their own CMS on Drupal and Google spreadsheets. Reporters feed data into spreadsheets, and the information is pulled into the browser using Tabletop.js. Tasneem pointed out a few caveats: Google limits access, and the thresholds aren’t clear. The solution also depends on Google not changing its API. It does manage to run on a single private Google account, though.
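
The core idea, a spreadsheet acting as a small database whose rows become structured records, can be sketched without Tabletop.js. The Python below is an illustration only, not Mother Jones’s code: it assumes the sheet has already been exported as CSV text, and the column names, sample rows, and the `rows_from_sheet` helper are invented.

```python
import csv
import io

# Hypothetical CSV export of a reporter-maintained spreadsheet.
SHEET_CSV = """state,headline,count
KY,Example headline one,12
OH,Example headline two,7
"""


def rows_from_sheet(csv_text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Spreadsheet cells arrive as strings; convert the numeric column.
        row["count"] = int(row["count"])
        records.append(row)
    return records


if __name__ == "__main__":
    for record in rows_from_sheet(SHEET_CSV):
        print(record["state"], record["headline"], record["count"])
```

The appeal of the approach is that reporters only ever touch the spreadsheet, while the page code only ever sees a list of records.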

D3? R? Tableau? What’s right for you? – Amanda Cox, Robert Kosara
Having no particular experience with any of the tools, my impression is that D3, R, and Tableau each solve different problems. What caught my interest the most was the D3 JavaScript library (here’s one example: the Waterman Butterfly Map). Because D3 uses SVG, it will not work (out of the box) with Internet Explorer below version 9. Another JavaScript library mentioned was numeric.js, which can work with matrices and vectors.

How to serve mad traffic – Jeremy Bowers, Jacqui Maher
This session was hilarious! With a great sense of humor, Jeremy explained the three virtues of a great sysadmin: lazy, impatient, and proud. They use Ruby on Rails and nginx on Amazon S3. By putting their systems in the cloud they can tailor them to the traffic in a flexible way. But despite your best efforts, your load balancer might melt. Things like Ajax polling might cause unexpected load.

A few pointers on serving mad traffic:

  • You need to know the path that each request travels.
  • And if each request requires an application server, it won’t scale.
  • There are only two hard things in computer science: cache invalidation and naming things – Phil Karlton
  • nginx! 100k+ req/sec
  • no db + dynamic == easy to scale
  • Scale up: more servers
  • Use consistent libs for live polling (JS).
  • Sanity check data entry/delivery points.
  • Plan to degrade gracefully at risky areas.
  • review, review, review
  • Don’t bypass caches.
  • Don’t request mbs of json every 30s!
  • Turn off keep-alive.
  • Turn off gzip.
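
Two of the pointers above, “Don’t bypass caches” and “Plan to degrade gracefully”, can be combined in a few lines. The Python below is a minimal sketch, not anything shown in the session: the `StaleOnErrorCache` class and the failing `fetch_results` backend are hypothetical. The cache answers from memory while a value is fresh and, if a refresh fails, keeps serving the last good value instead of passing the error on to every visitor.

```python
import time


class StaleOnErrorCache:
    """Cache one value for ttl seconds; serve the stale copy if a refresh fails."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch      # callable that produces a fresh value
        self._ttl = ttl          # seconds a value counts as fresh
        self._clock = clock      # injectable so the cache can be tested
        self._value = None
        self._stored_at = None   # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        fresh = self._stored_at is not None and now - self._stored_at < self._ttl
        if fresh:
            return self._value   # cache hit: the backend is not touched
        try:
            self._value = self._fetch()
            self._stored_at = now
        except Exception:
            if self._stored_at is None:
                raise            # nothing cached yet, so nothing to fall back on
            # Degrade gracefully: keep serving the last good value.
        return self._value


if __name__ == "__main__":
    state = {"healthy": True}

    def fetch_results():
        # Hypothetical backend call that can fail.
        if not state["healthy"]:
            raise RuntimeError("backend down")
        return {"status": "ok"}

    cache = StaleOnErrorCache(fetch_results, ttl=0)  # ttl=0 forces a refresh each call
    print(cache.get())
    state["healthy"] = False
    print(cache.get())  # backend is down, the stale value is still served
```

In a real deployment this job is usually done by nginx or a dedicated cache in front of the application rather than by application code.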

I asked Jacqui and Jeremy how maps are served by the New York Times, and apparently they use a tool called TileMill. Gotta check that out…

Lightning talks
Enlightening five-minute lightning talks for about an hour. Fun and intelligent. I’m truly awed and impressed by the performances of the people on stage. As a data nerd and hardware hacker, I found Matt Waite’s Arduino and Nintendo Wii hack particularly inspiring. Using an accelerometer (harvested from a Wii remote control) connected to a programmable microcontroller, they built a data-gathering device which they checked in with a bag at an airport to track the TSA’s (mis)handling. I hope this example inspires more people to go hacking with hardware and programming – because it is true: with programming you can control robots!
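
For anyone curious what the analysis side of such a device might look like, here is a small Python sketch. It is not Matt Waite’s code: the readings, the threshold, and the `find_jolts` helper are invented. It assumes the logger stored timestamped x/y/z accelerations in units of g and flags every sample whose overall magnitude exceeds a threshold.

```python
import math

# Hypothetical log: (seconds since start, x, y, z), accelerations in g.
# A bag at rest reads roughly 1 g in total because of gravity.
READINGS = [
    (0.0, 0.0, 0.0, 1.0),
    (0.1, 0.1, 0.0, 1.0),
    (0.2, 2.0, 1.5, 3.0),
    (0.3, 0.0, 0.1, 1.0),
]


def find_jolts(readings, threshold_g=2.5):
    """Return (time, magnitude) for every reading stronger than threshold_g."""
    jolts = []
    for t, x, y, z in readings:
        # Combine the three axes into a single acceleration magnitude.
        magnitude = math.sqrt(x * x + y * y + z * z)
        if magnitude > threshold_g:
            jolts.append((t, magnitude))
    return jolts


if __name__ == "__main__":
    for t, magnitude in find_jolts(READINGS):
        print("jolt at %.1f s: %.2f g" % (t, magnitude))
```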
