Bringing Local Geodata to Journalism – Ate Poorthuis, Matt Zook
Since December 2011, Floatingsheep.org has consumed and indexed every geotagged tweet produced in the world (about 3% of all tweets are geotagged) using Elasticsearch and Twitter’s streaming API. Unfortunately, Floatingsheep is not openly available to the public. Ate Poorthuis and Matt Zook from the University of Kentucky demoed some of Floatingsheep’s awesome capabilities, like locating the epicenter of an earthquake(!).
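To give a feel for how geotagged tweets could point at an epicenter, here is a minimal Python sketch. It is not Floatingsheep's method, which the talk did not spell out in detail; it simply assumes that tweets mentioning a keyword cluster around the event and takes the median latitude and longitude. The function name, the tweet fields and the sample data are all made up for illustration.

```python
from statistics import median


def estimate_epicenter(tweets, keyword="earthquake"):
    """Return the (lat, lon) median of tweets mentioning keyword, or None.

    Assumption: each tweet is a dict with "lat", "lon" and "text" keys.
    A plain median ignores longitude wrap-around at +/-180 degrees.
    """
    hits = [(t["lat"], t["lon"]) for t in tweets if keyword in t["text"].lower()]
    if not hits:
        return None
    lats, lons = zip(*hits)
    return (median(lats), median(lons))


# Made-up sample data, not real tweets.
sample_tweets = [
    {"lat": 37.9, "lon": -77.9, "text": "Whoa, earthquake!"},
    {"lat": 38.0, "lon": -78.0, "text": "Did anyone feel that earthquake?"},
    {"lat": 38.1, "lon": -77.8, "text": "EARTHQUAKE in Virginia"},
    {"lat": 51.5, "lon": -0.1, "text": "Lovely weather in London"},
]

print(estimate_epicenter(sample_tweets))  # (38.0, -77.9)
```

The median is used rather than the mean so that a handful of far-away tweets about the same event do not drag the estimate off target.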
Data visualization on a shoestring – Sharon Machlis, Kevin Hirten
What can you do if you are on a small budget, or even, no budget at all?
Sharon and Kevin pelted the audience with free (as in free beer) tools:
- OpenRefine – a tool for working with messy data
- Stata – data analysis and statistics
- IBM Many Eyes – data visualization
- Tableau Public – analytics and visualization
- infogr.am – infographics
- Datawrapper – embedded charts
- D3 – awesome data-driven documents
- Exhibit – publishing framework
- Treesheets – Cascading Tree Sheets, an attempt to separate design HTML from content HTML
- Google Chart Tools
- Misoproject – interactive storytelling
- Google Fusion Tables
- Esri – mapping for everyone
- QGIS – open source GIS program
- R – statistical computing
- Statwing – visualization
- Kartograph – mapping framework built on jQuery and Raphaël
Smarter interactive Web projects with Google Spreadsheets and Tabletop.js – Tasneem Raja
Tasneem Raja at Mother Jones sees everything that reporters produce as data: “Everything is data, and since everything is data it can have structure”. At Mother Jones they have built their own CMS on Drupal and Google Spreadsheets. Reporters feed data into spreadsheets, and the information is pulled into the browser using Tabletop.js. Tasneem pointed out a few caveats: Google limits access and the thresholds aren’t clear, and the solution depends on Google not changing its API. Still, the whole setup manages to run on a single private Google account.
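The core idea, spreadsheet rows becoming structured records, can be sketched in a few lines. Tabletop.js does this in JavaScript in the browser; the Python below is only an analogue of the concept, not Tabletop's API. The CSV text stands in for a sheet exported from a spreadsheet, and the column names and values are invented for the example.

```python
import csv
import io

# Made-up CSV standing in for a sheet that reporters have filled in.
SHEET_CSV = """headline,state,amount
School budget cut,KY,1200000
New bridge funded,OH,450000
"""


def rows_from_sheet(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


records = rows_from_sheet(SHEET_CSV)
for r in records:
    # All values arrive as strings, so numbers need explicit conversion.
    print(f'{r["headline"]} ({r["state"]}): ${int(r["amount"]):,}')
```

Given the unclear access thresholds, keeping a parsed copy like `records` on your own server, instead of having every reader's browser hit the spreadsheet, is one plausible way to reduce exposure to those limits.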
D3? R? Tableau? What’s right for you? – Amanda Cox, Robert Kosara
How to serve mad traffic – Jeremy Bowers, Jacqui Maher
This session was hilarious! With a great sense of humor, Jeremy explained the three virtues of a great sysadmin: lazy, impatient, and proud. At NYTimes.com they use Ruby on Rails and nginx on Amazon S3. By putting their systems in the cloud, they can adjust capacity to match traffic in a flexible way. But despite your best efforts, your load balancer might melt. Things like Ajax polling might cause unexpected load.
A few pointers on serving mad traffic:
- You need to know the path that each request travels.
- And if each request requires an application server, it won’t scale.
- There are only two hard things in computer science: cache invalidation and naming things – Phil Karlton
- nginx! 100k+ req/sec
- no db + dynamic == easy to scale
- Scale up: more servers
- Use consistent libs for live polling (js).
- Sanity check data entry/delivery points.
- Plan to degrade gracefully at risky areas.
- review, review, review
- Don’t bypass caches.
- Don’t request MBs of JSON every 30s!
- Turn off keep-alive.
- Turn off gzip.
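Two of the pointers above, "don't bypass caches" and "plan to degrade gracefully", combine nicely in a cache that keeps serving a stale answer when the backend fails. The sketch below is my own illustration in Python, not code shown in the session; the class name, the 30-second TTL and the fake backend are all assumptions.

```python
import time


class StaleOkCache:
    """Cache one value for ttl_seconds; serve the stale value if a refresh fails."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that hits the slow backend
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo needs no sleeping
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return self._value       # fresh: the backend is not touched
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except Exception:
            if self._fetched_at is None:
                raise                # nothing cached yet, so nothing to fall back on
            # Otherwise degrade gracefully and keep serving the old value.
        return self._value


# Hypothetical backend that fails on its second call.
calls = {"n": 0}


def fetch_results():
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("backend down")
    return f"results v{calls['n']}"


fake_now = [0.0]
cache = StaleOkCache(fetch_results, ttl_seconds=30, clock=lambda: fake_now[0])

first = cache.get()    # cache miss: calls the backend
second = cache.get()   # cache hit: no backend call
fake_now[0] = 31.0     # pretend 31 seconds have passed
third = cache.get()    # expired and the backend fails: stale value is served

print(first, second, third, calls["n"])
```

With this shape, the number of backend calls is bounded by the TTL rather than by the number of visitors, which is the property that keeps a polling-heavy page from melting the application servers.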
About an hour of enlightening five-minute lightning talks. Fun and intelligent. I’m truly awed and impressed by the performances of the people on stage. As a data nerd and hardware hacker I found Matt Waite’s Arduino and Nintendo Wii hack particularly inspiring. Using an accelerometer (harvested from a Wii remote control) connected to a programmable microcontroller, they built a data-gathering device and checked it in inside a bag at an airport to track the TSA’s (mis)handling of luggage. I hope this example inspires more people to go hacking with hardware and programming, because it is true: with programming you can control robots!
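For anyone tempted to try something similar, the analysis side can be very small. The Python sketch below is not Matt Waite's code; it assumes the logger records (x, y, z) acceleration in units of g and flags any sample whose overall magnitude exceeds a threshold. The readings and the 2.0 g threshold are made up.

```python
import math


def rough_handling_events(samples, threshold_g=2.0):
    """Return indices of samples whose acceleration magnitude exceeds threshold_g."""
    return [
        i
        for i, (x, y, z) in enumerate(samples)
        if math.sqrt(x * x + y * y + z * z) > threshold_g
    ]


# Made-up readings in g; a bag at rest reads roughly (0, 0, 1) due to gravity.
readings = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (1.5, 2.0, 1.0), (0.0, 0.1, 0.9)]

print(rough_handling_events(readings))  # [2]
```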