
I fell asleep on a train with an empty head. The train made a stop and I woke up, my head filled with an idea and some kind of mad inspiration to follow it through: it was time to move the home metrics stuff out of the old netbook and into the cloud.

The 1.0 setup has served well for over 3 years now, with hardly any intervention. In fact, this all started some 5 years ago(!). TL;DR: I’m sending sensor metrics using Node.js to a dashboard based on the time series database Graphite. Kill your darlings and welcome to the age of cloud!

Hosted Graphite bundles Graphite with Grafana to visualise the metrics stored in the time series database. Hosted Graphite is available as a Heroku add-on, which makes this switch a very smooth and convenient cloud experience.


App names blurred to make it look more interesting.

I created 3 apps on Heroku: one to hold the Hosted Graphite add-on and two others to serve up metrics from Eliq and Telldus respectively. With Hosted Graphite you need to prepend an API key when sending metrics to Graphite, and Heroku, twelve-factor style, provides configuration through environment variables, so I had to update eliq2graphite and telldus2graphite and their dependencies to accommodate this.
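Config vars can be set and inspected with the Heroku CLI. Note that the variable name below is my own placeholder, not necessarily the name the add-on actually uses:

```shell
# Set the API key as a config var on the app (the variable name is an assumption)
heroku config:set HOSTED_GRAPHITE_APIKEY=<api-key> -a <app-name>
# List the app's config vars to verify
heroku config -a <app-name>
```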

Deployment to Heroku is just a matter of pushing code from git:

# setup
heroku git:remote -a <app-name>
# deploy
git push heroku master

The scripts are executed periodically using the Heroku Scheduler add-on.


Logs can be read with the “heroku logs” command:

 heroku logs -a <app-name>
heroku[scheduler.3876]: Starting process with command `node bin/telldus2graphite.js`
heroku[scheduler.3876]: State changed from starting to up
app[scheduler.3876]: Logged: ["home.chili.temp 10.9 1519109556\n"]
app[scheduler.3876]: Logged: ["home.outdoor.temp -4.7 1519109628\n","home.outdoor.humidity 83 1519109628\n"]
app[scheduler.3876]: Logged: ["home.rain.rrate 0 1519109525\n","home.rain.rtot 206.3 1519109525\n"]
app[scheduler.3876]: Logged: ["home.uv.uv 0 1519109343\n"]
app[scheduler.3876]: Logged: ["home.wind.wdir 292.5 1519109799\n","home.wind.wavg 0 1519109799\n","home.wind.wgust 0 1519109799\n"]
heroku[scheduler.3876]: Process exited with status 0
heroku[scheduler.3876]: State changed from up to complete
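Each “Logged” entry above is a Graphite plaintext-protocol datapoint: metric path, value, and a Unix timestamp. A minimal sketch of how such a line is put together for Hosted Graphite, with a placeholder API key (the endpoint in the final comment is an assumption based on common Graphite defaults):

```shell
# Build a "key.metric value timestamp" line, Graphite plaintext style.
API_KEY="0123-placeholder-key"   # placeholder, not a real key
METRIC="home.outdoor.temp"
VALUE="-4.7"
LINE="$API_KEY.$METRIC $VALUE $(date +%s)"
echo "$LINE"
# Sending it would then be something like (host/port assumed):
# echo "$LINE" | nc 2003
```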


Grafana is easy to work with and I’m using single stats and graphs to visualise my metrics.


Value mapping, colored thresholds, and units are some of Grafana’s many features.


Oh, I forgot to mention, this setup is completely free. Free as in free 🍺 #winning


Update: Actually it’s $29/mo for Grafana after the free trial 😬

To be continued 😀 …


QCon London 2016, March 7-9

View from the venue
Of three days of conferencing I probably did best on the second day, when I felt all ears for most of the day. The first day it was hard to sit down, and on the third day I found myself zoning out on a few occasions (mind you, I was in bed early the night before). With six simultaneous tracks you are going to miss more than 80% of the conference anyway. Luckily they video all the things. Three days of information overload pass by rather quickly, and in the end things tend to blur into each other. I feel I need to sort things out for myself. So here goes.
Adrian Colyer was first out as the opening keynote speaker. I think he has a point in that reading research papers is a good habit, and his blog seems to be a good starting point.
Matteo Collina explaining IoT on a high level
In one sentence QCon London 2016 can be summarized as “Microservices, microservices, blah, blah, microservices”. And the three-letter acronym “IoT” can surely be fitted into most sentences, especially among the many vendors at the venue.
Nice visualization of Netflix’s system
After attending a few microservice sessions, some more enlightening than others, I started to discern a common pattern. It goes something like this: “Before we had a monolith running on-premise, then we virtualized the monolith, and then we decomposed it further, and somewhere on the way we decided to call our architecture a microservice one. And now we see these problems.” And then the session ends.
Quite annoying when the speaker never gets to the “good stuff”. However, there were pieces of good stuff to be found in most talks. Some of the common learnings seem to be that things ARE going to fail, and whatever you do, things are going to fail anyway. If it ain’t broken, try harder. The motto of yesteryear was “embrace change”; now we’re supposed to “embrace failure”.
Testing in production is now a thing. I imagine testers cringe when hearing this. Apparently Netflix is doing this at large scale with their Chaos Monkey and Kong tools, and has been doing it for quite some time, starting somewhere after that infamous outage on Christmas Eve 2012. They call it failure-driven architecture. Given a sufficiently complex system, things are bound to go wrong, and we cannot completely test our way to quality (if that was ever an option). Maybe fault tolerance and confidence are more useful words.
Rather than walking away with a bunch of handy solutions I walked away with a set of new questions. On a basic level, how do we even comprehend a system made of many small parts?
As Katherine Kirk concluded, the three characteristics of human existence are also true for our business:
  • Everything is in a constant state of change
  • We need to collaborate
  • We will always battle dissatisfaction
New innovations in hardware will impact how we build systems: persistent RAM will fundamentally change how we database stuff, and hosts being able to access RAM on other hosts over super low latency network channels without consuming CPU will impact what’s possible to build.
Embrace the (distributed commit) log
Keeping stuff in sync is a thing, and even more so when things are distributed. Martin Kleppmann in his session “Staying in Sync: From Transactions to Streams” proposed a stupid simple solution, as he put it, as an alternative to distributed transactions by using a distributed commit log.
Another trending topic is blockchains, and not only in the scope of cryptocurrency. Chris Anderson had an interesting talk about per-document, non-global-consensus blockchains. It took me about 50 minutes into the session (the very end) before I could conjure up a use case for this in my head; to me this just sounds like the perfect thing for tradable items in games. Today and always, supply has to be managed within the system, since digital content is otherwise easily copied and multiplied and items get devalued. With this technology items could be traded outside the system, call it p2p DRM. Just an idea. Could be a huge thing. We’ll see. There seems to be a common notion that blockchains will eventually disrupt the current financial system as we know it, and in that light it’s a small irony that the company holding 10% of all bitcoins has locked its private keys, split into many parts and encrypted by many key holders, into good old bank vaults 🙂
Glen Ford talked about culture and the perhaps not so obvious danger of trying to copy someone else’s culture. Culture is a complex matter, and to some degree culture is a differentiator, something unique that separates “us” from “them” and, on the bottom line, makes the business successful. Looking at contemporary recruitment posters it’s clear that “culture” is a currency. And like most fiat currencies, cultures turn out to be hard to copy.
John Willis talked about burnout; it’s a thing not only in society at large but in our business. It turns out software people tend to be more susceptible than the general population, especially the high achievers. A slippery slope that may go unnoticed until it’s too late. Do regular self-assessments to monitor indicators, just as you would monitor any system, and be there for your friends. Important stuff. Be alert.
Man’s best friend
What a heartwarming moment to watch Simon Wheatcroft, the blind ultra marathon runner, on stage with his guide dog slumbering at his feet. Being a mere marathoner myself I could not contain my amazement at this man’s achievements, even before factoring in that he’s blind. Technically speaking, the solution that will enable him to run 126 km unassisted through the Namibian desert is rather straightforward: a GPS-based beeper thingie that simply beeps when he strays off track. No drones, no real-time room mapping, nothing fancy at all. “I will start hallucinating anyway so I want it to be simple,” he explained. Stupid simple solutions are the best.

Sun, Wind, and Rain

I lost my mind for a moment and went shopping online: a wind sensor (Oregon Scientific WGR800), a UV sensor (Oregon Scientific UVN800), and a rain gauge (Oregon Scientific PCR800). The package received was much bigger than expected; the wind sensor especially is quite a sturdy piece. With a little improvisation it all sits on the roof now. At first I got no readings on Telldus Live, which made me a bit nervous. It turned out that all it took was to flash the TellStick Net unit with the latest firmware, and then the new sensors appeared. Phew. The next worry was whether telldus2graphite would manage the new sensors; I had only tested with temperature and humidity sensors before. But it seems to work. The updated Graphene dashboard now includes the new sensors.


The Quantified Home


Some weeks ago I was fiddling with Docker and found a premade Graphite image in the Docker repo. Anyone who has tried to install Graphite knows it is not a very straightforward process, so I got excited, and the image turned out to just work. Now I wanted to push some data into my Graphite instance. So I started with Node and looked for a way to fetch data from my sensors on Telldus Live. A few nights later I had started two new repos on github: telldus2graphite and telldus-live-promise. Sensor data from Telldus Live can be fetched by calling the telldus2graphite Node script from a cron job:

* * * * * node <path to telldus2graphite>/node_modules/.bin/telldus2graphite

I thought it could be useful to extract the Telldus Live part of the telldus2graphite project into its own project. I had started out using the telldus-live npm module, but I wanted to roll my own based on promises, hence the name telldus-live-promise. Instead of using callbacks, the syntax goes like this:


Our electricity meter is hooked up to an ELIQ energy monitor. Although ELIQ has a rather attractive and powerful visual interface, I wanted to fetch energy data into Graphite as well. So I made two more repos: eliq2graphite and eliq-promise.

Fetching the last 24 hours of energy data every hour with eliq2graphite would look like this:

0 * * * * node <path to eliq2graphite>/node_modules/.bin/eliq2graphite --age 24 --resolution hour

eliq2graphite is using eliq-promise to access the ELIQ API, and the syntax goes like this:

eliq.getFromTo(<startdate>, <enddate>, '6min' | 'hour' | 'day').then(console.log).catch(console.log);
{ startdate: '2015-03-02T20:00:00+00:00',
  enddate: '2015-03-02T23:00:00+00:00',
  intervaltype: '6min',
  data:
   [ { avgpower: 2790,
       energy: 279,
       temp_out: null,
       time_start: '2015-03-02T20:00:00',
       time_end: '2015-03-02T20:06:00' },

With both sensor values and energy consumption in Graphite, I wanted to create a more visually appealing dashboard than the built-in ones that Graphite provides. I made a fork of Graphene and created my dashboard there. Creating dashboards in Graphene is very straightforward, and I decided to serve it as simply as possible using Python’s SimpleHTTPServer running in a screen process:


cd graphene

lineCount=`screen -r graphene | grep "There is no screen to be resumed matching graphene." | wc -l`

if [ $lineCount -eq 1 ] ; then
    echo "lineCount: $lineCount. Starting in a detached screen named graphene. Use screen -r graphene to view."
    screen -dmS graphene python -m SimpleHTTPServer 8888
else
    echo "lineCount: $lineCount. graphene is already running. Resuming now."
    screen -r graphene
fi
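A side note: SimpleHTTPServer is Python 2 only. On a system with just Python 3, the equivalent one-liner would be (a sketch, not part of the original setup):

```shell
# Python 3 renamed SimpleHTTPServer to http.server
screen -dmS graphene python3 -m http.server 8888
```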

I’m using an old netbook to run Graphite, Graphene, and the cron jobs. By the way, old netbooks are great home servers – they come with a built-in UPS and monitor, take up minimal space, and are quiet and energy efficient. The dashboard is displayed on a v1 iPad in the kitchen (it can still do that job). Well, that’s it.

Raspberry Pi Backup


So, now the “piwall” has been up and running for a while, and I’ve made some changes to the config and the system since the last blog post. But what if the Pi decides to eat the SD card, or I get hacked, or… Oh, I need a backup.

There are a number of backup solutions to choose from, and I decided to go with an easy one: to clone the whole system – as is – which resides on a 16GB SD card. I tried different ways of doing the cloning and ended up using the command line tool “dd” to accomplish the task.

On a Mac, you can use the diskutil program to list the mounted disks:

$ diskutil list
/dev/disk3
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:     FDisk_partition_scheme                        *15.9 GB    disk3
   1:                 DOS_FAT_32 NAMNLOS                 15.9 GB    disk3s1

In order for dd to work, unmount any partitions on the SD card:

diskutil unmountDisk /dev/diskN

“dd” will perform a block-by-block copy of the contents of the SD card into a file. This command will take some time (on my 16GB RAM i7 MacBook Pro it takes a couple of hours) and there will be no feedback during the process. Adding the “bs” argument may speed up the process.

sudo dd if=/dev/diskN of=/path/to/backup.img bs=1m
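Since dd gives no feedback, it can be reassuring to verify the copy afterwards by comparing checksums of the card and the image. The idea is simulated below on a plain file, since /dev/diskN needs real hardware; on a real 16GB card both reads would take a while:

```shell
# Simulate the clone on a regular file (a real run would read /dev/diskN).
printf 'pretend this is the SD card contents' > /tmp/card
dd if=/tmp/card of=/tmp/backup.img bs=512 2>/dev/null
# cksum prints "CRC size"; identical values mean a faithful copy.
cksum < /tmp/card
cksum < /tmp/backup.img
```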

Copying the img-file onto a new SD card is a matter of simply reversing the arguments. Just remember to unmount partitions on the destination card, as described above.

sudo dd of=/dev/diskN if=/path/to/backup.img bs=1m

Update: Using /dev/rdisk instead of /dev/disk may be faster on the Mac (source).

Raspberry Pi VLAN Routing


(Update: this is probably not a 100% safe setup, and the Pi’s 10 Mb/s network interface limitation makes this solution somewhat limited. But it was a fun exercise ;-))

I wanted to expand my home network beyond the Apple AirPort’s three LAN ports, into three separate networks: an Internet-of-Things network, where my TellStick Net could live; a demilitarized zone, where I could put a web server without worrying too much; and a local area network, where my files would be safe. To get this I thought I needed a firewall with three network interfaces, plus switches for the three networks.

I happened to mention my plans to my colleague and network expert Tobias Genberg. He said something like this:

“You have a Raspberry Pi, right?”


It turned out my plan was needlessly complicated and expensive. What I could do instead was to get a “smart switch”, a switch capable of running virtual networks, and then use my Raspberry Pi as a firewall slash router. I decided to proceed with this solution.

I knew this was going to be a challenge, because I had to learn a lot – how to configure the switch; how to setup networking on the Pi with virtual interfaces, vlans, dhcp, and dns; learn to understand how iptables works. And of course, I did not know what I did not know when I started. I put a lot of hours into this project, and while my friends and family suffered, I learned a few things about networking and Linux.

Switch configuration

There are a lot of switches out there. I fell for the Netgear GS108T v2 – a capable and yet affordable piece of switch. It is made out of metal and is quite heavy for its small size. A solid impression.

A Virtual LAN (VLAN) is a group of devices on one or more LANs that are configured so that they can communicate as if they were attached to the same wire, when in fact they are located on a number of different LAN segments, because VLANs are based on logical instead of physical connections.

On this switch, vlan numbers can be anything between 1 and 4093, except 1-3, which are reserved. I came up with a naming scheme that includes the port numbers, so WAN on port 1 becomes 11 and LAN on ports 2 and 3 becomes 23. Easy to remember.

  • 11 WAN – the Internet connection
  • 23 LAN
  • 45 IOT – Internet of Things
  • 66 DMZ

WAN cable goes into port 1 and Pi router connects to port 8.

Vlans on the GS108T have untagged and tagged ports. Devices connected to an untagged port will not notice that they are indeed on a vlan and will have the same experience as when connecting to a “normal” network. The tagged port, however, is where the vlan magic happens. The tagged port 8 enables the router to distinguish between several logical networks on the same physical port.

vlan membership

The table below illustrates how ports on each vlan are configured.

         1  2  3  4  5  6  7  8
vlan 11  U                    T
vlan 23     U  U              T
vlan 45           U  U        T
vlan 66                 U     T

Each port is assigned a default vlan.

pvid configuration

Ingress filtering is enabled on all ports except router port 8 (the picture does not show this). To my understanding, when enabled, ethernet frames from other vlans are discarded, which seems like a good thing. When disabled, all frames are forwarded, which is essential for the vlan routing to work on the router port.

Router Prerequisites

I decided to run the Pi on Raspbian, using NOOBS to install it. A small but very important detail: IP forwarding must be enabled for all of this to work (I learned this the hard way)!

echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward

Uncomment the line below in /etc/sysctl.conf to enable packet forwarding for IPv4 permanently:

net.ipv4.ip_forward=1


Raspbian does not come with vlan support out of the box, so the vlan package has to be installed:

apt-get install vlan

The 8021q kernel module has to be loaded at boot:

echo '8021q' | sudo tee -a /etc/modules
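Appending to /etc/modules only takes effect at the next boot; to load the module immediately without rebooting, something like:

```shell
# Load the 802.1Q vlan module into the running kernel right away
sudo modprobe 8021q
```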

If everything works …

lsmod | grep 8021q

… should output something like this:

8021q                  18046  0 
garp                    6335  1 8021q


We need to install a dhcp server; I chose to run isc-dhcp-server.

apt-get install isc-dhcp-server

There are a few tools that are very useful when debugging network issues.

apt-get install tcpdump


Adding vlans to the router is done with the vconfig command. There are different naming conventions for vlans. I chose to use the interface.vlan-number convention, e.g. eth0.11, which I think is the default one. But to be sure, just run:

vconfig set_name_type DEV_PLUS_VID_NO_PAD

Adding the vlans:

vconfig add eth0 11
vconfig add eth0 23
vconfig add eth0 45
vconfig add eth0 66
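For what it’s worth, vconfig has since been deprecated in favour of iproute2; on a newer Raspbian the same vlan interfaces could be created like this (a sketch using the same vlan ids as above):

```shell
# iproute2 replacement for the vconfig commands above;
# requires the 8021q module to be loaded.
sudo ip link add link eth0 name eth0.11 type vlan id 11
sudo ip link add link eth0 name eth0.23 type vlan id 23
sudo ip link add link eth0 name eth0.45 type vlan id 45
sudo ip link add link eth0 name eth0.66 type vlan id 66
sudo ip link set eth0.11 up
```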

Interface Definitions

auto lo

iface lo inet loopback

# eth0 is not used
auto eth0
iface eth0 inet static

auto eth0.11
iface eth0.11 inet dhcp
	vlan_raw_device eth0

auto eth0.23
iface eth0.23 inet static
	mtu 1500
	vlan_raw_device eth0

auto eth0.45
iface eth0.45 inet static
	mtu 1500
	vlan_raw_device eth0

auto eth0.66
iface eth0.66 inet static
	mtu 1500
	vlan_raw_device eth0

# Extra
auto eth0.77
iface eth0.77 inet static
	mtu 1500
	vlan_raw_device eth0

iface default inet dhcp

DHCP Configuration

ddns-update-style none;
ddns-domainname "home.local";
ignore client-updates;

# option definitions common to all supported networks...
option domain-name "home.local";
option domain-name-servers,,;

default-lease-time 600;
max-lease-time 7200;

# If this DHCP server is the official DHCP server for the local
# network, the authoritative directive should be uncommented.

# Use this to send dhcp log messages to a different log file (you also
# have to hack syslog.conf to complete the redirection).
log-facility local7;

subnet netmask {
  option routers;
  option broadcast-address;
  option domain-name "lan.home.local";
}

subnet netmask {
  option routers;
  option broadcast-address;
  option domain-name "local";
  option domain-name-servers,;
}


subnet netmask {
  option routers;
  option broadcast-address;
  option domain-name "dmz.home.local";
}


Iptables is a tool for filtering, redirecting, and otherwise manipulating network traffic on Linux, and it forms the foundation of the firewall functionality in this setup.

I put all the iptables rules in a script to make it all more manageable.
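The rules below reference a handful of shell variables whose definitions aren’t shown. A hypothetical header for the script: the interface names follow from the vlan setup earlier, while the subnet ranges are invented here purely for illustration.

```shell
#!/bin/sh
# Interfaces, matching the vlan interface definitions earlier in the post.
WAN=eth0.11
LAN=eth0.23
IOT=eth0.45
DMZ=eth0.66
# Subnets: these exact ranges are made up for illustration; the real ones
# would match the dhcp server configuration.
LAN_NET=
IOT_NET=
DMZ_NET=
NET_RANGE=   # covers all three internal subnets
echo "WAN=$WAN LAN_NET=$LAN_NET NET_RANGE=$NET_RANGE"
```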





# Delete existing rules
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X

## General rules
# -P option describes the default policy for these chains
# Filters packets destined to the firewall. 
iptables -P INPUT DROP

# Filters packets to servers accessible by another NIC on the firewall. 
iptables -P FORWARD DROP

# Filters packets originating from the firewall 

# Accept icmp but limit them to 2/s on the outside
iptables -A INPUT -p icmp --icmp-type echo-request -m limit --limit 2/s -i $WAN -j ACCEPT
# Set no limits on other interfaces
iptables -A OUTPUT -p icmp --icmp-type echo-request ! -o $WAN  -j ACCEPT
iptables -A INPUT  -p icmp --icmp-type echo-reply ! -i $WAN  -j ACCEPT

# Defense for SYN flood attacks by limiting the acceptance of TCP segments 
# with the SYN bit set to no more than five per second
iptables -A INPUT -p tcp --syn -m limit --limit 5/s -i $WAN -j ACCEPT

# Allow loopback access
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A INPUT -s -j ACCEPT

## ssh
# Allow ssh on all interfaces - a good password, fail2ban,
# and two-factor authentication will hopefully keep us safe
iptables -A INPUT -p tcp -m multiport --dports 22 -j ACCEPT

## WAN
iptables -A POSTROUTING -t nat -o $WAN -s $NET_RANGE ! -d $NET_RANGE  -j MASQUERADE
# Drop Private Network Address On Public Interface
iptables -A INPUT -i $WAN -s $NET_RANGE -j DROP
# Prior to masquerading, the packets are routed via the filter
# table's FORWARD chain.
# Allowed outbound: New, established and related connections
# Allowed inbound : Established and related connections
iptables -A FORWARD -t filter -o $WAN -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -t filter -i $WAN -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT

## LAN
iptables -A INPUT -s $LAN_NET -i $LAN -j ACCEPT
# Block input from other networks
iptables -A INPUT ! -s $LAN_NET -i $LAN -j REJECT
iptables -A FORWARD -d $LAN_NET -j ACCEPT
iptables -A OUTPUT -s $LAN_NET -o $LAN -j ACCEPT

## IOT
# Only accept access from LAN
iptables -A INPUT -s $LAN_NET -d $IOT_NET -j ACCEPT
# Reject others
iptables -A INPUT -s $LAN_NET ! -d $IOT_NET -j REJECT
iptables -A FORWARD -d $IOT_NET -j ACCEPT
iptables -A OUTPUT -s $IOT_NET -o $IOT -j ACCEPT

## DMZ
iptables -A INPUT -s $DMZ_NET -j ACCEPT
# Block DMZ from doing outward connections
iptables -A OUTPUT -s $DMZ_NET -j REJECT


… and then I made another script that wipes all rules. Believe me, it can be very useful. Iptables is not too hard to understand, but before you do, you often find yourself locked out.


iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT

iptables -L


I’ve been running this setup for a couple of weeks now without (knowingly) being hacked. If you see places for improvements, I would like to hear about them.

Thank you Tobias for helping me out 😉

NICAR13 Day 4, Sunday 3/3

This conference isn’t over yet! Just one more session: mapping best practices. On the panel were John Keefe, Matt Stiles, and Eric Gundersen. The very first question – is a map necessary? Don’t do maps just for the sake of doing maps. And beware of maps that are basically population maps.

Some people are color blind, and there are tools to help you select colors that work for them, like ColorBrewer 2 and Color Oracle.

In order to tell a story, there must be a balance between exploratory control and controlled narrative. Finding this balance is key. Too much exploratory control and people get lost. Too much controlled narrative and the story ends up as an ordinary article (I guess).

When serving maps: pregenerate data – high and low resolution – in order to serve the material efficiently for both browser and server. Polygons can be simplified at a higher level. Use a tileserver, like TileMill from Mapbox.

Someone in the audience raised the question what the panel thought about making data public. They seemed to agree that making data public makes sense in most cases. For example, by making data public, people can help correct it.

It seems there’s no single tool that can solve all mapping problems; instead there is a rather large box of tools that can help solve a lot of different problems.

The slides.

NICAR13 Day 3, Saturday 2/3

It’s the last day of the NICAR13 conference. Today I’ve been watching Matt Waite tell the story of the Pulitzer prize-winning site Politifact. Matt was very keen on structure, because everything has structure, especially stories. If you can find the structure and think of it on a higher level, you can build systems (like Politifact). Another aspect of building something that overlaps journalism and IT is cultural resistance: freaked out reporters and reluctant developers, not to mention clueless management. “Build shit, don’t talk shit” – i.e. build a prototype to have conversations around. “Your mission might be a very small defined thing”, says Matt. If you can describe your thing with a single short declarative sentence, then you have a chance – you can pitch things. Guard your ONE THING zealously. Having a structure makes it possible to say NO to things. The core question: what is the atomic unit of this?

Another session I went to was on how to develop reusable visualization components using D3 and Backbone, with Alastair Dant, who works with development at the Guardian. The Javascript library D3 really stands out as everyone’s favorite at this conference. Alastair is a fun guy, and it was a joy listening to him. I could feel part of the audience zoning out during his walkthrough of the code (this isn’t primarily a developer conference after all). I truly enjoyed it though. The example code can be found on github. Also check out R2D3 if you are required to support IE 7 and 8.

The “Swedish Contingent” at the conference had booked a lunch session. To be honest, this was nothing I was looking forward to in particular. I think the organizers of this conference put in a 2 hour lunch break for a reason. But I was very happy to see Matt Waite again. This time flying around with a microscopic quadrotor drone. And I learned that there’s a drone journalism lab somewhere at the University of Nebraska. How satisfying for the nerd in me to hear Matt speak about hardware hacking, Arduino programming, drones and mesh networks. Where is journalism going? 😉

People have shown amazing stuff here, and we can all do amazing stuff back home – by crossbreeding ideas and competences. And a little bit of coding 😉

NICAR13 Day 2, Friday 1/3

Bringing Local Geodata to Journalism – Ate Poorthuis, Matt Zook
Ever since December 2011, Floatingsheep has consumed and indexed every geotagged tweet produced in the world (about 3% of all tweets are geotagged) using elasticsearch and Twitter’s streaming API. Unfortunately Floatingsheep is not openly available to the public. Ate Poorthuis and Matt Zook from the University of Kentucky demoed some of Floatingsheep’s awesome capabilities, like locating the epicenter of an earthquake(!).

Data visualization on a shoestring – Sharon Machlis, Kevin Hirten
What can you do if you are on a small budget, or even, no budget at all?
Sharon and Kevin peppered the audience with free (as in free beer) tools.

Sharon’s chart:

Smarter interactive Web projects with Google Spreadsheets and Tabletop.js – Tasneem Raja
Tasneem Raja at Mother Jones sees everything that reporters produce as data – “Everything is data, and since everything is data it can have structure”. At Mother Jones they have built their own CMS on Drupal and Google spreadsheets. Reporters feed data into spreadsheets, and information is extracted into the browser using Tabletop.js. Tasneem pointed out a few caveats: Google limits access and the thresholds aren’t clear, and the solution depends on Google not changing its API. The solution manages to run on a single private Google account though.

D3? R? Tableau? What’s right for you? – Amanda Cox, Robert Kosara
Having no particular experience with any of the tools, my impression is that D3, R, and Tableau each solve different problems. What caught my interest the most was the D3 Javascript library (here’s one example: the Waterman Butterfly Map). Because D3 uses SVG, it will not work (out of the box) with Internet Explorer below version 9. Another Javascript library mentioned was numeric.js, which can work with matrices and vectors.

How to serve mad traffic – Jeremy Bowers, Jacqui Maher
This session was hilarious! With a great sense of humor, Jeremy explained the three virtues of a great sysadmin: lazy, impatient, and proud. They use Ruby on Rails and nginx on Amazon S3. By putting their systems in the cloud they can tailor them to the traffic in a flexible way. But despite your best efforts your loadbalancer might melt. Things like Ajax polling might cause unexpected load.

A few pointers on serving mad traffic:

  • You need to know the path that each request travels.
  • And if each request requires an application server, it won’t scale.
  • There are only two hard things in computer science: cache invalidation and naming things – Phil Karlton
  • nginx! 100k+ req/sec
  • no db + dynamic == easy to scale
  • Scale up: more servers
  • Use consistent libs for live polling (js).
  • Sanity check data entry/ delivery points.
  • Plan to degrade gracefully at risky areas.
  • review, review, review
  • Don’t bypass caches.
  • Don’t request mbs of json every 30s!
  • Turn off keep-alive.
  • Turn off gzip.

I asked Jacqui and Jeremy how maps are served by Nytimes, and apparently they use a tool called TileMill. Gotta check that out…

Lightning talks
Five-minute enlightening lightning talks for about an hour. Fun and intelligent. I’m truly awed and impressed by the performances of the people on stage. As a data nerd and hardware hacker I found Matt Waite’s Arduino and Nintendo Wii hack particularly inspiring. Using an accelerometer (harvested from a Wii remote control) connected to a programmable microcontroller, they built a data-gathering device which they checked in with a bag at an airport to track the TSA’s (mis)handling. I hope this example inspires more people to go hacking with hardware and programming – because it is true: with programming you can control robots!

NICAR13 Day 1, Thursday 28/2

I’m at the Computer-Assisted Reporting conference in Louisville, Kentucky. Here’s my summary of day one:

Information design and crossing the digital divide – Christopher Canipe, Helene Sears
What inspires me is hearing stories from people who took on the challenge to do something new – like Christopher Canipe, who moved from paper to web and had to learn about programming and Javascript. Helene Sears told the story about how graphical work is done at the BBC. What I liked the most was the James Bond parallax-scrolling infographics.

Prediction is very difficult, especially about the future – Andy Cox
Part of this session (like weather predictions and forecasts) went completely over my head, but I got inspired to try out the d3 Javascript library.

Down and dirty with the DocumentCloud API – Ted Han
DocumentCloud is a service that turns documents into searchable and analyzable data. It seems pretty useful with its API and scripting abilities. I wonder if there are limitations with foreign languages like Swedish?

Dig deeper with social tools – Mandy Jenkins, Doug Haddix
Mandy and Doug went through an amazing array of useful social web tools. Go check them out for yourself.

Practical machine learning: Tips, tricks and real-world examples for using machine learning in the newsroom – Jeff Larson, Chase Davis
As a data nerd, this was the most exciting session of the day. Jeff and Chase showed different techniques to create decision trees and other machine learning stuff, and pointed out the tool Weka for exploring such things. Jeff and Chase have been kind enough to put their code on github.

Visualizing networks and connections – Irene Liu, Kevin Connor
Kevin Connor’s site may perhaps be described as a crowdsourced Facebook about the power elite, where the dots are connected between the ultra rich and those in power, and how they connect to organizations. Another similar website was also mentioned. Irene Liu made perhaps the boldest presentation so far – her site went live just half an hour before her presentation(!). A very impressive and thorough HTML5 app on China’s power elite. A thing I learned about China is that although there is only one political party, it is highly factionalized. Very interesting. My guess is that the site will probably be censored in China.

Goodnight and see you tomorrow!