This is the second post in Nerd Box, an occasional feature in the Walkley Magazine about data journalism.
By Simon Elvery
Australian Broadcasting Corporation
Australian media outlets have been covering the debate over metadata and data retention for several years, but it has always remained a difficult concept to understand or to explain. Though central to people’s lives, it is often discussed in abstract and confusingly technical ways.
As the political debate around telecommunications data-retention was really hotting up, ABC reporter Will Ockenden requested his own mobile phone metadata from Telstra. We quickly discovered just how easily this abstract technical data could be turned into an invasive and deeply personal profile. It earned a strong reaction from the audience and rolled out across all ABC platforms, including nationally on the 7pm television news bulletins, ABC News Breakfast, The World Today and local radio stations across the country.
The raw data
Will received his metadata from Telstra around April 2015 and, looking to do something interesting with it, came to ABC Digital’s Interactive Digital Storytelling (IDS) team to talk about how we might use interactive storytelling techniques to really bring a metadata story to life.
The first step for Will was to ensure that the data was appropriately anonymised in order to minimise (and ideally avoid) any privacy implications for people in Will’s contact list.
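The article does not say how the anonymisation was carried out. As a rough illustration only, one common approach is to replace each phone number with a stable, opaque label produced by a keyed hash. The function name, the key and the label format below are all hypothetical, not the method Will or the ABC used.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be randomly generated and never published.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymise(phone_number: str, key: bytes = SECRET_KEY) -> str:
    """Map a phone number to a stable, opaque label using HMAC-SHA256."""
    # Keep digits only, so "0400 000 000" and "0400000000" get the same label.
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    digest = hmac.new(key, digits.encode("utf-8"), hashlib.sha256).hexdigest()
    # A shortened digest is easier to read in charts and tables.
    return "contact-" + digest[:12]


if __name__ == "__main__":
    print(pseudonymise("0400 000 000"))
```

A keyed hash is used in this sketch rather than a plain one because there are few enough possible phone numbers that an unkeyed hash could be reversed by hashing every candidate. Labels like these hide the number itself, but they do not hide the pattern of contact.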
A plan was hatched: the IDS team would spend some time independently analysing Will’s data to see what we could find out about his life. Initial responsibility for that fell to Colin Gourlay, one of the developers on the team. As well as providing privacy for Will’s contacts, the anonymisation gave the IDS team an opportunity to test some of our assumptions about the revealing nature of personal telecommunications data: could we figure out who Will’s contacts were without any prior knowledge? Could we find any journalistic sources?
Colin got to work, first figuring out the shape of the data then cleaning the data of errors and omissions. Here’s a snapshot of how it looked before that process started.
You can already see there are quite a few holes in the data which need to be considered. Here’s an example of the kind of script Colin wrote to clean up the data.
As always, there were quite a few data issues to deal with here. They ranged from the mundane and expected (missing values) to the unexpected but very important (a group of rows that had columns shifted down one row).
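Colin’s actual script is not reproduced here. The sketch below is an illustration of the kind of checks described, assuming a CSV export: it makes missing values explicit and flags rows where a value has landed in the wrong place. The column names and sample rows are invented, and the type check is only one way a misplaced value might surface. It does not reproduce the specific row shift the team found.

```python
import csv
import io

# Made-up sample with hypothetical column names; not the real Telstra export.
SAMPLE = """datetime,type,cell_site,duration
2015-01-01 09:00:00,Voice,Site A,60
2015-01-01 09:05:00,Data,,
2015-01-01 09:10:00,SMS,Site B,0
2015-01-01 09:15:00,Voice,45,Site C
"""


def load_rows(text):
    """Parse CSV text, turning blank or absent fields into None so gaps are explicit."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            key: (value.strip() or None) if isinstance(value, str) else None
            for key, value in raw.items()
        })
    return rows


def rows_with_gaps(rows):
    """Return (row_index, missing_columns) for each row with at least one gap."""
    report = []
    for index, row in enumerate(rows):
        missing = [key for key, value in row.items() if value is None]
        if missing:
            report.append((index, missing))
    return report


def rows_with_bad_duration(rows):
    """Return indexes of rows whose duration is present but not a whole number."""
    return [
        index for index, row in enumerate(rows)
        if row["duration"] is not None and not row["duration"].isdigit()
    ]


if __name__ == "__main__":
    rows = load_rows(SAMPLE)
    print("Rows with gaps:", rows_with_gaps(rows))
    print("Rows with suspicious duration:", rows_with_bad_duration(rows))
```

Reporting problems rather than silently repairing them keeps a person in the loop, which suits a dataset where every correction needs to be verified.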
We verified any details we could and, as the interactives came together, continued to spot check the visual output to ensure consistency.
One of the many little ‘gotchas’ that often crop up when working with data is dates and times recorded without reference to a time zone. In this case we were lucky that Will had only visited areas of the country within the eastern time zone, but we still had to verify that the period when daylight saving was in force had been accurately accounted for in the data analysis.
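To illustrate the daylight saving problem, here is a minimal sketch that attaches a time zone to timestamps recorded without one and flags wall-clock times that fall on a clock change. It assumes the timestamps are local times in the `Australia/Sydney` zone and uses a hypothetical timestamp format. Neither detail comes from the original data, and parts of the eastern states (Queensland, for example) do not observe daylight saving at all.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: timestamps were recorded as Sydney local time.
LOCAL_TZ = ZoneInfo("Australia/Sydney")
FORMAT = "%Y-%m-%d %H:%M:%S"  # hypothetical timestamp format


def to_utc(naive_text, tz=LOCAL_TZ):
    """Interpret a zone-less timestamp as local time and convert it to UTC."""
    naive = datetime.strptime(naive_text, FORMAT)
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)


def on_clock_change(naive_text, tz=LOCAL_TZ):
    """True if the wall-clock time is ambiguous or does not exist in tz.

    Both cases show up as the two possible readings of the time
    (fold=0 and fold=1) having different UTC offsets.
    """
    naive = datetime.strptime(naive_text, FORMAT)
    first = naive.replace(tzinfo=tz, fold=0).utcoffset()
    second = naive.replace(tzinfo=tz, fold=1).utcoffset()
    return first != second


if __name__ == "__main__":
    for stamp in ("2015-01-15 12:00:00", "2015-06-15 12:00:00", "2015-04-05 02:30:00"):
        print(stamp, "->", to_utc(stamp).isoformat(), "check by hand:", on_clock_change(stamp))
```

The same midday reading maps to different UTC times in January and June, and 2:30am on 5 April 2015 occurred twice in Sydney, so records near a clock change are worth checking by hand.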
After assessing and cleaning the data Colin created a few test interactive components to help the whole team understand the data. These were rough around the edges and designed as exploratory tools, but many of them also made it into the final piece after some cleanup and design love.
Finding the story
This was a slow burn for us and proved challenging from a storytelling perspective. Even though we knew other media outlets were likely to be working on similar stories, we deliberately took our time because we really wanted to get it right. From start to finish this story was several months in the making.
The breadth of the topic and the significant implications these changes to law and technology could have for society meant that there were so many different approaches we could have taken for this story.
After having a play with the test interactives the whole team got together to come up with a more detailed coverage plan. How could we best present this complex issue to the audience in an engaging and informative way?
As we discussed and examined potential angles, each one we looked at seemed to leave out some important aspect. We spent a lot of time agonising over framing and trying to keep the scope in check, while not leaving out too many important parts of the story. Indeed there are many ideas we ultimately couldn’t fit and as a result there are several more potential follow-up stories we’d like to have done.
We all agreed we wanted to show the audience how trivial it is to paint a detailed portrait of a person’s life from this data (“the envelope” in government parlance), which can seem rather innocuous. While all telcos and ISPs are now required, by law, to keep detailed records on every one of their customers for two years, the government has been at pains to downplay the privacy implications.
We felt that one way to demonstrate to the audience how invasive this could be was to give them the opportunity to play investigator themselves. So we began working on a set of interactive elements which would help the audience interrogate the data and form a picture of Will’s life. We would also provide the raw data for anyone who wanted to take the analysis further.
We created several map-based interactive elements people could use to explore where, when and how Will used his phone for communicating, including anonymised details for his top 10 contacts.
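The article does not say how the top 10 contacts were chosen. One plausible reading is a simple ranking by number of communications, sketched below. The `contact` field name is hypothetical and the labels stand in for already-anonymised identifiers.

```python
from collections import Counter


def top_contacts(records, n=10):
    """Return the n most frequent contact labels with their counts."""
    # Skip records with no contact recorded (for example, data sessions).
    counts = Counter(r["contact"] for r in records if r.get("contact"))
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        {"contact": "contact-a"},
        {"contact": "contact-b"},
        {"contact": "contact-a"},
        {"contact": None},
    ]
    for label, count in top_contacts(sample):
        print(label, count)
```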
We left a fair few ideas on the cutting room floor for a variety of reasons. Some might have breached Will’s privacy or that of his contacts — as it turned out, we pushed that about as far as anyone was comfortable with. One ditched idea in this category was to have a journalist attempt to physically locate and track Will based just on what we knew from the metadata.
Another common reason for moving on from an idea was if it added too much to the complexity of an already complex story. In a dataset like this there are many rabbit holes to explore, like Will’s network of contacts.
During production of the story we kept Will well informed about our plans to ensure he was comfortable with everything we were doing, since asking the audience to explore his life in such detail was a bit risky.
The story was launched on a Sunday night, typically one of the lower-traffic periods for the ABC News website, but it was immediately popular, shooting to the number-one spot. It also sustained the audience’s interest, remaining one of the top trafficked stories well into Tuesday. As well as finding a large audience, the story held people’s attention: they spent much more time interacting and reading than usual.
We received an impressive response from audience members doing their own analysis of Will’s life — with mixed success at gleaning accurate insights. More than 300 people submitted their own analyses. Many had downloaded the full dataset and spent considerable time analysing it using their own tools and techniques.
Several academic organisations and data analysts also pledged to do their own analysis of the data.
In the end, we’ve been really happy with the outcome. We felt it told the story of what metadata is and shone light on some of the implications in an innovative, engaging and digital-first way.
With the data-retention regime now law in Australia, this story gives Australians a lot to think about.
It’s a pity it had to come at the expense of Will’s privacy. But ironically it’s now only people with strong technical knowledge, like Will, who are in a position to protect their own privacy.
This article first appeared in a slightly different form in ABC’s Back Story blog, which lays out how the sausage is made there. Simon Elvery is a developer with ABC News’ Interactive Digital Storytelling Unit, based in Brisbane. He’s @drzax on Twitter.