Tomorrow is the deadline to apply for the 2015 Knight-Mozilla OpenNews fellowships. Almost exactly two years ago, I was sitting in my friend’s living room in Queens, hemming and hawing about whether I should bother applying. I sent my application in a few hours before the deadline, which was nowhere near the record for cutting it close. Brian Abelson, my brilliant co-fellow who was placed at the New York Times, famously got his application in with eleven seconds to spare. If you’re interested in the intersection of code, journalism, and community, you should apply. It’s not too late.
I’ve written elsewhere about my own fellowship (See: 1, 2, 3, 4). Many others have written eloquently on what makes developing in the newsroom so great. But let me summarize some of the things that make the fellowship such a once-in-a-lifetime experience:
You’ll have a cohort of six incredible co-fellows to learn from and collaborate with. The magic of the OpenNews program is that the team takes care to select a diverse group of newsroom partners and a diverse group of fellows, so everyone brings something special to the table. You’ll be one of seven people with totally different backgrounds and expertise and that combination makes for beautiful, unexpected things. In my class we had Manuel, who, before he became a fellow, was launching fucking satellites into fucking outer space. We had Friedrich, the Harry Houdini of web scraping, who seems to know everything about open data and opening data there is to know. We had Stijn, who in addition to being obsessed with news analytics, also happens to play a mean gypsy jazz guitar. The list goes on.
You’ll get to travel the world meeting other amazing people. When you’re a fellow, flying to Buenos Aires to hack on projects with a thousand rogue technologists is in your job description. Hard to believe, right? OpenNews gives you the resources, financial and otherwise, to explore the entire universe of news nerds and civic technologists. By the end of the year you’ll have discovered so many intriguing organizations and people your head will be spinning.
You’ll have an unreasonable amount of freedom to pursue things you’re interested in. The OpenNews fellowship is a lot less structured than other programs, by design. You’re let loose in a newsroom to discover the things that you’re passionate about and the projects where your time is best spent. Some of us came in with projects in mind at the start, some didn’t, but all of us found our fellowships taking us in great unexpected directions. The OpenNews team, Mozilla, et al. are there to support you in the ways you need and then get the hell out of the way and trust your instincts.
People will suddenly take you strangely seriously. Before I became a fellow, I was a complete outsider to this world. I had never worked in a newsroom. I didn’t know what NICAR was. There was no reason for anyone to give me the time of day. And yet they did. I was overwhelmed with how warm and welcoming everyone was throughout my fellowship year and how much people cared about what I had to say. The OpenNews imprimatur carries a tremendous amount of weight in the news nerd community.
And by the way, they’ll pay you fairly. A lot of fellowship stipends are so meager they turn you back into a starving student for the duration of your fellowship. OpenNews pays better than most, and is generous with things like relocation costs, housing supplements, equipment, and travel costs (details).
There are two hesitations I often hear from people who are considering applying for the fellowship.
"I don’t think I really fit the profile, [I’m not a developer/I’ve never worked in news/etc.]"
This was me, two years ago. I almost didn’t apply because of this worry. But here’s the thing: there is no profile. Fellows come from all walks of life. Some of them have worked in news before, some haven’t. Some are master coders, some less so. Most of them didn’t study computer science. The roster of ex-fellows includes statisticians, artists, chemists, activists, engineers, academics, and even a medical doctor. The fellowship is less about what you’ve done and more about what you’ll be able to do given 10 months, a bunch of smart collaborators, and a lot of freedom. The one thing the fellows all have in common is that we all thought we didn’t fit the profile when we applied. Don’t let self-consciousness about your resume stop you from applying.
"But what comes AFTER the fellowship?"
This was me, one year ago. Remember what I said above about how by the end of the year, you’ll have encountered so many intriguing new things your head will be spinning? This is the real problem with the fellowship. It lights up all kinds of synapses you didn’t know you had and introduces you to jobs and technologies and organizations you didn’t know existed. By the end of the year you won’t be worried about having something to do next. Your problem will be that you’ll have a thousand things you want to do next. And this is pretty much the textbook definition of a “good problem to have.” Like the question of “what makes a good fellow?” there’s no one particular answer to what people do afterwards. Some of us have stayed in newsrooms, others have gone to tech companies or academia or something else entirely. Mark Boas is living in a house on a hill in Tuscany and I imagine that his life basically looks like A Walk in the Clouds. If you’re chosen, I promise that life-after-fellowship will be the least of your worries.
A year ago I arrived in London for the Mozilla Festival, still a little unsure of what I had gotten myself into. I didn’t know a single person there. Everyone was scary smart and talking about things I had never heard of. Some of them sported Lovecraftian beards. I mostly just tried to keep my mouth shut and soak it all in. On the long flight back to San Francisco, my head was spinning.
Fast forward to last month, when MozFest 2013 came back to London. Those same people I was so hopelessly intimidated by a year earlier? Now they all felt like old friends. Instead of sitting quietly in the corner I was running all over the building, leading a workshop, giving tutorials, presenting my work, writing code, and even speaking at the plenary session.
In the past year, I’ve had a chance to dive into all of these things and lots more. In some cases, I’ve even developed something approaching expertise and been asked to teach others. That is insane, and also awesome. This was the year I went from a Fisher-Price toolset to a real one, from knowing a few tricks to having genuine confidence that I can crack any coding challenge that comes up.
Getting things published on BBC News Online.
The first thing I got published for the BBC was just a measly line chart. It was about 6pm on a Friday, and a reporter and I were the only two people left in the office. Someone from the business desk needed a chart for their story, and fast. I knocked out a quick stock price chart conforming to BBC style and it went live shortly thereafter:
I have made hundreds, if not thousands, of line charts in my life. There’s nothing special about this one at all. But oh, what a feeling to see that up on the BBC. I was ready to dance in the streets.
Since then, I’ve gotten some more substantial work published on things like the elections in Pakistan and Margaret Thatcher’s funeral. Yesterday I was working on visualizing the history of the World Cup. It’s never a dull day here.
Of course, by far my most popular project was The Secret Life of Cats. Given how often complete strangers ask me about it, I’m convinced that will be my epitaph. Few people know this, but the original idea was the result of a decade-long effort by BBC R & D scientists to achieve Absolute Internet, a state that was previously only theoretical and thought to be unreachable within the laws of thermodynamics:
Open-sourcing some useful things.
When I applied for the fellowship last year, they asked for a link to my GitHub account. This was a problem. I had a username, but I didn’t have a single shred of code posted. I quickly created one little Potemkin repo in an effort to trick the OpenNews powers-that-be that I was hip to this whole GitHub thing. I don’t think it worked, but I guess it didn’t hurt.
Since then I’ve gotten better and better about getting over my code shame and putting things online. I’ve also gotten better at packaging to distribute, sanding down the rough edges of a library and thoroughly documenting it (I might be the only person who actually enjoys writing documentation). I’ve got a long way to go in this regard, but I’m headed in the right direction.
Discovering a love of teaching.
If you had asked me a year ago I would have said I didn’t have much to teach anybody else or the right temperament for that role. But then a funny thing happened. Teaching others, and figuring out the obstacles to learning new skills in the newsroom, became a central theme of my fellowship.
It turns out I really enjoy working with people as they learn, trying to find that lightbulb moment when some piece of the web suddenly goes from mystical to mechanical and the moving parts click into place. I’ve mostly learned this stuff on my own, making progress in fits and starts and getting frustrated along the way, so I think my experience is a lot closer to the average code-curious journalist than someone who got a real coding education. I know all too well what it feels like to want to tear your hair out when you copy and paste an example from a tutorial and it refuses to work, or when the goddamn text box just won’t go where you want no matter how many numbers you change, or when an experienced coder explains how to “just” do something that winds up taking you six hours.
As a coder, you will always get in over your head and get stuck. But it’s easy to forget how thick the fog is when you’re just starting out. You don’t just get stuck, you have no way to formulate a plan to get unstuck. When something goes wrong, you are rudderless, without the nose for debugging and breaking down the problem into manageable bits that you develop with experience. It can be really dispiriting. I hope I can maintain some perspective on that as I move farther away from the beginner end of the spectrum.
Saying yes to too much.
This year has been a constant flood of surprising opportunities to learn and do new things. There was always another event, another project, another idea, another chance to say “yes, and…” I chased every rabbit wherever it led. I pushed myself out of my comfort zone whenever I could. That crazy, frenetic blend made this year what it was, and I wouldn’t trade that for anything.
Once upon a time, I wrote things for a living, and I could sit down and knock out ten pages before lunch. But writing is a muscle. You use it or lose it, and as my days have become less about words and more about code, that muscle has totally atrophied. Now a feedback loop has kicked in, and I fear the blank page much more than the blank command prompt. I really should have forced myself to write more about my work this year.
Not collaborating with my fellow fellows.
This is probably my most profound failure this year. My fellow fellows are a remarkable bunch. Every one of them brings unique talents to the table, and my favorite parts of this year have been spending time with them in different corners of the globe. While I’ve collaborated with them casually quite a bit, tapping someone on the shoulder for advice or expertise, I really wish we had found a chance to work together closely on something more ambitious. I’m not sure what exactly that something would be, which is probably why it didn’t happen, but I consider that a big missed opportunity.
Holding on to bad habits.
This year I had hoped to move away from developing by the seat of my pants and towards more sound practices, things like build automation, test-driven development, and more reusable modules. While I’m a much better developer than I was a year ago, I don’t think I succeeded on this front. I still sometimes end up with code like the messy office where you know exactly where everything is but nobody else would stand a chance. Part of this is because the newsroom is unusually forgiving of code sins; timetables and code lifespans are both short. But I also had the unfair luxury of being a one-man band on lots of projects, either because it was a solo effort or because my contribution could be easily compartmentalized. I wish I had done more of the deep collaboration that forces you to get smarter about structuring your code to serve other masters.
Photography has been an important hobby of mine for a long time, but this year my camera barely made it out of the bag. I was also determined at the start of the year to do some sort of project around the place of photojournalism in the news app world. This is something I care deeply about. It worries me that the medium of stills seems to be an afterthought for most people in this space, like we invented the audio slideshow and called it a day. Even animated gifs seem to get more play than traditional photojournalism. I had some interesting conversations with photo editors about this, but beyond that I failed utterly in my plan to turn it into anything of substance.
Saying yes to too much.
The downside of saying “yes, and…” to every opportunity was clear. There are only so many hours in a day, and I stretched myself too thin. I ended up with an elephant boneyard with dozens of unfinished projects big and small. I really should have prioritized those better, starting fewer and finishing more.
In just over a week, my OpenNews Fellowship at BBC News will end in the traditional fashion, with Dan Sinker pushing me into the Atlantic Ocean on an ice floe. It’s been an incredible ride, one of the best years of my life. I’ve pretty much run out of superlatives for it. Above all else, my fellowship has been a learning experience, a crash course in the frontiers of code in the newsroom. So what have I learned?
You have to run your own race. The sheer volume of inspiring work and great new ideas coming out of the news nerd community presents a challenge. Every day I see literally dozens of new tools, resources, and news apps worth my time. But the time isn’t there, and that used to drive me nuts. I would bookmark things to come back to later and the list would just grow and grow. It will overwhelm you if you let it.
I hear a variation of this problem echoed by journalists and students who are trying to get started learning to code. There is just so much out there. Where do you even start? How do you dip your toe into Class 5 rapids?
In this world, your plate will always be overflowing with opportunities to learn, and that’s exactly how it should be. Any domain worth mastering is impossible to master. News development is changing too fast for any one person to keep up, and it’s a hydra; everything you do ends up opening five new avenues to explore. You have to find a way to let go, to stop trying to take it all in or “keep up.” There will always be more cool stuff out there than you can read, learn, or use, and that’s OK.
Doing something different ≠ Experimenting. I often hear people in newsrooms talk about “experimenting” when all they really mean is just “changing things up.” Experiments are designed to test a hypothesis, emphasis on designed. You need to understand what exactly you’re trying to test, and then you need a plan for testing that specific question and assessing the results without getting faked out. It’s the difference between retrospective medical studies, which have so much noise in the data they rarely produce meaningful insights, and double-blind clinical trials, which are the gold standard of medical research for a reason.
When thinking about new ideas in the newsroom, put on your scientist hat. Turn your idle speculation about what will or won’t work into a testable hypothesis. Figure out what counts as success or failure ahead of time. Don’t just gather all the data you can get your hands on and see what you can find out later. You’ll wind up drunk on metrics without any useful conclusions. For a smarter take on this issue, read my colleague Stijn Debrouwere’s piece on cargo cult analytics.
The bubble is a big challenge. One of the hardest parts of doing deep data projects and interactives is maintaining empathy for your audience. You spend dozens of hours with your data and it becomes your best friend. You guys go on long walks together. Maybe you rent a tandem bike if the weather’s nice. You end up cramming every possible angle into your story and adding lots of big, beautiful charts and widgets. Surely everyone will want to investigate the nooks and crannies of this fascinating topic as much as you did.
Cut to next morning, when your readers skim your story for 10 seconds on their tiny phones while they walk into the subway station for their morning commute. Whoops.
If you work in data journalism, you’re probably the sort of person that loves deeply exploring data. Meanwhile, for your readers, your story is just one of dozens they might come across during the brief cracks in their day. It’s easy to forget this. Your journalist/coder peers will ooh and aah over inside baseball sorts of achievements. And it will be hard to kill your babies; you spent weeks on this stuff, and now you want to get it all on the page.
The end result is that we still produce lots of bloated stuff with a good story buried somewhere inside, gasping for oxygen.
My favorite formulation in response to this is what the ProPublica team calls the “near” and “far” view: making sure you give the big picture up front so someone who will only give you 10 seconds gets something out of it, then offering the opportunity to explore and personalize the story in greater depth for someone who will give you a full 10 minutes. Think of it as progressive enhancement for attention: some people will have tiny screens, some people will have cinema displays. You want to serve them all.
And while we’re on the subject: a data dump is not data journalism. Just throwing up a giant dataset online without adding context or conclusions is a capitulation, the equivalent of printing the notes from your steno pad in the morning paper. Once you have the data, your job is just getting started.
Conferenceitis is a serious medical condition. Feeling lethargic? Eating too many finger sandwiches? Tweeting about airports a lot?
You may be suffering from conferenceitis. Talk to your doctor today.
Conferences can be fun, but it’s easy to go overboard. I certainly did this year. After you go to enough events, fielding the same questions over and over, not only are you not getting your other work done, but you aren’t even producing original thoughts anymore. You just end up quoting yourself. You accidentally develop a spiel instead of just having a conversation. That sucks.
There aren’t a lot of shortcuts to real wisdom; it comes one fumbling step in the dark at a time. My favorite events of the year have been the ones that skip the forest and stick to the trees, especially MozFest and NICAR. In both cases the sessions focus on teaching something useful and applicable, not on grand principles or the Future of All Things. Beyond events like those, I plan to only conference in moderation from now on. So join my colleague Friedrich Lindenberg and me in 2014 for the launch of DeskCon, a new kind of unconference where we all sit at our desks and finally get some work done.
Data journalism offers new and exciting ways to screw up. People tend to presume a certain authority and accuracy of computer-assisted reporting methods, but these methods are only as smart as their human practitioners. In reality, they offer a delightful bouquet of new ways to screw up, many of them subtle enough to avoid detection until they produce maximum embarrassment. Remember that time I left out every country starting with S? Or when half the points on my chart were wrong because of Daylight Saving Time? Or when I mistook a moving car for a housecat in a GPS trace? I sure do.
Data doesn’t just radiate truth and meaning on its own. It’s a volatile raw material, one you have to treat with great caution and care to glean any legitimate insight. The bad news? This takes a lot of hard work. The good news? Hey, maybe you won’t get replaced by a robot after all!
Things have a way of coming full circle. It would be hard to overstate what a radical change this year was for me. I jumped into a totally new industry. I packed up and moved 5000 miles away to a strange island full of fried food and royal corgis. And yet I’m constantly surprised by the ways threads from my past lives keep showing up again. A big focus of this year has been about the value of open source; my first job after college was actually working at a PR firm representing open source companies and organizations, back when GitHub was just a twinkle in SourceForge’s eye. I find tech policy work from my past resurfacing in the newsroom through issues like censorship, online surveillance, open government, and internet standards. I even get to dust off my mothballed political science degree when it’s time to get wonky about election coverage.
When I talk to journalism students or recent grads, I hear a note of panic as they struggle to plot out their future career path and wonder how to connect the dots. This year has been a good reminder that you don’t get to connect the dots ahead of time. They only connect in retrospect, after lots of zig-zagging along the way. If you just seek out interesting work with interesting people and never stop learning, wonderful things will happen.
Community is everything. The beating heart of all of this is the incredible news nerd community, a motley crew of journalists, coders, civic hackers, and all manner of hybrids that somehow manages to be both so tight-knit and yet so welcoming to all comers. It’s amazing to me how all of these people theoretically working for competitors can be so totally on the same team, giving freely of their time to share their work, collaborate across organizations, and help us all get better. I’ve benefitted from the kindness and genius of my peers more times than I can count; I hope I’ve been able to give something back. I’m very proud to be a part of OpenNews, building connective tissue to help grow this community even more. I can’t wait to see what the future holds for it.
Last month I co-led a “Web Developer Literacy” session for reporters and editors at the Online News Association conference. I expected a lot of questions about particular technologies, but the discussion wound up focusing much more on process and office politics, touching on tough questions like:
How do you integrate developers into a team of reporters? How do you spec out digital projects when you have no idea what’s feasible? How can developers, designers, and reporters work together effectively in the crucible of a newsroom?
These are far from solved problems, and newsrooms have some particular handicaps. They typically lack the time or money for a strong project management function. Needs are unpredictable (I don’t know of many software companies where a product is conceived in the morning and then launched before lunchtime). Decisionmakers are unlikely to come from technical backgrounds, and they’re still adjusting to the relatively new phenomenon of developers in the newsroom.
Despite those challenges, though, lots of interactive teams seem to be converging on certain successful principles. Here’s the short version of what I said last month:
Clarification: when I say “developer” below I mean a newsroom developer who works on interactives, graphics, data journalism projects, etc. How much this applies to a developer who works on your CMS or your mobile app is a separate question.
Talk to a developer early and often.
One of the worst things you can do is let the editorial horse get way out of the barn and then drop your request on a developer’s desk at the last minute. Supposedly “technical” questions have real implications for design and storytelling, and you need that perspective when the project is taking shape, not after all the important decisions have been made.
Even something as simple as geography matters. If a reporter and developer are working together on a project, they should probably sit next to each other. If that’s not possible, get them on chat or have them pick up the phone often. Email and tickets are great, but asynchronicity is the enemy when you’re working against a deadline.
Ask a lot of questions, especially ones you’re afraid are stupid. Odds are someone else in the room has the same question. When a developer lapses into obnoxious developer-speak, swallow your pride and ask them to translate. Don’t just nod and make a mental note to go Google “S3” later. Having the conversation right then will clue you in, but it will also help your developers understand where you’re coming from.
Developers are journalists, not technicians.
Your news developers may not be writing or calling sources, but they are journalists, and should be treated as such. You need everyone invested in the common cause of the story and the audience. If they aren’t, and they feel like their job is only to worry about the technical details, the thread will get lost along the way and you will end up with a beautifully designed, beautifully coded piece of crap. Your developers will be gatekeepers who spend their time saying no to things instead of contributing ideas and working with you to solve the problems that matter.
Talk up front about what might change.
In a newsroom, information rarely comes as a perfect batch, especially on a breaking story. It comes in bits. It gets revised and replaced. As you prototype things or explore some data, you’ll wind up adjusting your original approach. Things will change. That’s OK. But it can save everyone a lot of time and aggravation if you express that uncertainty before you send a developer down the rabbit hole.
Whether an idea is firm or experimental, whether data is going to change or not (spoiler alert: it’s going to change), whether a project is definitely going live next week or is definitelymaybe probably not going live next week: these will significantly affect how a developer approaches a task under the hood. The best thing you can do is simply be upfront about what you know and don’t know, the possible ways the project might zig and zag. This way your developers won’t paint themselves into a corner, and they’ll free up more time for other work.
Don’t think “possible” vs. “impossible.”
A lot of questions you get as a developer start with “Would it be possible to…” Almost anything can be done given enough time, enough developers, and enough duct tape, but if you just keep throwing changes into the pot one at a time without a sense of opportunity cost, it’s not going to end well. You will almost never get to produce your ideal version of an interactive. Many good ideas will be left on the cutting room floor. The starting point for discussing a new one should be about the timetable and the existing priorities.
Respect a developer’s concerns, but be ready to push back.
But this doesn’t mean you should be a supplicant, going along with whatever a developer says because they’re using a bunch of jargon or you’re afraid to step on their turf. Developers have plenty of biases, and they can easily lose sight of how one technical bugaboo balances against other tradeoffs. Challenge them on things. Ask them to explain their reasoning. That’s how we all get better. If they get prickly about it, they’re doing something wrong, not you.
This is a two-way street.
This isn’t just about reporters and designers working to better understand where developers are coming from. The reverse is equally important. What works for a software company does not always work in the newsroom. Developers should strive to better understand the reporting process, the importance of design, and the unique demands of news. They need to let go of some of their technical dogma and get used to working without a net on most projects. They need to care about your audience, which may consume news differently than they do. Above all, they need to learn to truly work as a team with non-developers, and that comes back to communication again, being able to explain the why of complex choices to non-developers and give competing priorities a fair hearing.
A few months ago I discovered that Wikipedia provides detailed hourly data dumps of how many pageviews each article gets, and the former political science major in me quickly sprang into action. I wanted to look at article traffic for candidates during the run-up to the 2012 election; I figured I would find all sorts of interesting patterns and glean new insight into American politics and information-seeking behavior. It was going to be great. As usual, I was wrong.
Before I could even investigate the data, I had to jump through a few hoops. The hourly dumps include EVERY Wikimedia page in one giant tab-separated list, so you’re talking about terabytes of data in total just to grab a very short list of presidential and Senate candidates. It also turns out that, shockingly, some major party Senate candidates from the 2012 election don’t even have Wikipedia articles. To further muck things up, because the end of daylight saving time occurs during the campaign, you have to do some time-shifting to get everything to match.
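To make the wrangling step concrete, here is a minimal sketch in Python (3.9 or later, for `zoneinfo`). It is not the code used for the charts in this post. It assumes each dump line holds a project code, a page title, a view count, and a byte count, and that each dump is stamped with a UTC hour; the candidate shortlist and the function names are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")

# Hypothetical shortlist; a real one would cover every candidate of interest.
CANDIDATES = {"Barack_Obama", "Mitt_Romney"}


def filter_dump(lines, wanted=CANDIDATES, project="en"):
    """Yield (title, views) for shortlisted articles in one hourly dump.

    Assumed line layout: project, page title, view count, bytes sent.
    """
    for line in lines:
        fields = line.split()  # handles tabs or spaces
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        proj, title, count = fields[0], fields[1], fields[2]
        if proj == project and title in wanted and count.isdigit():
            yield title, int(count)


def to_eastern(utc_hour):
    """Convert a naive UTC timestamp for a dump into Eastern Time."""
    return utc_hour.replace(tzinfo=timezone.utc).astimezone(EASTERN)


# Made-up sample lines, for illustration only.
sample = [
    "en\tBarack_Obama\t1200\t345678",
    "en\tAlice_in_Wonderland\t40\t9999",
    "de\tBarack_Obama\t75\t12345",
    "en\tMitt_Romney\t950\t234567",
]
print(list(filter_dump(sample)))

# Two consecutive UTC hours on November 4, 2012 both land on 1am Eastern,
# because the clocks fall back in between.
print(to_eastern(datetime(2012, 11, 4, 5)))
print(to_eastern(datetime(2012, 11, 4, 6)))
```

Letting a time zone database do the shifting avoids hand-coding the date when daylight saving time ends, which is exactly the kind of detail that is easy to get wrong.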
Once the data wrangling is done, if you plot the hourly pageviews for Romney (in red) and Obama (in blue) as a stacked area chart, it looks like this:
You see certain spikes there that line up with key live events.
OK, so this is mildly interesting. The story here seems to be that people run to their computers to look up the candidates during the debates, on election day, and during the conventions, when something is happening right at that moment on TV. The disparity between the activity during the GOP convention and the Democratic convention makes some sense, since Obama is more of a known quantity. And if you zoom in on the conventions, you see that everyone is looking up Romney during the GOP convention, but it’s about 50/50 for the Democratic convention:
But what if we take the same data and aggregate it by day instead of by hour?
Now the story looks quite different. The conventions and debates are really just blips. All the action is on election day. Actually, most of it is the day AFTER election day, East Coast time, because the big traffic rush comes during Obama’s acceptance speech, which took place after midnight Eastern Time.
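The hour-to-day rollup can be sketched roughly like this, assuming the hourly counts arrive as pairs of a UTC timestamp and a view count. The numbers are made up and `daily_totals` is a hypothetical helper, not the code behind the chart.

```python
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def daily_totals(hourly, tz=EASTERN):
    """Sum (utc_hour, views) pairs into totals keyed by local calendar date."""
    totals = Counter()
    for utc_hour, views in hourly:
        local = utc_hour.replace(tzinfo=timezone.utc).astimezone(tz)
        totals[local.date()] += views
    return dict(totals)


# Made-up numbers, for illustration only.
hourly = [
    (datetime(2012, 11, 7, 3), 100),  # 10pm Eastern on Nov 6 (election day)
    (datetime(2012, 11, 7, 4), 150),  # 11pm Eastern on Nov 6
    (datetime(2012, 11, 7, 6), 900),  # 1am Eastern on Nov 7
]
print(daily_totals(hourly))
```

In this toy data the late-night spike lands on November 7 rather than on election day, which mirrors the effect described above.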
We could also plot the data as cumulative traffic instead:
Now it mostly just looks like a slow and steady climb, with Romney getting somewhat more traffic up until election day, when Obama’s numbers get a gentle bump.
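A cumulative series is just a running total of the daily counts. A minimal sketch, with made-up numbers and a hypothetical `cumulative` helper:

```python
from itertools import accumulate


def cumulative(daily_views):
    """Turn a sequence of per-day counts into running totals."""
    return list(accumulate(daily_views))


# Made-up daily counts: quiet days, then one big spike at the end.
daily = [10, 12, 9, 11, 200]
print(cumulative(daily))  # [10, 22, 31, 42, 242]
```

Even a spike that dwarfs every other day shows up as a single step at the end of the running total, which is part of why a cumulative chart reads so differently from a daily one.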
These three charts are in some sense showing the same data, but the immediate takeaways are quite different.
As another quick example, let’s look at a line chart of the same pageview data for 2012 Senate candidates:
This looks a bit different. There are two massive spikes, and everything else is tiny by comparison. It turns out both one-hour spikes belong to Elizabeth Warren, the now-senator from Massachusetts, who spoke at the Democratic convention.
This chart seems to tell the story that Warren had two breakout moments where lots of people were looking into her online, and the rest of the Senate field was quiet (including Ted Cruz, who spoke at the Republican convention but didn’t draw nearly the same amount of traffic). But what about the little sawtoothed pile that starts around August 20?
If we try aggregating by day, as with the presidential election, we get the answer:
Oh right, that guy. When Akin made his ill-advised comments, he apparently had a lot of people run to their computers to look up who he was. But unlike Warren’s convention speech, it wasn’t a second-screen, live TV moment sort of thing. It was news that spread more gradually, over the course of about two days.
We also see that many other candidates got some attention on election day. The person with the biggest daily peak turns out not to be Warren, but rather Tammy Baldwin from Wisconsin, now the first openly gay US senator. She didn’t make waves during the campaign, but her historic election brought a bunch of curious Wikipedia viewers after the polls closed.
Had the “days” been grouped on a cutoff besides midnight Eastern Time, so that the late-night election speeches and results were grouped in with the day before, we would have seen yet another story. We could also look at total pageviews by candidate and get a different impression:
And let’s not forget that Wikipedia traffic is far from a great proxy for information-seeking behavior generally. It suffers from all kinds of biases.
So which of these charts is the accurate one? Which one tells the story? All of them? None of them?
The lesson, as usual: data does not speak for itself. It’s something you can mold into different forms, all of them “true,” none of them the whole truth. The way you slice and scale things matters. Context matters. Even something as prosaic as time zones can have a big impact on what story comes out of your work. Always think carefully about what your data is and is not telling you.
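The time zone point is easy to demonstrate. The sketch below groups the same hypothetical hourly counts into "days" using two different midnight cutoffs. The data is invented, and the fixed UTC offsets are a simplifying assumption (they ignore daylight saving time, which had already ended in the US by election day 2012).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: fixed standard-time offsets, ignoring daylight saving time.
EASTERN = timezone(timedelta(hours=-5))
PACIFIC = timezone(timedelta(hours=-8))

# Hypothetical hourly pageviews, timestamped in UTC. Not real data.
hourly_utc = [
    (datetime(2012, 11, 7, 3, tzinfo=timezone.utc), 200),  # 10pm Eastern, Nov 6
    (datetime(2012, 11, 7, 4, tzinfo=timezone.utc), 300),  # 11pm Eastern, Nov 6
    (datetime(2012, 11, 7, 5, tzinfo=timezone.utc), 900),  # midnight Eastern, Nov 7
    (datetime(2012, 11, 7, 6, tzinfo=timezone.utc), 600),  # 1am Eastern, Nov 7
]

def daily_totals(rows, tz):
    """Total views per calendar day, where 'day' is defined in time zone tz."""
    totals = Counter()
    for ts, views in rows:
        local_day = ts.astimezone(tz).date().isoformat()
        totals[local_day] += views
    return dict(totals)

eastern = daily_totals(hourly_utc, EASTERN)
pacific = daily_totals(hourly_utc, PACIFIC)
```

With an Eastern cutoff, most of the traffic lands on November 7. With a Pacific cutoff, all of it lands on November 6. Nothing about the underlying data changed.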
A more detailed version of the presidential pageview chart is available here.
Last weekend at the Mozilla Festival, a group of journalists sat down to solve a murder mystery on the command line.
Each person got a set of folders containing text data files full of information about the mean streets of Terminal City. The files listed who lived there, the vehicles they owned, the clubs they belonged to, the streets they lived on, and so forth. The formats were varied - some of them were tab-separated tables, some were plain text, some had instructional header or footer rows.
More importantly, 99.9% of the text was junk. It was gibberish, or excerpts from Alice in Wonderland, or names of random 2012 Olympic athletes. But buried at key points in these large files were actual clues that, when followed, would eventually lead you to the identity of the murderer. With so much nonsense text to sift through, the only way to crack the case in a reasonable amount of time would be to use the command line to quickly search, filter, and inspect the data.
I thought this might be a stickier way to teach the basics of the command line than drily walking through a lot of examples, because it more closely mimics a real world data journalism scenario: you inherit a big dump of messy data without any context. There’s too much data to hold in your head, and you don’t even really know what’s in it, or how the files are structured. You have to probe and get your bearings, and then you have to be careful with your inquiries, spot checking and duplicating results as you go. You can only see one slice of the big picture at a time.
The key thing about this whodunit exercise is that it’s freeform. You don’t have instructions to follow; you have a situation, and it’s up to you to experiment and find a path to the solution, once you figure out what the solution would even look like. There are many different ways you could find the answer. Some might be more efficient but trickier to implement, others might be simple and stepwise but easier to follow and modify. This is an important part of getting comfortable with the command line: understanding that it consists of small pieces that do one thing well, and you can combine them in infinite ways to get what you need.
Why worry about teaching journalists the command line in the first place? I can think of a few reasons why it comes in handy even if you have no plans to become a developer:
A lot of really useful tools for journalists end up stranding you on the command line. You hear that piece of software X is exactly what you need to convert that weird file, or build a certain type of chart, or make a map, so you go try to download it. But you end up on a GitHub page with installation instructions that are way over your head and involve fifteen different steps on the command line.
In a perfect world you could just copy and paste the commands from the documentation and cross your fingers and hope it all works. In the real world, those tools almost never just “work,” and the documentation usually leaves out some important details. So you get some weird error message during setup, or output you weren’t expecting. You’ll be stuck unless you have some idea of how the commands are structured and what you might need to change.
Command line tools are a lot more efficient at processing text than desktop software or even custom scripts, and this starts to matter if you have a massive dataset. You can open a 5MB file in Excel, but not a 5GB one. If you’re a data journalist and you encounter a really huge quantity of data, using the command line for filtering/searching/cleaning can save you a lot of headaches.
It’s useful to stop thinking of “data” as a special category, something you only interact with delicately and indirectly, with a piece of software like Excel as your liaison. Data is text, and text is data. Virtually any sort of data a journalist encounters can be treated as just a big pile of text, and once you understand that, you can get more creative in how you interrogate and modify it, because it all boils down to searching and replacing, reading text in and spitting it back out.
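As a rough illustration of the "reading text in and spitting it back out" idea, here is a small Python sketch that does roughly what a basic search tool like grep does: it walks through lines one at a time and keeps only the ones that match a pattern. The sample lines and the name `matching_lines` are invented for the example, and this is not meant to replace the command line tools themselves, which will usually be faster.

```python
import re

def matching_lines(lines, pattern):
    """Yield each line that contains a case-insensitive match for pattern.

    Works on any iterable of lines, so a huge file can be processed
    one line at a time without loading it all into memory.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")

# Hypothetical sample standing in for a very large text file.
sample = [
    "Alice was beginning to get very tired\n",
    "CLUE: the suspect drives a blue Honda\n",
    "of sitting by her sister on the bank\n",
]

hits = list(matching_lines(sample, r"clue"))
```

An open file object can be passed in place of `sample`, since Python reads files line by line when you loop over them. Whether the input is a table, a transcript, or a data dump, it all gets treated as plain text.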
As for the mystery, you can give it a try yourself (you only need the file clmystery.zip). This version was kind of a rush job, with not nearly as much hardboiled, Sam Spade flavor as I would have liked, but pretty soon I’ll start working on the next case and hopefully introduce more advanced commands like sed and awk. Get to work, gumshoes!
In a recent piece for The Atlantic, Olga Khazan argues that learning to code is a poor use of time for most aspiring journalists, who could instead be honing their other skills. As it did for many of my colleagues who have committed acts of code in a newsroom, the piece really rubbed me the wrong way, for two main reasons.
First, the author doesn’t seem to have done any reporting for the piece beyond a second-hand tweet and extrapolating from her personal experience. She could have picked up the phone to test her assumptions or gain outside insight. She could have asked hiring managers in newsrooms how much they actually value coding skills. She could have asked j-school faculty why they were or were not adding more technology education to their curriculum. She could have asked news developers what their experience is like working with reporters and dividing up roles on a project. She could have asked journalists-turned-coders how and why they chose to learn. Had she done any of these things, I imagine the piece would have been a lot more accurate, interesting, and constructive.
Second, and more importantly, the article falls victim to a lot of fallacies about code and journalism that keep coming back up in this whole discussion. To name a few:
Conflating learning to code, learning to make things for the web, and technological literacy
One of the most maddening parts of this debate is the way every possible thing that might involve a computer ends up lumped together under the umbrella of “coding.” Let’s introduce some nuance. Broadly, you have at least three different categories where a journalist might seek (or be nudged) to improve, and they’re only loosely related.
Technological literacy - Understanding your medium is valuable. When reporters or designers don’t have any sense of the constraints or tradeoffs in making things for the web, everybody loses. The resulting work is worse, and all sides waste a lot of time due to poor communication and mismatched expectations. I also think a lot of journalists working on web projects overestimate how neatly “technical” decisions can be isolated. Supposedly “technical” decisions tend to have real editorial and design implications, especially when they have to be made hastily on a deadline. If you can’t have an informed conversation about those decisions, you’re handing over the keys to the people who can.
Learning to code for research and analysis - Khazan talks a lot about positioning yourself to get hired and a lot less about whether technical skills might help you keep the job by actually being good at it. If you want to work on a subject like school performance, crime, government spending, or any of the countless others that involve complex data, having a technical toolset is important. A little bit of code can give you a big leg up in terms of finding, cleaning, and exploring data. If you think you can compartmentalize the “data” work and give it to someone else, or that you’re fine only reporting stories you can find browsing Excel, someone else is going to eat your lunch.
Learning to make things for the web - If I set aside my own bias as a web developer, this is probably the category I’m least sanguine about for a broad audience. I certainly think a basic working knowledge of HTML and how the web functions is necessary, but I’m willing to buy the argument that we shouldn’t send journalists who really are just looking to write too far down the web rabbit hole. This is mostly because, whereas a journalist who dabbles in using code to analyze data can get real immediate value out of a few tricks, the same is less true of the web. Once you get past the frisson of excitement you get the first time you switch from web consumer to a person who just made a real live web page, there’s a long road before you can make something complex that could go on your news organization’s website. You have to put in a lot of reps before you can wrestle with all the little gotchas of making something for public consumption on every imaginable browser and device.
That doesn’t mean I would discourage a young journalist from poking around with web technologies. Far from it. I love the web, and I happen to think it’s a lot of fun even when the stuff you’re making kind of sucks. But if it turns out not to be your idea of a good time and you want to draw the line at the basics, more power to you.
Arguing against “every journalist must learn to code”
There are some people out there who make it sound like all journalists have to become software developers. This is a silly position. And I sympathize with Khazan that the people who beat the “learn to code!” drum indiscriminately do everyone a disservice, and may even put more people off coding than they draw in. But I don’t think it’s fair to claim that “everyone” is “always” telling journalists to learn to code; arguing against that reductive version is a straw man. Last I checked, journalism schools don’t exactly suffer from a dangerous glut of technical education.
Of course “every journalist must learn to code” is a silly proposition, just as “no journalist should learn to code” is. Journalism is not a monolith. It depends on what sorts of stories you’re trying to tell and in what media. I will freely grant that some journalists have goals that won’t benefit much from technical savvy or coding skills. Khazan may be one of them. But it’s strange to take an anecdotal case and just suppose that it applies to a majority, or even a substantial minority, of young journalists.
"Serious coding is for people with computer science degrees"
I run into this assumption a lot, that people who code in newsrooms must largely be trained computer scientists. It’s really not true. Anecdotally, very few of the newsroom coders I know have a computer science background. I’m pretty sure Chris Groskopf dreams in Python, and he was a philosophy major.
But don’t take my word for it: I actually tried to gather some data on this subject, since it keeps coming up. What did I find? Only 1 in 4 news developers studied computer science in school, and nearly half of them didn’t start learning to code until the end of college or later. The sample wasn’t perfect (if anything, I suspect it actually overcounts computer science majors), but it’s probably a lot closer to the mark than idle speculation.
This makes sense: if you’re the kind of person who decides to study computer science and sticks with it, you probably have a talent or affinity for the inherent puzzle-solving of programming, and will be right at home working at a software company solving hard technical problems (where you’ll make a lot more money). Coding in the newsroom tends to be less about deep technical puzzles and more about storytelling and design, and attracts people who are interested in the world and just happen to use code as a tool while they figure it out. Think MacGyver, not Edison (the web: it’s paperclips all the way down).
Treating learning to code as an all-or-nothing proposition
A lot of people have unreasonable expectations about what learning to code actually looks like. Despite what the latest crop of “teach yourself to code” hucksters will tell you, you don’t get to go from zero to web developer in 4 weeks. Learning this stuff is a long, challenging, humbling process. There may be a few people who have such a supreme aptitude for it that they glide right through and never struggle, but I have yet to meet one. The frustrations Khazan describes felt very familiar to me, as I suspect they would to any developer. Ask the most hardcore coders you know and they’ll tell you that they too get stuck and then want to tear their hair out when they realize they wasted an afternoon over one lousy semicolon.
But here’s the thing: learning to code is not all or nothing. There seems to be this sense that deciding to learn to code is a radical act of self-redefinition, that you are embarking on a dramatic journey. If you think of it this way, and you think that you have to slog through for three years before you get any value out of it, I can understand why you would look at the investment required and say “no thanks.” But it really doesn’t work like that. There’s no blood oath, I promise.
Journalists and journalism students (and journalism professors) should quit thinking about “learning to code” in the abstract. Instead, think about the stories you want to tell, and to the extent there are ways that code would help you tell them, learn what you need for the situation. Different journalists will benefit, or not benefit, in different ways. Don’t sit down with a big boring book and an online course and declare you’re going to learn Python. You’ll probably get stuck, get bored, and give up. Set out to build something you like, or explore some data you care about, and figure out what you need to learn to make that happen. And don’t go it alone; ask your developers for help, or find a community of other learners to collaborate and commiserate with.
I can pinpoint the exact moment when the awesome craziness of my OpenNews fellowship sank in. I was on my way home after my first day at BBC headquarters, looking around the subway car, and I realized that fully half of the passengers were reading BBC News on their phones. Whoa. Since then, I’ve been in the newsroom when Pope Benedict resigned, when Margaret Thatcher died, and when the bombs went off in Boston. I learn so much every day that my head is spinning by lunchtime. My seven co-fellows routinely blow my mind with their work. I’ve met so many brilliant people around the world who are not just redefining how we do the news, but doing it as a team, one big journalism Justice League. I love this job.
I had a serious case of imposter syndrome, and I know I’m not alone. Yesterday, Larry Buchanan, who is using D3 to develop awesome interactive graphics for The New Yorker like the NCAA Money Bracket, asked the Twitter hivemind for help working with some messy data. I lent a hand, and this was his response:
@veltman if only i were a real developer … Thanks again!
There is no line where you suddenly cross over from non-coder to coder, or from fake developer to real developer. There’s no high priesthood. You start learning, and then you just keep going. This is how I put it when speaking at the BBC’s recent Data Day:
The notion that code is this hyperspecialized thing, scary punctuation soup on a dark screen, something that someone else does, is wrong, and it’s toxic.
There are people all over the world who don’t consider how code might help them do their job, because they think it’s a big leap. It’s not. It’s thousands of tiny steps, and everyone takes them in a different direction. A little bit of code goes a long way.
People who do flirt with the idea of learning to code often get discouraged quickly. They get stuck, they get frustrated, and they look at the cool things that “real developers” are doing and decide that will never be them, so why bother? Well guess what? We were all that person. We are all STILL that person. We all get stuck. We’re all figuring it out as we go along. Welcome to the club.
People who are already doing great things with code are reluctant to teach others and share their work because they think it’s too basic or too sloppy to be useful to anyone else. It’s not true. Take your Code of Dorian Gray out of the attic. You have much more to teach us than you realize.
What I love most about coding in the newsroom is that the artificial divide between coders and everyone else is weak and getting weaker. Every day brilliant, passionate reporters and designers are waking up to the ways that code can help them find and tell stories, and developers are getting better at thinking as journalists. Philosophy majors are writing Rails apps and Java developers are doing investigative reporting. That blending is what makes events like NICAR and MozFest so wonderful. People with different experiences and skills come together to learn from each other and nobody gives a shit what it says on your business card. It’s not separate tribes, it’s one big family.
The newsroom is a great place to blow up this wall because we rarely get too wrapped up in code for its own sake. There are plenty of true computer scientists in the world who get their satisfaction cracking tough coding puzzles, and they don’t care whether it’s for a bank or a government or a hydroelectric dam. God bless those people (the world needs them), but I’m usually not one of them, and most of my newsroom colleagues aren’t either. We’re here because we want to make things that teach people about the world they live in. We care about the best way to tell a story, and about what it means to our audience; we care less about whether we had the perfect algorithm under the hood. Developing on deadline will do that to you. Like Lorne Michaels said about Saturday Night Live, “the show doesn’t go on because it’s ready; it goes on because it’s 11:30.”
It’s an exciting time to be coding in a newsroom. There’s a righteous community of journocoders who are changing the game every single day. And we’re recruiting. The OpenNews program is looking for five new fellows next year. Like telling stories? Like making an impact? Like using code to do it? You’re crazy if you don’t apply. Come join the Justice League. Show us what you can do.
Today marks the halfway point in my ten-month OpenNews fellowship with the BBC Visual Journalism team in London. With five months behind me and five more to go, it’s a good time to take stock. What have I done? What have been the highlights and surprises?
A great community
The first half has been an incredible experience. I’ve worked on dozens of exciting projects, attended great events all over the US and Europe, and learned enough to fill volumes. BBC Broadcasting House is a pretty special place to come to work in the morning. But the biggest highlight, by far, has been the people.
Besides my great colleagues at the BBC, I’ve met so many incredible people at events like NICAR, the Mozilla Festival, and this week’s MIT-Knight Civic Media Conference. I always leave these events with my head spinning from so many fascinating conversations, and in between the in-person meetings, the conversation keeps going at the online water cooler. Not only is this little universe full of scary smart people doing interesting work, but they’re all incredibly generous with their time and expertise, answering questions, offering feedback, sharing hard-won wisdom. The community really is everything.
The most pleasant surprise of all has been my seven fellow fellows. When I went through the application process, I didn’t give any thought to what other people might be chosen, or what sort of relationship I’d have with them. Getting the chance to meet, learn from, and work with this amazingly talented crew has been such a treat. The same goes for the 2012 fellows and our fearless leaders, Dan Sinker and Erika Owens.
Bridging the divide
So far we’ve covered things like making maps for the web, scraping, how a web server works, performance issues, and Excel vs. databases. Every week ends up being a great, wide-ranging discussion about how different roles think about these things, conflicting priorities, and blind spots. The dirty secret? I learn way more than I teach. Leading these talks has given me all kinds of insight into how editorial and design roles tend to think about certain problems, and where the opportunities for smarter tools and processes really lie. I’m looking forward to continuing these for the rest of the year, and hoping I can even convince some designers and reporters to take the reins and lead some lunches of their own.
Back in January, I set out some of my goals for the year. For those keeping score at home, my grades look something like this:
Write something. C- I’ve posted a few things on this blog and elsewhere, so I haven’t been a total deadbeat. But there are lots of other topics I’ve meant to write about, and projects I’ve meant to document. I hope to be better about this in the second half.
Teach something. B+ Between blog posts, speaking engagements, and the Learning Lunches, I’ve done all right in this department.
Understand the dynamics of a large news organization. C I have to grade myself on a curve here. I could work at the BBC for 30 years and still only understand a tiny fraction of its institutional logic, but considering that I had absolutely no concept of how a major newsroom worked when I started, I’ve come a long way.
Build something together with the rest of the 2013 fellows. F Total failure. Fortunately, we’ve talked about this, and plans are in the works to team up in the second half as a news development Megazord and do something awesome.
Give in to Twitter. B+ I get it now. Consider me converted.
Figure out how to explain OpenNews to my grandmother. D I think she understands that it involves news, at least.
I’ve got a long list of project ideas for the second half of the year, and I hope to do a better job of showing my work by writing about it, speaking about it, and putting a lot more code on GitHub. I also want to create something that will outlive me at the BBC, whether that’s a set of tools, a process, or even just a useful idea. At the same time, I’ll be continuing with more Learning Lunches and crisscrossing the globe to attend lots of other great events. The challenge is going to be fitting it all into just five short months.
Coming soon on this blog: what I’ve learned about news development so far, and a deeper dive into lessons from our Learning Lunches.
I’m not big on offering advice when it comes to learning how to code. Everyone learns differently and has different goals; my experience isn’t necessarily instructive. But I seem to be getting asked the same question more and more often: someone wants to be able to make cool things for the web, and they don’t know how to get started. Here are some thoughts on how to keep your head on straight while you’re trying to learn. Take them all with a big grain of salt.
Work on an idea you’re excited about.
As a learning tool, there’s nothing more powerful than having an idea that you’re genuinely excited about. There are two big reasons for this:
Learning how to code is full of exhilarating lightbulb moments, but it’s also full of hours of banging your head against the wall, not understanding why something doesn’t work or what to do next. You will get stuck. You will get frustrated. Being excited about what you’re building will help you power through those times. Rather than lose interest when you hit the wall, you’ll go above and beyond to find a solution.
You’ll care about doing it well. You’ll learn a lot more when you’re interested in the end product not just as proof that you did it, but as a project you wanted to build for a reason. You’ll think about the details, tradeoffs, and design considerations. You’ll question your assumptions. You’ll refine it over time instead of checking it off the list as soon as it satisfies the bare minimum.
Take your time.
There’s a whiff of infomercialism in the air these days, this notion that if you take the right online course or buy the right book, you can just skip ahead to being a master coder. The 8 Minute Abs version of learning to code is like the 8 Minute Abs of…well, abs. It’s an enticing fiction, the notion that as long as you’re really clever about the process and you buy the right accessories, you can skip most of the actual doing.
It’s true that it’s easier than ever to do amazing things quickly, and that you’ll have lots of bursts of insight that make it all feel quick and easy. You’ll add a few lines of code and make something great happen on the screen, and you’ll be ready to take on the world. But this is a long, gradual, and humbling learning process. There aren’t a lot of shortcuts for building up the context, the “why” behind different approaches and frameworks and the code underneath them, and that’s what will allow you to go off-script and improvise awesome stuff.
You’re not checking off a box when you learn to code. You never stop learning. But that’s part of the fun!
Don’t overload on tutorials.
Tutorials are easy to find. In five minutes of Googling, you could grab 10 of them on every coding topic you care about. But they won’t stick nearly as well as hands-on practice and experimentation. And when it comes to code, it’s much easier to read about the “how” than the “why” (in part because it’s much easier to write about the “how” than the “why”). You want healthy portions of both (one of the peculiarities of learning to code for the web: you’re constantly learning in both directions on the ladder of abstraction, learning new tricks you don’t fully understand and learning more about how your old tricks actually work).
Tutorials are great sometimes, especially when you have a well-defined task you’re trying to figure out, but use them sparingly. Get your tutorials on a just-in-time delivery system. Don’t just go on a shopping spree and expect to download all that information into your brain.
Don’t worry too much about the “right” way to do things.
As you learn to code, you’ll probably feel self-conscious about whether you’re doing things the way you’re “supposed” to. You’ll come up with some weird approach that does what you want but you’ll be certain that it’s an absurd solution and that if you were a real coder you could do it the correct way. Coders actually have a word for this situation: we call it “coding.”
There are lots of reasons it’s easy to feel self-conscious, especially if you’re just starting out:
You presuppose there’s a right answer. Finding the one unquestionably correct approach is the exception, not the rule. There are usually many valid ways to code something for the web. This is especially true because the code under the hood is so inextricably linked to questions of design. Get used to thinking in terms of tradeoffs and what approach best balances them for your users and your goals rather than thinking in terms of right answer/wrong answer.
You feel like you aren’t a “real” coder yet. You’ll find yourself having conversations and reading documentation full of jargon and backdoor brags that make you feel you don’t belong and should go sit at the kids’ table. People will drop in loaded words like “simply” and “just” to make time-consuming and difficult tasks sound like they should be effortless, and to imply that if they aren’t, it’s because you’re stupid. These same people know full well that most of the things they planned to “just” do ended with them spending five hours tearing their hair out wondering why it didn’t work. There are also some outright code snobs who will act like you might as well be programming on a Speak & Spell if you don’t use their preferred language or software or operating system.
I won’t bother getting into my armchair psychoanalysis of why all this happens, but you shouldn’t let it get to you, and here’s one reason why: they’re making it up as they go along too. There is no high priesthood of people who have gone through the traditional rites and received The Knowledge.
Teaching yourself to code is an idiosyncratic process, like teaching yourself to cook. You don’t suddenly cross the threshold from non-cook to cook; you learn some specific dishes and some underlying common principles, then you learn some more. As you learn new tricks you practice and master your old ones. But what exactly you end up learning to cook will depend on a lot of factors and won’t perfectly overlap with anyone else’s repertoire. In the same way, every web coder takes a very different winding road to their knowledge and ends up with a mix of mastery of some things and total ignorance of others. This is actually great, because it means we all have a lot to learn from each other.
Find a community.
Don’t be a hero and try to power through the learning process on your own, surviving only on twigs and berries and O’Reilly books. Your fellow learners are your best resource (and remember, all coders are learners). Make friends with fellow beginners, but with more experienced coders too. Go to meetups. Ask lots of questions. Get feedback. Offer feedback. Like somebody else’s work? Tell them so, and tell them why. And don’t forget to share your own work and the lessons you’ve learned. You’ll have much more to teach others than you realize.
There’s always more to learn.
Learning to make stuff for the web means going at your own pace and getting more comfortable with the fact that there will always be a lot left to learn. For every thing you master, you’ll also find out about ten other things you didn’t even know you didn’t know. And the web moves fast - even if you could learn it all, by the time you finished, so much more would be possible. So don’t get overwhelmed. Just worry about the next thing you need to learn, have some fun, and don’t be afraid to get in over your head.
Some fine print: besides being generally skeptical of my advice, you should keep my biases in mind. I’m an accidental web developer who just sort of learned along the way because I had things I wanted to make. That may not suit you. Also, learning by doing without a grand plan works pretty well for the web, but don’t assume the same is true of programming generally, especially complex or high-stakes things. If you’re serving up code to millions of users, managing a bunch of important databases, or writing software for banks or ballistic missiles, you should probably get some real computer science education and care about the “right” way to code things.