Posts tagged "daily beast"
Today we published a data story looking at how iOS devices fail to accurately correct some words such as “abortion” and “rape.” Here’s a detailed methodology on how we did that analysis.
It started back in January when we were working on our project mapping access to abortion clinics. The reporters on the project, Allison Yarrow and I (Michael Keller), were emailing a lot about it, which led to us typing the word “abortion” into our phones on a fairly regular basis. We noticed that iOS never autocorrected this word when we misspelled it, and when we would double-tap the word to get spelling suggestions, the correctly spelled word was never an option. We decided to look further into whether this behavior could be reproduced on iPhones with factory settings, and what other words iOS doesn’t accurately correct. To do that, we set out to build a complete list of words that iOS software doesn’t accurately correct.
We did this in two stages:

Stage One: Use the iOS API’s built-in spell-checker to test a list of misspelled words programmatically.


Step 1: Get a list of all the words in the English language
We combined two dictionaries for this: the built-in Mac OS X dictionary, which can be found in /usr/share/dict on a Mac, and the WordNet corpus, a widely used corpus of linguistic information, which we accessed through NLTK, a natural language processing library for Python. We left out words shorter than three characters, entries in the corpus that were two words (e.g. “adrenal gland”), and words with punctuation such as dashes or periods (e.g. “after-shave”, “a.d.”). We reasoned that these words were either too short to accurately correct or had more variables to them than we would be able to test on an even playing field, so we left them out of our analysis.
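For the curious, here’s a minimal Python sketch of that step. It assumes NLTK’s WordNet corpus has been downloaded, and the filters shown are a simplification of whatever our actual script did:

import string
from nltk.corpus import wordnet  # requires nltk.download('wordnet')

# Words from the built-in Mac OS X dictionary
with open('/usr/share/dict/words') as f:
    mac_words = set(line.strip() for line in f)

# Every lemma name in the WordNet corpus
wn_words = set(wordnet.all_lemma_names())

def usable(word):
    # Drop words shorter than three characters, multi-word entries
    # (WordNet joins them with underscores, e.g. "adrenal_gland"), and
    # anything with dashes, periods, or other punctuation.
    return (len(word) >= 3 and
            not any(c in string.punctuation for c in word))

words = sorted(w.lower() for w in (mac_words | wn_words) if usable(w))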

Step 2: Create misspellings of these words
We wanted to test slightly misspelled versions of every word in the English language, so, to start, we wrote a script that produced three misspellings of each one: one where the last character was replaced with the character to its left on the keyboard, one where the last character was replaced with the character to its right, and a third where the last character was replaced with a “q”. Because modern spellcheck systems know about keyboard layout, these adjacent-character misspellings should be the low-hanging fruit of corrections.
For instance, “gopher” would become “gophee,” “gophet,” and “gopheq”.
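Here’s a rough Python sketch of that logic. The adjacency maps below cover only the letter rows of a QWERTY keyboard and are illustrative; they aren’t our exact script:

# Left and right neighbors on the letter rows of a QWERTY keyboard
ROWS = ['qwertyuiop', 'asdfghjkl', 'zxcvbnm']
KEY_LEFT, KEY_RIGHT = {}, {}
for row in ROWS:
    for i, c in enumerate(row):
        if i > 0:
            KEY_LEFT[c] = row[i - 1]
        if i < len(row) - 1:
            KEY_RIGHT[c] = row[i + 1]

def misspellings(word):
    # Replace the last character with its left neighbor, its right
    # neighbor, and the letter 'q'.
    stem, last = word[:-1], word[-1]
    variants = [stem + m[last] for m in (KEY_LEFT, KEY_RIGHT) if last in m]
    variants.append(stem + 'q')
    return variants

print(misspellings('gopher'))  # ['gophee', 'gophet', 'gopheq']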

Step 3: Run these misspelled words through an iOS API spellchecker program.
Apple doesn’t have a “spellcheck program,” but for iOS developers it has an API with a function that takes in a misspelled word and returns a list of suggested words, ordered by how likely it thinks each suggestion is. In Xcode, the program you use to write iPhone and iPad apps, you can use a function under the UITextChecker class called “guessesForWordRange,” which does just that. Before testing each word, however, we ran the non-misspelled word through a function in this class called “rangeOfMisspelledWordInString,” which tells you whether the word in question exists in the iOS dictionary. This let us weed out words that were in our WordNet and Mac dictionary lists but that iOS wasn’t aware of. In other words, we only tested words that, if you spelled them correctly on an iOS device, wouldn’t get the red underline. For all of our tests we used the then-most-up-to-date version of Xcode, 4.6.2, and ran the most up-to-date version of the iOS 6 Simulator.
We also ran the misspelled word through the dictionary check, to make sure our misspelled word wasn’t itself a real word. For example, “tab” has a right-adjacency misspelling of “tan,” which is also a word. In that case, the script fell back to the “q” misspelling: if it was about to test “tan” as a misspelling for “tab,” it would see that “tan” is a real word and use “taq” as the misspelling instead. Obviously, “taq” is a harder misspelling of “tab” to correct, but we also gave it “tav,” its left-adjacency misspelling. If iOS got either of these right, we counted “tab” as a word it can accurately correct. Later on, we tested many more misspelling combinations as our list got smaller, to be sure we gave the spellchecker many chances to correct what should be easy corrections.
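In Python-flavored terms, the fallback looked something like the function below. ios_dictionary_has is a placeholder standing in for the rangeOfMisspelledWordInString check, and KEY_LEFT/KEY_RIGHT are the illustrative adjacency maps from the sketch above:

def variants_to_test(word, ios_dictionary_has):
    # Build adjacency misspellings of the last character, falling back
    # to the 'q' variant when a misspelling collides with a real word
    # (e.g. 'tan' for 'tab').
    stem, last = word[:-1], word[-1]
    out = []
    for mapping in (KEY_LEFT, KEY_RIGHT):
        if last not in mapping:
            continue
        variant = stem + mapping[last]
        out.append(stem + 'q' if ios_dictionary_has(variant) else variant)
    return out  # e.g. variants_to_test('tab', ...) -> ['tav', 'taq']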

Step 4: Analyze results
If a word was accurately corrected at least once, we marked it as properly recognized by iOS. This process narrowed our list down from about 250,000 to roughly 20,000 words. There was one big problem, though: the iOS spellcheck API didn’t accurately correct some words that real iPhones could correct. For instance, the API wouldn’t correct “aruguls” to “arugula,” for some reason. Our questions to Apple on this went unanswered; if anyone has a suggestion as to why the two systems differ, please let us know.
After meeting with some New York-area iOS developer meetup groups, we found that the spellcheck on the iOS simulator that ships as a part of Xcode does correct these edge cases, which led us to stage two.

Stage Two: Use spellcheck on the iOS simulator to check the remaining 20,000 words
To access the word suggestions on the iOS simulator, you need one crucial piece of hardware: a human hand. We were able to write an iOS program easily enough that presents a word on the simulator, but there’s no way to programmatically pull up the spellcheck suggestion menu because iOS programs don’t have scope for system-level operations. To do that, you need to physically double-click the word and navigate through the various menus.

Step 1: Find a way to automate clicking
To solve this, we got into our wayback machine and wrote an AppleScript that would move the mouse to specific coordinates on the screen, wait a specified number of milliseconds for menus to appear, and then click in the appropriate places. Our iOS program had a button that, when clicked, saved the original word, the presented misspelled word, and the final result of the correction. Our AppleScript clicked through the menus, replaced the word if the simulator presented a suggestion, then clicked the button to serve the next word.
We tried to make this process as fast as possible, but it ended up taking around 1.6 seconds per word. 1.6 multiplied by 20,000 is 32,000 seconds, or roughly 8.9 hours. But we also wanted to present even more misspelling options—twelve more in total.
We can call this Step 2: create more misspellings (a rough sketch that generates these variants follows the list):
1. Double last character.
2. Double last character with a capitalized first character.
3. Missing last character.
4. Missing last character with a capitalized first character.
5. Misspelled first character (via left misspelling adjacency) and capitalized first character.
6. Misspelled first character (via left misspelling adjacency).
7. Misspelled first character (via right misspelling adjacency) and capitalized first character.
8. Misspelled first character (via right misspelling adjacency).
9. Misspelled second character (via left misspelling adjacency) and capitalized first character.
10. Misspelled second character (via left misspelling adjacency).
11. Misspelled second character (via right misspelling adjacency) and capitalized first character.
12. Misspelled second character (via right misspelling adjacency).
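For reference, here’s a rough Python sketch of those twelve transforms, reusing the illustrative KEY_LEFT/KEY_RIGHT maps from the stage-one sketch above (again, not our exact script):

def more_misspellings(word):
    # The twelve additional variants, in the order listed above.
    cap = lambda w: w[0].upper() + w[1:]
    def swap(w, i, mapping):
        # Replace character i via an adjacency map; None if at a row edge.
        c = mapping.get(w[i])
        return w[:i] + c + w[i + 1:] if c else None
    out = [word + word[-1],       # 1. doubled last character
           cap(word + word[-1]),  # 2. ...plus capitalized first character
           word[:-1],             # 3. missing last character
           cap(word[:-1])]        # 4. ...plus capitalized first character
    for i in (0, 1):              # 5-12: first, then second character
        for mapping in (KEY_LEFT, KEY_RIGHT):
            v = swap(word, i, mapping)
            if v:
                out += [cap(v), v]
    return out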

So, including our original last-character left/right adjacency misspellings, we had 14 lists of 20,000 words to run through. 14 multiplied by 8.9 hours is about 124 hours, or more than five days if the program ran straight, 24 hours a day. We needed to take a break between each of the 14 sessions, however, and restart Xcode just in case there was a learning algorithm—we didn’t want the results of one session to pollute another.
Renting computers from Amazon is easy, but not if they’re Macs, which aren’t available through Amazon and get rather expensive through other dealers. Fortunately, the Columbia School of Journalism let us take over one of its Mac computer labs, so we were able to run the script in parallel and finish in a much more reasonable time frame. It also meant my laptop wasn’t out of commission crunching words for a week. Here’s a Vine of what the automated corrections looked like:

One drawback of this method was that we could only get the mouse simulator to select the first suggestion. So, if for the misspelled word “abortiom” the word “aborted” was suggested as more likely than “abortion,” this program would mark that as an inaccurate correction. We weren’t too worried about this, though, because 1) our iOS script in stage one *did* take multiple suggestions into account, so all the words had two chances to be corrected in that scenario, and 2) we presented 14 different misspellings of these words, and if any one of these variations was corrected, we counted the word as accurately corrected. If a word that is only off by one character isn’t suggested in that many tries, then something in the algorithm isn’t handling that word correctly.

Step 3: Analyze results
This second stage only cut out around 6,000 words, leaving us with 14,000 words that were never accurately corrected. The related article lays out our findings, but our initial hypothesis held true: unlike Android phones, iOS doesn’t correct “abortion.” Apple declined to comment for this project, so we have many unanswered questions. One idea for future research is whether iOS devices are incapable of learning certain words like “abortion.” That is to say, these words may be blocked not just at the dictionary-suggestion level, but at the machine-learning level as well.

Stage Zero: Find the files.
Before we did stage one, we had a different strategy: find this list of seemingly banned words somewhere in the iOS file structure. To do this, we put out a call on Facebook for friends who would donate an old iPhone to be jailbroken. We got three phones: one from my mom, and two from some very nice old friends who mailed them to our offices. We factory-reset and jailbroke one and kept the others factory-fresh for testing. We went searching and found some promising files in the LinguisticData directory called “pos,” “ner,” and “lemmas,” which in the natural language processing world stand for “part of speech,” “named entity recognition,” and “lemmatization,” the analysis of word stems and inflected forms, like “better” being associated with “good” as its base. These files were unreadable, however, because they weren’t in any known format. The only way we could read them was in their raw binary-hex form, which looks like that terrible mess of characters you see when you open a corrupted Word document—like Wingdings but with less rhyme or reason.
After many attempts at figuring out where a list of blocked words could reside, and after reaching out to the New York iOS community, we started in earnest on reverse-engineering the list ourselves with stage one.


Today we published our first “Daily Beast Feature.” It’s called "Death by Indifference" and, through text and videos, it tells the story of history’s fastest-spreading HIV/AIDS epidemic taking place in Russia.
The project came through former Senior Producer Gregory Gilderman, who went to Russia last year to report on the epidemic with support from the Pulitzer Center on Crisis Reporting. The black-and-white photos are from photographer Misha Friedman, who visited clinics throughout Russia that treat people with tuberculosis and HIV/AIDS. We wanted to place the focus of the page on the video stills and photography, since they highlight the people at the heart of the story. Our design decisions focused on making those images as evocative as possible. The black background lets the woman’s expression draw out from the page. The videos brighten as you scroll to them, drawing attention to the images and text in that section. That was easy to do using the “relative mode” of the Skrollr.js library.
All that said, we think the best part of the design, besides the images, came from Bronson Stamp’s choice of the beige color for the active nav bar item. Something about fading from grey to beige really makes those section headers lift off the page. Use it with abundance: #f3f0df.
Below, our first pen and paper mock-up:

Michael Keller & Sam Schlinkert


Last month we published a package of stories marking the fortieth anniversary of the Roe v. Wade decision. It had a few moving parts but I’ll just go over some of them briefly here.
How it started
This summer you probably heard the story about the last abortion clinic in Mississippi, which was threatened with closure due to stricter state laws. Allison Yarrow, who sat across from me at the time, was covering the story, and it got us thinking: the line “The Last Abortion Clinic in Mississippi” is attention-grabbing, but it doesn’t tell the whole story. What you really want to know is how far people are from their nearest clinic, regardless of state boundaries. One state may have five clinics, but if they’re all in the southwest corner of the state, you live in the northeast corner, and your adjoining states have clinics only at the borders farthest from you, then you’ll have a hard time getting to a clinic even though your state has several. To see where this might be the case, and where access to services was compounded by new restrictive provisions (over 150 nationally in the past two years), we built as close to a comprehensive database of every abortion clinic as possible. Our goal was to see what parts of the country were farthest from a clinic. From start to finish, this process took about six months.
We got our address data from a variety of publicly available sources: Planned Parenthood, the National Abortion Federation, anti-abortion websites that keep their own lists, and others. We needed to verify that the address information was correct, though, so we called over 750 clinics to confirm. We also asked up to how many weeks of pregnancy they offer services. The resulting database is the only one of its kind that we know of. The Guttmacher Institute undertook an abortion provider census in 2008, but it didn’t separate clinics from hospitals from private doctors’ offices, which represent different levels of care, a distinction we thought was important.
What it became
We started this in July and the project evolved. We thought the election might bring the issue of abortion access to the fore, but it didn’t, and that gave us more time. Allison brought up the fortieth anniversary of Roe v. Wade, and that let us think much bigger about the project. Because this was such personal subject matter, we knew readers’ comments would feature prominently (from both sides of the issue), and we wanted a strong narrative component, too.
To give a human voice to the Geography of Abortion Access map, Allison flew to Wichita, Kansas, one of the areas that stood out both on our map, as a metro area far from a clinic, and in recent memory as the site of the 2009 murder of late-term abortion provider George Tiller. To add a broader perspective, Sam Register, who runs the Newsweek Archivist tumblr, went through the Newsweek archives so people could follow the topic’s coverage from the ’70s through the ’00s.
What we learned from readers’ stories
Over the course of the week, we shifted the question we were asking from why readers support or oppose legal abortion to a conversation about pro-life and pro-choice labels, as a way to get more nuanced opinions and show the complexity of the issue. We asked readers to complete either the phrase “I’m pro-life but…” or “I’m pro-choice but…” Our other reader-based projects drew more responses, but we were happy with how thoughtful and honest people were. Read our roundup of interesting responses to those questions, as well as to our free-form “Tell us your story” prompt, here.
Under the hood on the map
How to represent this dataset was tricky. We had three main issues: anonymity, unbiased geography, and context. 
Anonymity: Although we got our data from publicly available websites that anyone could find, and it was often information that anti-abortion groups already held, we weren’t comfortable publishing addresses, names, or exact latitudes and longitudes. We took great care to scrub our final database of anything identifiable, and we partially randomized each clinic’s location so clinics weren’t pinpoint-able from our map. On the presentation level, we made the magenta circle big enough to span multiple hexagons (our base geography layer) to let people know that an address was approximate. Even if you backtrack and find our database, you won’t get any information that would let you de-anonymize the data.
Unbiased geography: As I wrote above, we wanted to get away from the arbitrary state and county borders that almost all of the research we encountered was based on. We did some initial plots using Census tracts, but that presents exactly the same problem (see the census-tract map at the bottom of this post). We ended up making a hexagonal grid using the Repeating Shapes plugin for ArcMap, which lets you make a grid out of your choice of shape and size. The trick to making a hexagonal grid for the web, so that the hexagons stay regular (all sides equal) no matter what latitude they fall on, is to make the grid in your output projection, Web Mercator (EPSG:3857). You can reproject it to do your analysis in whatever projection you like, but because it will eventually be displayed in Web Mercator, it needs to be created in that projection so it doesn’t come out distorted in the browser. If you want a hexagonal grid with cells 20,000 meters in diameter, here’s the one we used: Shapefile, KML, GeoJSON.
And here’s another one that Brian Abelson, current Knight-Mozilla Fellow at the New York Times, made while helping out on the project. It’s also a 20,000-meter hex grid, with state borders preserved in case you want to assign state values to each hexagon: Shapefile, KML, GeoJSON.
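We made ours with the ArcMap plugin, but for the curious, here’s a minimal pure-Python sketch of the idea: generate flat-topped regular hexagons directly in Web Mercator meters and write them out as GeoJSON. The extent in the example is made up, and this is one way to do it rather than the plugin’s actual algorithm:

import json, math

R = 10000  # circumradius in meters: 20,000 m vertex-to-vertex diameter

def hexagon(cx, cy):
    # Closed ring of a flat-topped regular hexagon centered at (cx, cy)
    pts = [(cx + R * math.cos(math.radians(60 * i)),
            cy + R * math.sin(math.radians(60 * i))) for i in range(6)]
    return pts + [pts[0]]

def hex_grid(minx, miny, maxx, maxy):
    dx, dy = 1.5 * R, math.sqrt(3) * R  # column and row spacing
    features, col, x = [], 0, minx
    while x <= maxx:
        y = miny + (dy / 2 if col % 2 else 0)  # stagger odd columns
        while y <= maxy:
            features.append({'type': 'Feature', 'properties': {},
                             'geometry': {'type': 'Polygon',
                                          'coordinates': [hexagon(x, y)]}})
            y += dy
        x += dx
        col += 1
    return {'type': 'FeatureCollection', 'features': features}

# Illustrative extent in EPSG:3857 meters
print(json.dumps(hex_grid(-10020000, 3000000, -9900000, 3100000))[:120])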
Context: Generating our distance map wasn’t enough to tell a story with. We added three other pieces of information to walk people through the significance of the patterns they were seeing. The first was a map of the female population aged 15-44, so that people could see the areas where women lived that were farthest from clinics and identify significant metro areas (the pink dot-density overlay). The second was the different legal restrictions each area was subject to (the areas with highlighted transparency). This was an interesting way to visualize the data because we didn’t highlight every hexagon in Kansas, for example, to show that certain laws applied in Kansas. Instead, we highlighted hexagons whose closest clinic was in Kansas. This gave us a very realistic map: people could see what state laws they would be subject to if their nearest clinic was across state lines, and it visually demonstrates how state laws can affect people who don’t live in that state. And third, we selected our own highlights from going through the data, such as the areas where telemedicine is banned in conjunction with mandatory in-person counseling. The combination of these laws in Arizona, for instance, means some women travel over a hundred miles and spend two days to get a prescription for the abortion pill.
More under the hood
The map itself we built using CartoDB, which allowed us to very flexibly add the different highlighted views of the map without rebaking our tiles each time. The slider that shows clinics that only offer services up to X weeks works by loading four tile layers on top of each other at once and showing or hiding them depending on the slider value. This made the map slightly slower on initial load, but it made the transitions between map states super fast — so, a trade-off.
For the highlighted states, those restyle and reload all four map layers as well. We used Leaflet.js’s ability to plot vectors to draw the line between the hexagon you’re hovering over and the closest clinic to provide some more descriptive interaction.
The heatmap was created in ArcGIS from census tract data. We filtered for just the number of women of reproductive age, 15 to 44, per tract and then used the Create Random Points function in ArcGIS to create one point for every 210 women. We came up with the 210:1 ratio by looking at a histogram of the data to see what would be an accurate dividing point. Shameless plug: I used an online tool I made, www.Histagram.me, to generate quick, interactive histograms. Feel free to use it too.
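If you don’t have ArcGIS handy, the same idea is easy to sketch in Python with the shapely library. The function below is a hypothetical stand-in for the Create Random Points step, using simple rejection sampling:

import random
from shapely.geometry import Point, shape

def dot_density(tract_feature, women_count, ratio=210):
    # One random point inside the tract polygon per `ratio` women
    polygon = shape(tract_feature['geometry'])
    minx, miny, maxx, maxy = polygon.bounds
    points = []
    while len(points) < women_count // ratio:
        p = Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
        if polygon.contains(p):  # keep only hits inside the polygon
            points.append(p)
    return points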
Because the heatmap itself is done with CartoCSS layering techniques and not a statistically calculated heatmap, we made sure to compare side-by-side with a choropleth tracts map of the same data using Jenks-clustered color breaks to make sure that our heatmap told the same story as the choropleth. 
A few months ago we spoke with Andrew Hill, Senior Scientist at Vizzuality (who makes CartoDB) on some experimental ways to map the data. The line on hover came out of some of his renderings and you can see in the photos below some of the experimental line styles.
All in all, it was a lot of teamwork: Allison, Abby, Brian, Caitlin, Lizzie, Sam, and a number of other people all helped with parts of it over the course of six months. If you have any other questions about it, let me know at michael.keller@newsweekdailybeast.com.
-Michael
Before we settled on the Value-by-alpha approach for showing the different state laws, some failures:
We tried outlining the different shapes and showing them in different colors:




We tried coloring the hexagon outline by the different laws that were in effect. Creating a sensible hierarchy proved difficult:


Lines instead of hexagons:
Highlighting Puerto Rico:

A value-by-alpha chart where census tracts are shaded by their percentage of women of reproductive age. Unfortunately, it’s not that intelligible and the heat map overlay is a much cleaner way of showing this relationship:

Before we made the hexagon grid, how the map looks if you use census tracts:


UPDATE: FEB 10 @RepsGunTweets has been changed to @YourRepsOnGuns. Check out www.ThisIsYourRepOnGuns.com for the ongoing project.

Brian Abelson is a data scientist who is graciously donating his time at NewsBeast Labs before he starts a full-time position as a Knight-Mozilla Open News Fellow at the New York Times in February.
For an upcoming project on the gun debate, we’ve been monitoring statements representatives have made on the topic. As President Obama prepared to unveil his proposal for gun control on Wednesday, Michael and I were curious to see representatives’ reactions to the highly publicized announcement and to be able to report them in real time. Given the degree to which breaking news is now reported (and responded to) on social media, we thought it would be useful to build a bot to log officials’ comments on certain issues and present them in real time. Such a tool could be used by newsrooms to engage their readers on a continuous basis by aggregating and serving content from members of particular communities or from those who serve on different committees.
@RepsGunTweets was born.
We were inspired by the work of 2013 Mozilla-Knight OpenNews fellows who recently built a prototype for an app called “if (this) then news,” a news-oriented take on IFTTT – a site for linking triggers from Gmail, Twitter, Dropbox, and other services to actions on the web. Applying this logic to news coverage, the fellows created the shell for a tool that would monitor live data streams, detect important events, and issue notifications. As Vice President Biden took the mic, we started furiously coding up a bot that would follow the Twitter accounts of US representatives and retweet any comment that included “gun”, “assault weapon”, “firearm”, or other relevant keywords. After a couple hours of missteps and headaches, we eventually got @RepsGunTweets up and running. In the last ten days, the bot has logged 307 tweets, two-thirds of which came in the first three days. We’re still analyzing the conversation, but one interesting observation: representatives who are not in favor of gun control tend to link to longer explanations of their position on their websites instead of tweeting a comment.
Under the hood
At its core a retweet bot is a pretty simple tool: follow a feed, find what matters, and serve it back up under a single account. The harder part is figuring out how to accurately communicate with Twitter’s API. Using Tweepy for Python, we were able to easily access Twitter’s numerous API methods. All we needed to provide were the consumer key, consumer secret, access token, and access token secret for an application generated on http://dev.twitter.com/apps. The bot follows CSPAN’s members-of-Congress list, applies a regular expression for the desired keywords, and retweets any matches. For even more technical info, check out this Github page.
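Here’s a stripped-down sketch of such a bot using the Tweepy of that era (Tweepy 4.x later renamed these streaming classes). The credentials, keyword pattern, and REP_IDS list are all placeholders:

import re
import tweepy

# Placeholder credentials from an app registered at dev.twitter.com/apps
CONSUMER_KEY, CONSUMER_SECRET = 'your-key', 'your-secret'
ACCESS_TOKEN, ACCESS_TOKEN_SECRET = 'your-token', 'your-token-secret'

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

KEYWORDS = re.compile(r'\b(gun|assault weapon|firearm)s?\b', re.IGNORECASE)
REP_IDS = ['123456789']  # placeholder Twitter user IDs for the members list

class RetweetListener(tweepy.StreamListener):
    def on_status(self, status):
        if KEYWORDS.search(status.text):  # retweet any matching tweet
            api.retweet(status.id)

    def on_error(self, status_code):
        return True  # keep the stream alive on transient errors

stream = tweepy.Stream(auth=auth, listener=RetweetListener())
stream.filter(follow=REP_IDS)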



Six Months in Review

NewsBeast Labs is roughly six months old and we’ve had a lot of fun. This tumblr has most of our projects for the past few months but there are a bunch from before our launch. Here’s a rough list of projects we’ve done so far.

Legal Experts Decode the Supreme Court’s Obamacare Ruling - Our very first project! We launched it the day we got DocumentCloud, which was also the morning of the Supreme Court ruling on Obamacare. We asked two law professors to make margin notes in the text of the ruling as they were reading it for the first time. Readers could follow along and read experts’ reactions as the conversation was happening.

Digital 100: Who’s Following Whom? - A network visualization of how Newsweek’s list of influential people in the digital space interact with each other on Twitter. 

Obamacare: It’s Cheaper! - I like to call these “Story Visualizations” - visual presentations of stories that could run as a list or as text, but are much more interesting visually. Matt DeLuca and I did a side-by-side on how Obamacare would affect different age groups’ healthcare spending.

2012 Olympics: The Latest Medal Tally - We had a live-updating Olympic medal count (with a snazzy sortable table that I’ve written about before) that I worked on with our awesome intern Sarah Hedgecock. We also did a version of it for our right rail (sidebar).

Interactive Map: London’s Olympic Transformation - The Olympic Park rose from the rust. Sarah and I also did a before-and-after satellite-view interactive that included a bunch of info on the starchitect buildings.

Interactive Map: The U.S. Shooting Epidemic - Following the Aurora shooting, Brian Abelson and I made an interactive map of multiple-victim shootings since 2005 and asked readers to respond with their memories. We published a selection of the reader responses here. The full spreadsheet list is here.

As Income Inequality Widens, Rich Presidential Candidates Dominate - Lauren Streib and I worked on a chart (she did all the numbers), showing presidential income over the years. I remember this one chart taking four hours from start to finish for some reason…

Big Guns Inside the National Rifle Association Leadership - Who’s leading the NRA? I worked on a project on the NRA’s leadership with three colleagues, Caitlin Dickson, Eliza Shapiro, and Kevin Fallon. They dug through 990 forms and put together small profiles of the people at the top. We put it together in a mosaic-style presentation. Normally this type of story would be a gallery format, but since it’s not picture-based, we decided to create something more conducive to reading a lot of text.

SuperPAC App Election Ad Interactive - We partnered with the Super PAC App, an iPhone app that would identify political advertisements on TV and give you information about that group, such as how much money it was spending that election, and articles about it. We made a web interface to their data to provide readers with more context on outside spending groups.

Interactive Map: Who’s Protesting Where? - When the Middle East erupted in protests in response to an anti-Muslim video uploaded to YouTube, Eliza Shapiro and I put together a visual guide with information on each protest as well as contextual information on each country. It was an interesting map to build, since we had both point and polygon layers to deal with for hover states. As with all of our interactive maps, we used CartoDB.

Obama and Romney’s Bundlers - If bundlers had baseball cards, this is what they’d look like. We took a look at the biggest bundlers for each candidate. Collect ‘em all.

The Rise of the Political Non-Profit - How so-called “Dark Money” was influencing the 2012 election was one of the themes in a three-part series John Avlon and I wrote called the Super PAC Economy. This animated timeline overlays non-profit political expenditures and significant court decisions (Citizens United and lesser-known decisions) that determined what role these groups could play in politics.

The Dark Money Shuffle - Also in that series, we worked with Robert Maguire of the Center for Responsive Politics who had been compiling a database of grants that non-profits gave to each other. For the first time, we diagrammed this opaque world of money transfers that is only visible by manually going through hundreds of IRS forms. Full article

Election Right Rail - Showing the latest polls from battleground states, how those states voted historically, median income, and the latest unemployment figures, our politics sidebar was full of context. It no longer lives anywhere on our site, but you can see a standalone version of how it looked on the eve of the election through the linked title.

Note: We did all of these projects before starting this tumblr. You’ll find write-ups for the projects that follow but if you want to know how we built any of the stuff above, send me a message at @mhkeller.

Debate Dashboard and Bingo - Brian, Sam, and Vitaly Korenkov (one of our awesome developers) conceived of a great debate-night dashboard. We had a livestream, a live chat with our commentators, and a poll from Urtak, a polling platform that lets you pose simple yes/no/maybe questions to readers. It also lets readers submit questions they want other people to answer, so there’s a good back-and-forth between questions we’re interested in and what our audience is interested in. We’re often into giving our readers a voice on the site, so we liked it a lot. I came in during the last few hours before we were going to go live (a.k.a. after all the hard work was done) and added a bingo card. The coolest part about it is the bingo validation: the card checks how many you have in a row vertically, horizontally, and diagonally, and tells you how many you need to win. NewsBeast Labs post.

Ground game: Obama Campaign Opens Up Big Lead in Field Offices - The airwave battle was being covered left and right, but we wanted to know what was happening on the ground. We scraped the two campaigns’ websites to map out their local HQs nationwide and found a big discrepancy between the two camps. In Ohio, for instance, Obama had a presence in so many counties where Romney didn’t that 10 percent of the state’s population lived in a county where the only volunteer center was an Obama HQ.

Technical note: We used CartoDB again for this map and it was a huge help. In the accompanying article, we ran interactive maps of Florida, Ohio, and Virginia. These separate maps required no real extra programming or mapmaking, since CartoDB builds your map by querying a database. By setting our map query to SELECT * FROM national_map WHERE state = 'FL', we had a local map in minutes that we could swap out for another state if needed, which indeed ended up happening. NewsBeast Labs post.

Interactive Hate: The Great Obama-Loathing Canon - Matt DeLuca and I teamed up again to tackle the perennial problem of how to present a lot of information to readers in digestible bites that make sense. This time, we presented over a hundred anti-Obama books in a mosaic that you can filter down by subject matter. NewsBeast Labs Post.

HavingTroubleVoting.com - We ran an experiment on election day asking our readers, or anyone really, whether they were having trouble voting, and if so, what kind of trouble. We plotted the responses on a map and color-coded the markers based on the type of problem. We partnered with Mother Jones to help us go through the responses to find patterns and to contact people to tell their stories. Our own reporters used the database in stories about massive lines and machine malfunctions. We were totally honored and floored when CJR named it No. 2 in its Must-Read Interactives of 2012! More about it in our NewsBeast Labs post.

Election Night Interactive Map and Dashboard - A lot of teamwork went into our election night coverage from the development team, social, design… the list goes on. We took over our home page on election night with video commentary, a live-updating tally, a live chat, article updates, and more things that you could probably put a “live” prefix in front of. The map lives on in the linked title, and a screenshot lives in our NewsBeast Labs post about it.

‘It Was Like a War Zone’: Hurricane-Ravaged Staten Island Reels - In the wake of the trauma caused by Hurricane Sandy, we made a map of the storm’s Staten Island victims. It shows how the deaths were concentrated on the east side of the island.

Not-So-Super PACs: 2012’s Winners and Losers - DeLuca and I teamed up again to produce this tally of who made good investments this election cycle. There’s a long write-up about it, including some failed versions, in our NewsBeast Labs post.

Interactive Holiday Gift Guide - Lizzie Crocker, Isabel Wilkinson and I help you find out what sub-culture your friends might belong to in this gift guide flow chart.

Own a Gun? Tell Us Why? - December brought another terrible shooting and much reflection on the state of gun laws. We wanted to hear from rational people on both sides of the debate by letting readers complete the sentences “I own a gun because…” or “I don’t own a gun because…”. In three days, we had over 1,300 responses, which for the most part represented very civil remarks from each group. We analyzed the responses and did a state-by-state breakdown of the common themes. We used some interesting algorithmic clustering to find these patterns, so expect a write-up soon. For now, read the post on how the project was born and how we collected the responses.

For the holidays, we wanted to make the normal gift guide idea a little more interesting. The answer was a flow chart that narrows down the type of person you’re shopping for and then suggests appropriate gifts for their particular sub-culture. Lizzie Crocker and Isabel Wilkinson did a great job coming up with the categories, like Nostalgic Outdoorsman, The Closet ‘50 Shades’ Fan, and Hipster Techie, and then Lizzie and I (Michael) put our heads together to make a (hopefully) witty flowchart. Our photo department, headed by Marcia Allert, was also a huge help.

There’s nothing fancy under the hood. The only semi-trick: our Daily Beast font family, Titling, isn’t always legible at smaller sizes on the web. To fix that, we made the blue circles in Illustrator and exported them as PNGs. To do the hovers, you stack a duplicate of the image below your main image in a single file, like this, and make sure your image container is only tall enough to show one state at a time:

[Image: the normal and hover states stacked in a single sprite]

To do the hover, then, your CSS is something like .img-class:hover { background-position: 0 -102px; }. The benefit is that you don’t load a second picture on hover, so there’s no delay. It’s a pretty standard technique, nothing revolutionary, but for some reason you still see a lot of sites with delays on their image hovers that would be better off using it.
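
Spelled out a bit more, with hypothetical class and file names (the 102px frame height comes from the offset above; the width is an assumption):

    /* One frame of the two-frame sprite is visible at a time
       because the element is exactly one frame tall. */
    .gift-circle {
      width: 102px;
      height: 102px; /* frame height; matches the -102px offset */
      background: url('circle-sprite.png') no-repeat 0 0;
    }

    /* On hover, slide the sprite up to show the second frame.
       No new image request, so no flicker or delay. */
    .gift-circle:hover {
      background-position: 0 -102px;
    }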

Why not make the flowchart interactive too?

We decided in favor of a static image for the flowchart, as opposed to something interactive, since I think the wittiness of a flowchart comes across in seeing how the possibilities flow from one another and how the results differ when you choose one adventure over another. Also, at the risk of sounding blasphemous: interactivity can be thought of as a last resort, for when you can’t fit everything onto the screen at once or when doing so would work against a focused narrative. It’s much easier for readers to scan a page full of information and see how the options unfold, in my view, than to click 10+ times and see only the options that stem from their responses.

To continue the trend of showing shots of our failures, here’s a screenshot of the drafting process for the flowchart. We really wanted to work mud into it somewhere since it could have split off nicely into Adrenaline Junkie or Wee Ones (kids love mud), but it didn’t quite work out. The second image is what we settled on.

[Image: Gift Guide Failure]

[Image: Gift Guide Final]

-michael

A defining characteristic of this election cycle was Super PACs and the hundreds of millions of dollars outside groups were spending to influence races. Now that it’s all over, we wanted to see which outside groups spent their money on successful races and which did not. The result was our interactive Not-So-Super PACs: 2012’s Winners and Losers.

Super PACs abounded this cycle, so instead of trying to document and display all of them, we focused the narrative on how well the biggest spenders and their donors fared. To execute it, we used the Center for Responsive Politics’ analysis of FEC data to find how much each PAC had spent so far in each race, then manually went through and coded each race by whether the outcome was in line with or against the PAC’s interest. Then we added everything up.

Visualizing it 

This idea went through a few iterations before settling on what you see above. For a while we’d been wanting to use a tower graphic template (one of those vertical scroll layouts with a sticky table of contents) that I built a couple of months ago, but it never seems to work out. This time, after thinking about all of the detail we wanted to display, we thought bigger.

If you’re trying to visualize money flows, Sankey lines are a go-to. ProPublica did a great one showing overlapping Super PAC expenditures, and you see them as flat graphics too. They show directionality and volume, which makes them great for money.

Getting the right data

Money was flowing from donors to PACs and then to races, so we used the JSON structure that D3 lays out for its network layout (and Sankey) visualizations: you have a list of nodes (people and PACs) and a list of node-to-node links (person X gave $Y to PAC Z). We were working collaboratively in Google Docs, so we were able to write some formulas that would print out our data structure as JSON as we edited the document. Very handy in case you need to correct any numbers or name spellings.
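
In miniature, that structure looks something like this (the names and amounts are placeholders, not our data):

    // The shape D3's network and Sankey layouts expect: a list of
    // nodes and a list of links between node indices.
    var graph = {
      nodes: [
        { name: 'Example Donor' },     // index 0
        { name: 'Example Super PAC' }, // index 1
        { name: 'Example Race' }       // index 2
      ],
      links: [
        { source: 0, target: 1, value: 5000000 }, // donor gave $5M to the PAC
        { source: 1, target: 2, value: 3000000 }  // the PAC spent $3M on the race
      ]
    };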

Our D3 visualization was a failure.

[Image: Sankey Fail]

Here’s a link to the interactive version (yes, it’s in the “failures” folder). As you can see, there were too many races to fit on the screen and the dollar amounts in some races were so high that they dwarfed everything else. So showing each race in the Sankey was out. 

This led to Sankey Idea #2.

[Image: Sankey Sketch]

We connected photos of the donors to the PACs, showed the percentage of successfully spent funds, and then put the races in a table down below. The photos were very useful because you can quickly understand that money is coming from a person and going somewhere. With just text and no photos, I think it would be less clear and have less personality. Someone remarked that the lines almost form bodies and arms that reach out to touch the Super PACs. It’s interesting to see visualized data combined with photography work together to tell a story like that.

Under the hood

We used Raphael to draw the lines, an improvement over D3 since we do indeed support Internet Explorer 8. We tweaked Al Shaw’s Sankey line from Tom Counsell’s Sankey library to make the links span vertically instead of left to right. Here’s a jsFiddle of the code to draw the line.
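
The gist, in simplified form: each link is a cubic Bézier whose stroke width encodes dollars. The coordinates, colors, and dollars-to-pixels scale below are made up for the sketch; the real code is in the jsFiddle:

    // One vertical Sankey link drawn with Raphael.
    // Assumes a container element with id="sankey".
    var paper = Raphael('sankey', 600, 400);

    // (x1, y1) is the bottom of the donor node; (x2, y2) the top of the PAC node.
    function drawLink(x1, y1, x2, y2, dollars) {
      var midY = (y1 + y2) / 2;
      // Control points halfway between the endpoints give the S-curve,
      // flowing top to bottom instead of left to right.
      var path = 'M' + x1 + ',' + y1 +
                 'C' + x1 + ',' + midY +
                 ' ' + x2 + ',' + midY +
                 ' ' + x2 + ',' + y2;
      return paper.path(path).attr({
        stroke: '#89a',
        'stroke-opacity': 0.6,
        'stroke-width': dollars / 1e6 // e.g., one pixel per million dollars
      });
    }

    drawLink(100, 40, 340, 360, 20e6); // a $20M donation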

The table uses Isotope.js for its animated sorting, which is snazzy but also, I think, helps make tabular data more understandable. Instead of everything resorting in a flash when you click a column header, you can see how dramatically the rows vary from view to view. It also does filtering, so without much code you have a filterable, sortable table, and it saves the step of turning the object data into arrays for sorting. I’d been wanting to add those ascending/descending arrows to our tables for a while, so this was a good time for that.
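
A minimal version of that setup with Isotope’s jQuery API; the selectors, data attributes, and button ids are hypothetical:

    // Hook the races table up to Isotope for animated sorting and filtering.
    var $races = $('#races');

    $races.isotope({
      itemSelector: '.race',
      layoutMode: 'fitRows',
      getSortData: {
        spent: function ($el) { return parseFloat($el.attr('data-spent')); },
        name: function ($el) { return $el.find('.race-name').text(); }
      }
    });

    // Re-sort by dollars spent, descending, with the animated shuffle.
    $('#sort-by-spent').click(function () {
      $races.isotope({ sortBy: 'spent', sortAscending: false });
    });

    // Filter down to only the races the PAC won.
    $('#show-wins').click(function () {
      $races.isotope({ filter: '.win' });
    });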

This table will probably become our first stand-alone NewsBeast Labs plugin since we’ve been using it pretty frequently. That’s pretty cool because five months ago we didn’t have any interactive news code, and now we’ve done enough projects that we can see what’s worked and what functionality we like, and can wrap it all up into something more robust and reusable, which will make our future development that much faster.

-michael

A couple of weeks ago we had a small project that used a stack we’ve come to like. The project was an interactive collage displaying just under a hundred books that take a somewhat negative view of the president, to put it lightly. With so many books, we wanted a way for people to filter the list down to something more easily digestible. Matthew DeLuca separated them out into helpful categories like “Economy” and “Dangerous Radical”. You can hover over them to see an excerpt if you’re so inclined. We hope you like it.

Under the hood

We used the Isotope library, which we’ve used before in both similar layouts and with sortable tables, to handle the animation and filtering. We used the Miso Dataset library (more on that in a forthcoming post) to assemble the data in a spreadsheet so multiple people could work on it simultaneously and editors could easily access the copy. When everything was ready, we downloaded it to a local CSV and went from there. It’s a workflow that has worked well on projects like these.
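
For flavor, here’s a stripped-down version of that load step with Miso Dataset. The file name, column names, and addBookToCollage are placeholders:

    // Load the downloaded spreadsheet as a local CSV.
    var books = new Miso.Dataset({
      url: 'books.csv',
      delimiter: ','
    });

    books.fetch({
      success: function () {
        // 'this' is the fetched dataset; walk the rows to build the collage.
        this.each(function (row) {
          addBookToCollage(row.title, row.category, row.excerpt);
        });
      }
    });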

-michael

Notes and images from an ever-growing digital newsroom.

Newsweek & The Daily Beast

Contributors:
Brian Ries & Sam Schlinkert

Formerly:
Michael Keller, Andrew Sprouse, Lynn Maharas, & Clarisa Diaz
