Today we published a data story looking at how iOS devices fail to accurately correct some words such as “abortion” and “rape.” Here’s a detailed methodology on how we did that analysis.
It started back in January when we were working on our project mapping access to abortion clinics. The reporters on the project, Allison Yarrow and I (Michael Keller), were emailing a lot about it, which led to us typing the word “abortion” into our phones on a fairly regular basis. We noticed that iOS never autocorrected this word when we misspelled it, and when we double-tapped the word to get spelling suggestions, the correctly spelled word was never an option. We decided to look into whether this behavior could be reproduced on iPhones with factory settings, and what other words iOS doesn’t accurately correct. To do that, we set out to build a complete list of words that iOS doesn’t accurately correct.
We did this in two stages:
Stage One: Use the iOS API’s built-in spell-checker to test a list of misspelled words programmatically.
Step 1: Get a list of all the words in the English language
We combined two dictionaries for this: the built-in Mac OS X dictionary found in /usr/share/dict, and the WordNet corpus, a widely used corpus of linguistic information, which we accessed through NLTK, a natural language processing library for Python. We left out words shorter than three characters, multi-word entries (e.g. “adrenal gland”), and words with punctuation such as dashes or periods (e.g. “after-shave”, “a.d.”). We reasoned that these words were either too short to accurately correct or had too many variables to test on an even playing field, so we excluded them from our analysis.
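As a rough sketch, the filtering step looks like this in Python (the sample list below stands in for the combined /usr/share/dict and WordNet word lists, and the helper name is ours, not the original script's):

```python
import re

def keep(word):
    """Filter mirroring our criteria: at least three characters,
    a single word (no spaces), and letters only (no punctuation)."""
    return len(word) >= 3 and re.fullmatch(r"[A-Za-z]+", word) is not None

# In the full run, the word list came from /usr/share/dict/words plus
# the WordNet corpus via NLTK; a small sample stands in for it here.
sample = ["abortion", "ad", "adrenal gland", "after-shave", "a.d.", "gopher"]
words = sorted(w for w in set(sample) if keep(w))
print(words)  # ['abortion', 'gopher']
```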
Step 2: Create misspellings of these words
We wanted to test slightly misspelled versions of every word in the English language, so, to start, we wrote a script that produced three misspellings of each one: one where the last character was replaced with the character to its left on the keyboard, one where the last character was replaced with the character to its right, and a third where the last character was replaced with a “q”. Because modern spellcheck systems know about keyboard layout, these adjacent-character misspellings should be the low-hanging fruit of corrections.
For instance, “gopher” would become “gophet,” “gophee,” and “gopheq”.
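A minimal Python sketch of this generation step (the adjacency tables and function name are illustrative, not the original script; letters at the edge of a keyboard row, like “q”, simply have no left neighbor and that variant is skipped):

```python
# Immediate left/right QWERTY neighbors, built row by row.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
LEFT, RIGHT = {}, {}
for row in ROWS:
    for i, ch in enumerate(row):
        if i > 0:
            LEFT[ch] = row[i - 1]
        if i < len(row) - 1:
            RIGHT[ch] = row[i + 1]

def last_char_misspellings(word):
    """Three misspellings: last character replaced by its left
    neighbor, its right neighbor, and by 'q'."""
    last = word[-1]
    out = []
    for repl in (LEFT.get(last), RIGHT.get(last), "q"):
        if repl and repl != last:
            out.append(word[:-1] + repl)
    return out

print(last_char_misspellings("gopher"))  # ['gophee', 'gophet', 'gopheq']
```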
Step 3: Run these misspelled words through an iOS API spellchecker program.
Apple doesn’t have a “spellcheck program,” but for iOS developers it has an API with a function that takes a misspelled word and returns a list of suggested words, ordered by how likely it thinks each suggestion is. In Xcode, the program you use to write iPhone and iPad apps, the UITextChecker class has a method called “guessesForWordRange” that does just that. Before testing each word, however, we ran the non-misspelled word through another method in this class, “rangeOfMisspelledWordInString,” which tells you whether the word in question exists in the iOS dictionary. This weeded out words that were in our WordNet and Mac dictionary lists but that iOS wasn’t aware of. In other words, we only tested words that, if spelled correctly on an iOS device, wouldn’t get the red underline. For all of our tests we used the then-most-up-to-date version of Xcode, 4.6.2, and ran the most up-to-date version of the iOS 6 Simulator.
We also checked each misspelling against the dictionary to make sure it wasn’t itself a real word. For example, “tab” has a right-adjacency misspelling of “tan,” which is also a word. In that case, the script fell back to the “q” misspelling: when it saw that “tan,” the candidate misspelling of “tab,” was a real word, it used “taq” as the misspelling instead. Obviously, “taq” is a harder misspelling of “tab” to correct, but we also gave it “tav,” its left-adjacency misspelling. If the spellchecker got either of these right, we counted “tab” as a word it can accurately correct. Later on, as our list got smaller, we tested many more misspelling combinations to be sure we gave the spellchecker many chances to correct what should be easy corrections.
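The fallback logic looks something like this (a self-contained Python sketch with a toy dictionary; the names are ours):

```python
# Immediate left/right QWERTY neighbors for each letter key.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
NEIGHBORS = {ch: (row[i - 1] if i else None,
                  row[i + 1] if i < len(row) - 1 else None)
             for row in ROWS for i, ch in enumerate(row)}

def misspellings_to_test(word, dictionary):
    """Left/right adjacency misspellings of the last character,
    falling back to the 'q' variant whenever an adjacency collides
    with a real word, as 'tan' does for 'tab'."""
    left, right = NEIGHBORS[word[-1]]
    variants = []
    for repl in (left, right):
        cand = word[:-1] + repl if repl else None
        if cand is None or cand in dictionary:
            cand = word[:-1] + "q"  # the harder fallback case
        variants.append(cand)
    return variants

# 'tav' isn't a word, but 'tan' is, so it becomes 'taq':
print(misspellings_to_test("tab", {"tab", "tan"}))  # ['tav', 'taq']
```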
Step 4: Analyze results
If a word was accurately corrected at least once, we marked it as properly recognized by iOS. This process narrowed our list down from about 250,000 to roughly 20,000 words. There was one big problem though: the iOS spellcheck didn’t accurately correct some words that real iPhones were able to correct. For instance, the API wouldn’t correct “aruguls” to “arugula,” for some reason. Our questions to Apple on this went unanswered; if anyone has any suggestion as to why the two systems are different, please let us know.
After meeting with some New York-area iOS developer meetup groups, we found that the spellcheck on the iOS simulator as a part of Xcode does correct these edge cases, which led us to stage two.
Stage Two: Use spellcheck on the iOS simulator to check the remaining 20,000 words
To access the word suggestions on the iOS simulator, you need one crucial piece of hardware: a human hand. We were able to write an iOS program that presents a word on the simulator easily enough, but there’s no way to programmatically pull up the spellcheck suggestion menu, because iOS programs don’t have scope for system-level operations. To do that, you need to physically double-click the word and navigate through the various menus.
Step 1: Find a way to automate clicking
To solve this, we got into our wayback machine and wrote an AppleScript that would move the mouse to specific coordinates on the screen, wait a specified number of milliseconds for menus to appear and then click in the appropriate places. Our iOS program had a button that, when clicked, saved the original word, the presented misspelled word, and the final result of the correction. Our AppleScript script clicked through the menus, replaced the word if the simulator presented a suggestion, then clicked the button to serve the next word.
We tried to make this process as fast as possible but it ended up taking around 1.6 seconds per word. 1.6 multiplied by 20,000 is 32,000 seconds, which equals 8.8 hours. But we also wanted to present even more misspelling options—twelve more in total.
We can call this Step 2: create more misspellings.
1. Double last character.
2. Double last character with a capitalized first character.
3. Missing last character.
4. Missing last character with a capitalized first character.
5. Misspelled first character (via left misspelling adjacency) and capitalized first character.
6. Misspelled first character (via left misspelling adjacency).
7. Misspelled first character (via right misspelling adjacency) and capitalized first character.
8. Misspelled first character (via right misspelling adjacency).
9. Misspelled second character (via left misspelling adjacency) and capitalized first character.
10. Misspelled second character (via left misspelling adjacency).
11. Misspelled second character (via right misspelling adjacency) and capitalized first character.
12. Misspelled second character (via right misspelling adjacency).
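The twelve variants above can be generated mechanically. Here is a Python sketch (the pairing of each base misspelling with its capitalized twin is our simplification of the list's ordering, and swaps that fall off the edge of the keyboard are simply skipped):

```python
# Immediate left/right QWERTY neighbors for each letter key.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
NEIGHBORS = {ch: (row[i - 1] if i else None,
                  row[i + 1] if i < len(row) - 1 else None)
             for row in ROWS for i, ch in enumerate(row)}

def swap(word, pos, side):
    """Replace the character at `pos` with its left (0) or right (1)
    neighbor; returns None when that key has no such neighbor."""
    repl = NEIGHBORS.get(word[pos], (None, None))[side]
    return word[:pos] + repl + word[pos + 1:] if repl else None

def extra_misspellings(word):
    """The twelve additional variants: six base misspellings, each
    paired with a capitalized-first-character twin."""
    bases = [
        word + word[-1],   # doubled last character
        word[:-1],         # missing last character
        swap(word, 0, 0),  # first character, left adjacency
        swap(word, 0, 1),  # first character, right adjacency
        swap(word, 1, 0),  # second character, left adjacency
        swap(word, 1, 1),  # second character, right adjacency
    ]
    out = []
    for v in bases:
        if v:  # skip swaps with no keyboard neighbor
            out.append(v)
            out.append(v[0].upper() + v[1:])
    return out

print(extra_misspellings("gopher"))
```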
So, including our first misspelled last character with left/right adjacencies, we had 14 lists of 20,000 words to run through. 14 multiplied by 8.8 hours = 123.2 hours, which is five days if the program ran straight for 24 hours a day. We needed to take a break in between each of the 14 sessions, however, and restart Xcode just in case there was a learning algorithm—we didn’t want the results of one session to pollute another.
Renting computers from Amazon is easy, but not if they’re Mac OS computers, which aren’t available through Amazon and get rather expensive through other dealers. Fortunately, the Columbia School of Journalism let us take over one of their Mac computer labs, so we were able to run the script in parallel and finish in a much more reasonable time frame. It also meant my laptop wasn’t out of commission crunching words for a week. Here’s a Vine of what the automated corrections looked like:
One drawback of this method was that we could only get the mouse script to select the first suggestion. So if, for the misspelled word “abortiom,” “aborted” was suggested as more likely than “abortion,” this program would mark that as an inaccurate correction. We weren’t too worried about this, though, because 1) our iOS script in stage one *did* take into account multiple suggestions, so all the words had two chances to be corrected in that scenario, and 2) we presented 14 different misspellings of these words, and if any one of these variations was corrected, we counted the word as accurately corrected. If a word that is only off by one character isn’t suggested in that many attempts, then something in the algorithm isn’t handling that word correctly.
Step 3: Analyze results
This second stage only cut out around 6,000 words, leaving us with 14,000 words that were never accurately corrected. The ++related article++ lays out our findings, but our initial hypothesis held true: “abortion” is a word that iOS doesn’t correct, unlike Android phones. Apple declined to comment for this project, so we have many unanswered questions. One idea for future research is whether iOS devices are incapable of learning certain words like “abortion.” That is to say, these words may be blocked not just at the dictionary-suggestion level, but at the machine-learning level as well.
Stage Zero: Find the files.
Before we did stage one we had a different strategy: find this list of seemingly banned words somewhere in the iOS file structure. To do this, we put out a call on Facebook for any friends who would donate an old iPhone to be jailbroken. We got three phones: one from my mom, and two from some very nice old friends who mailed them to our offices. We factory-reset and jailbroke one and kept the others factory-fresh for testing. We went searching and found some promising files in the LinguisticData directory called “pos,” “ner,” and “lemmas,” which, in the natural language processing world, stand for “part of speech,” “named entity recognition,” and “lemmatization,” the analysis of word stems and inflected forms, like “better” being associated with “good” as its base. These files were unreadable, however, because they weren’t in any known format. The only way we could read them was in their raw binary-hex form, which looks like that terrible mess of characters you see when you open a corrupted Word document: like Wingdings but with less rhyme or reason.
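For the curious, a raw hex view of that kind of binary file can be produced with a few lines of Python (a generic sketch in the spirit of the Unix `xxd` tool, not the viewer we actually used; the sample bytes are made up):

```python
def hexdump(data, width=16):
    """Render bytes as offset, hex pairs, and printable ASCII,
    with dots standing in for non-printable bytes."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart:<{width * 3}}  {text}")
    return "\n".join(lines)

# Any unknown binary format renders as hex plus dots:
print(hexdump(b"lemmas\x00\x01\x02\xffbinary"))
```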
After many attempts at deciphering where a list of blocked words could reside, and after reaching out to the New York iOS community, we started in earnest on reverse-engineering the list ourselves, beginning with stage one.