One of the things we noticed when doing our large-scale study of children’s games was that way more apps were accessing location data than were seen actually sending it. In some ways this makes sense—COPPA quite explicitly forbids sending location data without verifiable parental consent, something that our testing framework did not provide. Nevertheless, since we couldn’t come up with a plausible reason why many of these apps actually needed location, we thought there was a good chance we were missing some transmissions. It turns out we were right.
The trick to finding these was to take the list of apps that we knew accessed location data (but that we couldn’t catch sending it) and organize them by the third-party libraries they used, generally for ads and analytics. Not only does this give us more bang for the buck (finding a popular ad library that sends location exposes a lot of wrongdoing), but it also suggests that a library is doing something to hide its behavior if it frequently accesses location data but does not obviously send it.
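As a rough illustration of that grouping step, here is a minimal Python sketch. The record layout, app names, and the `rank_suspect_libraries` helper are all hypothetical and are not taken from the actual study data; the idea is only to count, per library, the apps that accessed location but were never seen sending it.

```python
from collections import Counter

# Hypothetical records: (app, accessed_location, seen_sending_location, libraries)
apps = [
    ("app.one", True, False, {"startapp", "admob"}),
    ("app.two", True, False, {"startapp"}),
    ("app.three", True, True, {"admob"}),
    ("app.four", True, False, {"revmob", "startapp"}),
    ("app.five", False, False, {"revmob"}),
]

def rank_suspect_libraries(records):
    """Count, per library, the apps that accessed location but were never seen sending it."""
    counts = Counter()
    for _name, accessed, sent, libraries in records:
        if accessed and not sent:
            counts.update(libraries)
    # Libraries with the most "silent" apps come first.
    return counts.most_common()

ranking = rank_suspect_libraries(apps)
print(ranking)
```

The library at the top of such a ranking is the most promising place to start looking for hidden transmissions.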
According to our data, StartApp was our first candidate. So we looked at the packets sent by apps going to domain names owned by StartApp. What we immediately saw surprised us—how were we not catching this!? The packets included lines like this:
It’s clearly sending not only location data, but router SSIDs and MAC addresses! It appears to be encoded using base64, but when we un-base64’ed it, we didn’t get locations, we got stuff like this:
Which is a bit weird. Normally after decoding base64, something is either unreadable binary data, or it is plaintext data, like human-readable GPS coordinates or the SSID. But this was something in the middle: it was still printable letters, but a lot more punctuation than is used for any actual language.
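One way to put a number on "something in the middle" is to measure how much of the decoded payload is printable and how much of it is punctuation. The sketch below is only an illustration of that heuristic; the `describe_payload` helper and the sample values are assumptions, not captured traffic.

```python
import base64
import string

def describe_payload(encoded):
    """Return (printable fraction, punctuation fraction) of a base64-decoded value."""
    raw = base64.b64decode(encoded)
    if not raw:
        return 0.0, 0.0
    printable = sum(chr(b) in string.printable for b in raw) / len(raw)
    punctuation = sum(chr(b) in string.punctuation for b in raw) / len(raw)
    return printable, punctuation

# Plaintext decodes to fully printable bytes with little punctuation.
print(describe_payload(base64.b64encode(b"lat=37.87")))
# Arbitrary binary data decodes to mostly unprintable bytes.
print(describe_payload(base64.b64encode(b"\x00\xff\x10\x80")))
```

A payload that is almost entirely printable but unusually heavy on punctuation fits neither pattern, which is a hint that some lightweight transformation was applied before the base64 step.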
We had to go deeper. Obviously this app was sending data that it called “latitude” and “longitude” and “ssid.” So we decompiled the application into its byte-code. Reading this is more like archaeology than programming: you can only see what’s there, without much more guidance. (Especially when the developers purposely obfuscate it to make this step harder!) But there are a few things to help you know where to look, like searching for where “longitude” is loaded into a string. So we looked for that, and found where the location data was being processed by the StartApp SDK.
We checked that it had the real location by modifying the app ourselves to add some helpful output and running the new version. But it wasn’t immediately appending that location to the packet. Instead, it was first adding the result of some other function. There weren’t any meaningful names to help figure out what this function did; it was just called commonUtils.s.a(byte, byte). But it was clearly changing the string.
So we read that function, line by line of byte code, and found that it was simply XORing one string with another, looping over the string. XOR is a lot like adding the strings together letter by letter, except you can “undo” it by just doing it again. In this case, they were XORing the repeated word “ENCRYPTIONKEY” to the location data, and then doing it again with “$T@RTAPP”, a sort of hacker-style representation of their company’s name:
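To make the scheme concrete, here is a minimal Python sketch of a repeating-key XOR using the two strings named above. It is a reconstruction from the description, not the SDK's code: the function names are made up, and it assumes both keys start at the first byte of the data and that base64 is applied after the XOR steps.

```python
import base64
from itertools import cycle

# The two key strings named in the text.
KEYS = (b"ENCRYPTIONKEY", b"$T@RTAPP")

def xor_repeat(data: bytes, key: bytes) -> bytes:
    """XOR data with key, repeating the key as often as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def obfuscate(plaintext: str) -> str:
    data = plaintext.encode("utf-8")
    for key in KEYS:
        data = xor_repeat(data, key)
    # Assumption: the XORed bytes are base64-encoded before being sent.
    return base64.b64encode(data).decode("ascii")

def deobfuscate(encoded: str) -> str:
    data = base64.b64decode(encoded)
    # XOR is its own inverse, and the order of the two keys does not matter.
    for key in KEYS:
        data = xor_repeat(data, key)
    return data.decode("utf-8")

hidden = obfuscate("37.8716,-122.2727")
print(hidden)
print(deobfuscate(hidden))
```

Because XOR undoes itself, the same two passes that hide the coordinates also reveal them; anyone who reads the keys out of the decompiled code can decode every packet.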
So we started doing this in our hunt for transmissions. And suddenly we found plenty more cases of location data (as well as SSID, etc.) being transmitted—all being sent to StartApp’s domains.
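A search along these lines might look like the sketch below. It is an illustration rather than the actual analysis pipeline: the query-string format, the parameter names, and the `find_leaks` helper are assumptions. The idea is to try a few decodings of every value in a request and check each one for values the test device is known to have, such as its latitude.

```python
import base64
import binascii
from itertools import cycle
from urllib.parse import parse_qsl, urlencode

KEYS = (b"ENCRYPTIONKEY", b"$T@RTAPP")  # key strings named in the text

def xor_repeat(data, key):
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def xor_all(data):
    for key in KEYS:
        data = xor_repeat(data, key)
    return data

def candidate_decodings(value):
    """Yield the raw value, its base64 decoding, and the de-XORed decoding."""
    yield value
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return  # not base64, nothing more to try
    yield raw.decode("latin-1")
    yield xor_all(raw).decode("latin-1")

def find_leaks(query, known_values):
    """Return (parameter, label) pairs whose value reveals a known device value."""
    hits = []
    for name, value in parse_qsl(query):
        for decoded in candidate_decodings(value):
            for label, needle in known_values.items():
                if needle in decoded and (name, label) not in hits:
                    hits.append((name, label))
    return hits

# Hypothetical request: one obfuscated parameter and one harmless one.
hidden = base64.b64encode(xor_all(b"37.8716")).decode("ascii")
query = urlencode({"a": hidden, "sdk": "1.0"})
print(find_leaks(query, {"latitude": "37.8716"}))
```

Trying the plain value first keeps the same search useful for libraries that do not obfuscate at all.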
In fact, we found that 2,393 apps talk to startappservice.com or a similar domain, with 1,803 apps sending location data or the Wi-Fi router MAC addresses, which are basically good enough to track location. (The FTC has sanctioned companies for collecting Wi-Fi MAC addresses as a deceptive way of collecting location data.) According to the Google Play store, these apps have a total lower-bound installation count of more than 500 million. The two most popular apps, with more than 50 million installs each, are MDickey’s Wrestling Revolution 3D and AppDrac’s Work Search. The most popular game targeting children that we found sending location data is Indigo Kid’s Masha and the Bear, with more than 10 million installs. (This app is clearly directed at kids, as it has both “child games” in its name and participates in Google’s Designed for Families Program.)
Now, it’s worth mentioning that despite StartApp calling it an “ENCRYPTIONKEY,” this is not a good way of encrypting data. This general approach is called a Vigenère cipher, and while it was military-grade encryption in Napoleon’s time, it’s actually long been known to be vulnerable to a whole slew of standard code-breaking techniques. The current standard for encryption is called AES, or Advanced Encryption Standard.
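One of those standard techniques is a known-plaintext attack: if you can guess or control a single message, XORing it with its ciphertext hands you the keystream, with no need to read the keys out of the code. The sketch below illustrates this against a repeating-key XOR built from the two strings above; the message format and helper names are assumptions made for the example.

```python
from itertools import cycle

def xor_repeat(data, key):
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def obfuscate(data):
    # Stand-in for the scheme described above; the attacker never calls this
    # with knowledge of the keys, only observes its output.
    for key in (b"ENCRYPTIONKEY", b"$T@RTAPP"):
        data = xor_repeat(data, key)
    return data

# The attacker knows one plaintext/ciphertext pair (hypothetical message format).
known_plain = b"latitude=00.0000&longitude=000.0000"
known_cipher = obfuscate(known_plain)

# XORing the pair together cancels the plaintext and leaves the keystream.
keystream = bytes(p ^ c for p, c in zip(known_plain, known_cipher))

# Any other message of the same length or shorter can now be decoded.
secret_plain = b"latitude=37.8716&longitude=-122.272"
secret_cipher = obfuscate(secret_plain)
recovered = bytes(c ^ k for c, k in zip(secret_cipher, keystream))
print(recovered)
```

A modern cipher such as AES, used with a proper mode and a fresh nonce, is designed so that knowing one plaintext and its ciphertext does not help decode any other message.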
In fact, the next library on our list, Revmob, simply encrypted location data with AES and sent it off. When we looked at the decompiled version of the program, the encryption key (some long random number) was just sitting there to be read as clear as day. (Not doing this is lesson one in software security!) But, no matter what, they can’t expect your phone to run their code and somehow hide what it’s doing from you. It may be too late after you realize it’s happening, but the app can’t hide the fact that it’s taking your location data and trying to hide the fact that it’s transmitting it.
Revmob is included in a total of 156 apps in our corpus. While Revmob is not as popular as StartApp, we found that it is more frequently included in games targeted at children: 38% of apps that send encrypted location with Revmob were in the Designed for Families program, versus only 1.5% for StartApp. Also, while Revmob appears in fewer apps, some of those apps are wildly popular: the total number of installations of just 21 of these apps is more than 150 million! The most popular ones sending encrypted location are Best Cool & Fun’s Ant Smasher, MACHAPP’s Transparent Clock & Weather, and (of course) Tiny Lab Productions’ Fun Kid Racing Motocross (we cite several of Tiny Lab’s games in our paper for other reasons).
And just in case you may be thinking that it’s a good thing they are at least encrypting the location data before sending it off to third-party advertisers (at least it was protected from bad guys on the Internet), you should know that in both cases this was on top of TLS, which is the standard for web-based encryption. The extra encryption that they used to hide the transmission of personal information doesn’t actually add any protection, though it did make our job in exposing this behavior just a tiny bit harder.