Is this a return of the US-EU privacy wars of the 1990s, when Brussels’ bureaucrats threatened to halt intercontinental online transactions? It looks that way. This time it’s over location data.
An upcoming EU report will say that “geo-location data has to be considered as personal data… The rules on personal data apply,” an EU official tells the Wall Street Journal.
The implication is that data collected by cellphones, Twitter, Facebook and others must be handled like names, birth dates and other personal information: it requires user consent, must be deleted after a certain period, and must be kept anonymous.
This is absolutely preposterous. Yes, rules — better, tougher rules — are desperately needed. But simply dropping the data into a pre-existing regulatory bucket, as the EU is doing by calling it “personal information” (which carries sweeping regulatory burdens), is asinine. It will hold back the amazing innovations and services that are just starting to emerge, and future ones that we can scarcely imagine today.
Calling something new (geo-location data) something old (personally identifiable information, or PII in the trade) is far too blunt a way to uphold the legitimate public-interest concerns that need to be addressed. It avoids the more humble — and probably more effective — task of figuring out the new properties of this type of data, and thus devising appropriate ways to balance personal privacy with innovative services. It’s harder to do, but sounder.
This of course will happen, but over time, and probably in a different regulatory jurisdiction. Possibly America? Perhaps China? Maybe Brazil? But European geo-loco firms will suffer in the meantime, since they’ll be crammed into a regulatory straitjacket. And to be clear: this is not to say that better rules aren’t needed — they definitely are. But they ought to be sensible ones.
Failing to take a more cautious and reflective regulatory approach results in things like the EU’s 1998 privacy directive. It did an excellent job of getting governments into the privacy arena, but it had plenty of silly parts too. For one thing, it required an international “safe-harbor” provision just to allow innocuous things like a US firm in France sending its payroll data to headquarters in Detroit. The directive is already out of date, and although it boasts strong enforcement provisions, they’ve barely ever been used.
In fairness, the US has miserable privacy legislation — no country does it well — but its piecemeal approach, building up a body of regulatory experience case by case, is looking like a better way forward. There is no “privacy kommissar” in America, but that hasn’t stopped the FTC from taking serious action, and often.
A far better way to proceed is the way the US is moving. Sen. Al Franken’s opening statement to hearings on May 10th on cellphone privacy was a paragon of wise policymaking: he wants to find the right balance. He was scorching in his condemnation of current practices:
“Once the maker of a mobile app, a company like Apple or Google, or even your wireless company gets your location information … these companies are free to disclose your location information and other sensitive information to almost anyone they please — without letting you know. And then the companies they share your information with can share and sell it to yet others — again, without letting you know. This is a problem. It’s a serious problem.”
But at the same time, he understood the risks of regulating too soon:
“I just want to be clear that the answer to this problem is not ending location-based services. No one up here wants to stop Apple or Google from producing their products or doing the incredible things that you do. You guys are brilliant. When people think of the word ‘brilliant’ they think of the people that founded and run your companies.”
If this gap in regulatory approaches is not settled, the result may well be another round of the privacy wars. Companies like Apple, Google, Facebook, Twitter, Foursquare and others will have to tailor their operations by jurisdiction, down to their very code base. The EU will argue that they have to do this anyway for language and law. But this still fractures and debilitates the services. And it is hypocritical: the idea behind the EU’s common market and common currency is the gain from harmonization.
The best way to ward off bad public policy is good case studies of excellent services. Industry basically has its head in the sand, hoping this issue will go away (it won’t), or is in hiding, hoping it doesn’t need to disclose how the services work (it does). These actions are shortsighted. Geo-location services are interesting and useful, and if people really knew what was happening, many would be fine with it, provided a backstop of basic protections exists.
The case must be made publicly. So what are the amazing new services that are emerging that show why the EU’s approach is not quite right? Share your stories here.
One of the most impressive trends over the past decade (and broadly, the past century) has been the rise of the NGO. In the 1990s they mushroomed like start-ups and attracted “social entrepreneurs.” The bigger shift today is that it’s no longer a person’s full-time job: now actual entrepreneurs toiling at start-ups have their own philanthropic gig on the side. A computer went from a 2-ton, $2 million, room-sized machine to a pocket-sized thing. So did non-profit organizations.
I recently scribbled a few thoughts about the data dimensions of responding to Japan’s crisis for The Economist’s website: “The information equation” on April 24th. I was impressed that a private-sector company was playing the role that a governmental organization or NGO might play. (It’s a Google.org project, to be exact.)
Among the things I learned was that Google collected $5.5 million in donations through its crisis-response page. A small but not insignificant haul. But it got me thinking. The world of BigData is about learning new things from information that is otherwise invisible to the naked eye. What could the donation data tell us about how to more effectively solicit charitable contributions? Specifically, as I wrote in the penultimate paragraph of the article:
The donation data may offer a chance to learn new things about how people contribute. For example, what is the average amount? Does it follow a normal distribution (ie, a “bell curve”), in which a few give a little, a few give a lot, and the majority donate around $15? Or is it a power-law distribution, in which two or three extremely rich donors and a handful of generous ones are followed by a long tail of $2 contributions? Did people donate using PayPal or credit cards? What time of day do they give? Is it after they have read a news story or clicked a link within an e-mail? The information would help fundraisers tailor their appeals. And the data can be broken down by country or even city via Internet Protocol addresses.
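To make the question concrete, here is a toy sketch (all numbers invented) of how one might eyeball the shape of a donation distribution: a large gap between mean and median, plus a top decile that supplies most of the total, points toward a power law rather than a bell curve.

```python
import statistics

def donation_profile(amounts):
    """Summarize a list of donation amounts to hint at its shape.

    If mean and median are close, the distribution is roughly
    symmetric (a "bell curve"); if the mean far exceeds the median
    and the top decile supplies most of the money, it looks more
    like a power law with a long tail of small gifts.
    """
    amounts = sorted(amounts, reverse=True)
    total = sum(amounts)
    top_decile = amounts[: max(1, len(amounts) // 10)]
    return {
        "mean": statistics.mean(amounts),
        "median": statistics.median(amounts),
        "top_10pct_share": sum(top_decile) / total,
    }

# Hypothetical donations: two large gifts and a long tail of small ones.
donations = [5000, 1000] + [2] * 98
profile = donation_profile(donations)
```

Run on this invented sample, the median is $2 while the mean is far higher, and the top tenth of donors accounts for nearly all of the money: exactly the power-law signature described above.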
I’ve asked Google’s hyper-helpful PR team to run the idea past their number-crunchers, to get access to the findings so I can write a story about this. It’s sort of like Google Flu Trends, but for charities. It would be highly valuable information for NGOs to know — particularly one that is dear to my heart, International Bridges to Justice (where I proudly serve on the board).
What is the world of BigData? To define it is to limit it. But here’s one way in: it refers to things we cannot see with the naked eye, which are revealed only by a huge body of data. So, for example, we may not know which restaurant people think is best in terms of food, service, atmosphere and a good time. But seeing the percentage left as tips could be a way of learning this, in a way we couldn’t in the past, when we couldn’t get the data or process it.
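To make the tipping example concrete, here is a minimal sketch; the restaurants, bills and tips are all invented.

```python
from collections import defaultdict

# Hypothetical payment records: (restaurant, bill, tip) — the kind of
# data that, in aggregate, could reveal which restaurants diners reward.
payments = [
    ("Chez Nous", 80.0, 16.0),
    ("Chez Nous", 50.0, 10.0),
    ("Quick Bite", 30.0, 3.0),
    ("Quick Bite", 20.0, 2.0),
]

totals = defaultdict(lambda: [0.0, 0.0])  # restaurant -> [bills, tips]
for name, bill, tip in payments:
    totals[name][0] += bill
    totals[name][1] += tip

# Average tip rate per restaurant; the highest rate stands in for
# "the restaurant diners liked best".
tip_rates = {name: tips / bills for name, (bills, tips) in totals.items()}
best = max(tip_rates, key=tip_rates.get)
```

The per-row data is trivially small; the point is that only a big pool of such rows, which no single diner or restaurateur ever sees, makes the aggregate signal visible.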
This is a fictitious example (and one imagined by Mike Driscoll of Metamarkets). But it helps to concentrate the mind on what is new about the BigData revolution taking place, and how information cleverly reused can create new sources of economic value. And this leads me to thinking about the Wall Street Journal’s excellent article “The Really Smart Phone” by Robert Lee Hotz, which is part of the paper’s impressive series “What They Know.”
There’s much to praise in the piece, and I won’t recapitulate it here. Instead, I want to put forward some vital distinctions that industry needs to consider when thinking about these trends.
First, we need to separate the process of BigData from its output. The article — like the industry — doesn’t really do this. For example, sometimes we talk about being able to track 100 million cellphone users (without noting the substance of what is being tracked: calls? location? bills?). And sometimes we talk about what we learn, such as a person’s susceptibility to obesity. But is that because location data shows they’ve been sedentary? Or because they bought lots of ice cream from their iPhone?
These distinctions are crucial. In one instance it is anonymized metadata; in another it is individual information. The ways that entities are allowed to use these different types of data perhaps ought to be different too.
Most people, and most articles in the press, approach the BigData issue from the negative: “if you only knew what they know about you!” But I believe that industry ought to be far more transparent, because if people did know, they’d probably be more impressed than alarmed. (It is a point I made in my special report “The data deluge” in The Economist last year.) Specifically, what is so unsettling: what they collect? Or what they know? On the surface, we bristle at both. But when we look deeper, it is fascinating to see what new things can be learned from a big body of data.
Hence, the transition from “Ick!” to “Wow!” But I think the failure of industry to be open about its practices will hold it back. Thus: the public will cry: “Wait!”
People are antsy. Regulators are uneasy. And business is barricading itself. Amazon never discusses it. Google does — and thus invites abuse, alas. Apple is characteristically silent. “Google Inc. defended the way it collects location data from Android phones, while Apple Inc. remained silent for a third day,” the WSJ wrote in a separate article on April 23.
I think with the right outreach, BigData firms can make the case for collecting and processing the information. It will change the debate to the more essential questions: who owns the information, who gets to benefit from it, how it is valued, how it is protected and what are the penalties if this trust is abused?
This further establishes the distinction I identified at the outset: separating the process from the substance. In fact, we are talking about so many records — millions of people, zillions of data-points of locations or calls — that the practical effect seems to be anonymization, even if it is not done effectively in practice.
To overcome the “Wait!”, I’d urge that we make rules. I’ll discuss a starting point for thinking about them another time. For now, a look at what the WSJ piece did a nice job of highlighting. The large numbers associated with the research were interesting but actually unimportant; they’re just big numbers — the process. The actual output, what is to be learned, is far more interesting. Specifically, cellphone BigData lets us:
– pinpoint “influencers,” the people most likely to make others change their minds
– forecast where people are likely to be at any given time in the future
– predict which people are most likely to defect to other cellphone carriers
– reveal subtle symptoms of mental illness
– foretell movements in the Dow Jones Industrial Average
– chart the spread of political ideas as they move through a community
– expose a cultural split that is driving a historic political crisis (in Belgium)
– deduce that two people were talking about politics
– detect flu symptoms before the students themselves realized they were getting sick
A final note: all of these insights were gleaned by parsing two types of data: location and interconnections among users — metadata. There is a lot more mobile data to collect; we’ve barely scratched the surface. Also, nota bene that none of the data relates to specific content from the phone or user. And it is not clear that the data collected can be traced back to a specific user, other than in cases of academic research in which consent was granted.
In some ways, the data collected looks like the “pen register” information that is less spooky than an outright wiretap: eg, who calls whom and when, but not what they said. It has a lower standard for law enforcement to obtain.
My point is this: the BigData issues we’re confronting now are the easy ones. So this is the moment to start seriously debating them and arriving at answers, as a precursor to the harder issues coming down the pike.
I had a delightful dinner with a Canadian (Tory) politician this evening. Before we got around to talking about gun-control and “the situation in Freedonia,” the conversation fell upon data. “It probably sounds like the boring-est thing in the world,” I explained, “but it’s not: it is the most exciting, and arguably the most important in your lifetime.”
As I said this, his look changed from polite indifference (and mild pity) to genuine curiosity. Now I had to explain myself. What I blurted out was this:
The world is becoming data-ized as digital information and numerical measurement are applied to all aspects of what people do, particularly things that couldn’t be measured before because it was impractical or impossible. (Think: using wireless and GPS in cars to base insurance premiums on where and when people actually drive, as has been possible since 2007.)
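A toy sketch of the car-insurance idea, with invented rates and trips: charge per mile, and charge more for miles driven late at night.

```python
def monthly_premium(trips, base=30.0, day_rate=0.05, night_rate=0.15):
    """Toy usage-based premium: a flat base charge plus a per-mile
    rate, tripled for late-night driving. All rates are invented for
    illustration, not any real insurer's pricing.

    trips is a list of (miles, hour_of_day) tuples.
    """
    cost = base
    for miles, hour in trips:
        rate = night_rate if hour >= 22 or hour < 5 else day_rate
        cost += miles * rate
    return round(cost, 2)

# Two hypothetical drivers covering the same 400 miles a month.
commuter = [(10, 8), (10, 17)] * 20    # daytime commuting
night_owl = [(10, 23), (10, 2)] * 20   # late-night driving
```

Same mileage, very different premiums: which is precisely what was impossible to price before the measurement existed.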
The impact will be as profound as that of the scientific method in the 18th century, which quickly moved past the sciences and left its mark on all areas of human endeavor. For instance, what is “quantitative decision making” in management, if not the scientific method applied to business? Likewise, the BigData revolution is plowing through the sciences and has also jumped into mainstream areas such as business and government.
My dinner companion got it.
It is not easy to describe what is happening and why it matters. But parallels with the past, albeit imperfect, are usually very useful. Hence, trotting out the scientific method to explain the here and now.
One possibility comes to mind: perhaps shopping patterns are so statistically consistent and routine, and as personal as DNA, that information about a person’s previous purchases — or even non-shopping activities — enables an algorithm to know if the customer is truly who he or she claims to be.
That is interesting. But, alas, the report looks at more prosaic things:
PayPal, Amazon, and Google have all developed sophisticated analytical tools and infrastructure to identify patterns of fraudulent activity. Paypal, for example, has a series of Fraud Management Filters that screen payments and sort out transactions that warrant review because of their amount, their origin, or other factors that can be set by a merchant. […] PayPal and Amazon have developed fraud detection tools that depend on massive datasets containing not only financial details for transactions, but IP addresses, browser information, and other technical data that will help these companies refine models to predict, identify, and prevent fraudulent activity. PayPal and Amazon have had years to amass databases of the transaction details for hundreds of millions of customers across thousands of merchants.
The sort of filtering and checking described above involves no conceptual shift in how to use data. All that is being described is the same intuitive technique one would have long applied in a world of “small data.” The only thing “big” about it is that there’s a lot more data to sift through. The firms are not using the size and depth of the data to do anything novel per se.
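To see just how “small data” this kind of filtering is, here is a toy version of a rule-based payment filter of the sort the report describes; the field names and thresholds are invented, not any real company’s.

```python
def flag_for_review(txn, max_amount=2000.0,
                    risky_countries=frozenset({"XX", "YY"})):
    """Toy rule-based fraud filter: flat thresholds an analyst could
    have written decades ago with pencil and paper. Nothing here
    requires a big data set — only more rows to run it over.
    """
    reasons = []
    if txn["amount"] > max_amount:
        reasons.append("amount over limit")
    if txn["origin_country"] in risky_countries:
        reasons.append("high-risk origin")
    return reasons

# A hypothetical transaction that trips both rules.
txn = {"amount": 2500.0, "origin_country": "XX"}
```

Every rule is hand-set in advance; the data plays no role in shaping them. That is the opposite of learning something new from scale.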
This is a pity. The revolution that is taking place in other dimensions of the Internet industry is that companies can do entirely new things with a big data set that they cannot do with a small one. A former top Google executive once told me that Google Checkout was created in part because the firm realized that learning about a customer’s shopping pattern could better detect fraud, which is the key e-commerce stumbling block.
Likewise, at the O’Reilly Strata conference in February, the hallway chit-chat was about how a financial-services firm might predict whether someone will repay a loan more accurately using Facebook’s social graph than a FICO score, since the best predictor of whether a person will repay is whether their friends repay their loans. (Actually, the example was told to me as if it were already being done, though not with Facebook’s data.) Yet I think I’m safer considering it apocryphal until I hear it first hand.
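Since the story is apocryphal, treat this as nothing more than a sketch of the idea; the social graph and the repayment records are entirely invented.

```python
# Hypothetical data: who is friends with whom, and who repaid their loans.
friends = {
    "alice": ["bob", "carol"],
    "dave": ["erin", "frank"],
}
repaid = {"bob": True, "carol": True, "erin": False, "frank": True}

def friend_repayment_rate(person):
    """Fraction of a person's friends who repaid their loans — the
    single social-graph feature the hallway story rested on."""
    circle = friends.get(person, [])
    if not circle:
        return None  # no social graph, no signal
    return sum(repaid.get(f, False) for f in circle) / len(circle)
```

The feature is trivial to compute; what is not trivial is obtaining the graph, which is exactly why the question of who owns and who benefits from such data matters.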
Does anyone know of incredible stories of how “big data” is being used in new ways to reduce financial fraud? If so, comment here or email me directly.
I have been too distracted to get a weblog. My friends Meg Grant and Mike Hambleton told me to start one back in the fall of 2000 — believe it or not — but I just couldn’t see the point. Then, I got too busy with work.
After my wife Heather chastised me for being such a technical fuddy-duddy — as well as for not linking to her site (heatherhopkins.wordpress.com) — I decided to take fast action. As well as secure the third-level domain. And to actually be a part of what Mike Wesch and Yochai Benkler talk about (and, um, I write about…).
So here I am. Mainly to see just how easy it is to set up. Creating this took under four minutes. Perhaps this Internet-y thing isn’t just a fad after all.