What is the world of BigData? To define it is to limit it. But here’s a way in: it refers to things that we cannot see with the naked eye but that are revealed to us only by a huge body of data. So, for example, we may not know which restaurant people think is best in terms of food, service, atmosphere and a good time. But being able to see the percentage left as tips could be a way of learning this, one that wasn’t open to us in the past, when we couldn’t get the data or process it.
This is a fictitious example (one imagined by Mike Driscoll of Metamarkets). But it helps to concentrate the mind on what is new about the BigData revolution taking place, and on how information cleverly reused can create new sources of economic value. This leads me to the Wall Street Journal’s excellent article “The Really Smart Phone” by Robert Lee Hotz, which is part of the paper’s impressive series “What They Know.”
There’s much to praise in the piece. But rather than simply praise it, I want to put forward some vital distinctions that industry needs to consider when thinking about the trends under way.
First, we need to separate the process of BigData from its output. The article, like the industry, doesn’t really do this. For example, sometimes we talk about being able to track 100 million cellphone users (but don’t note the substance of what is being tracked: calls? location? bills?). And sometimes we talk about what we learn, such as a person’s susceptibility to obesity. But is it because location data shows they’ve been sedentary? Or because they bought lots of ice cream on their iPhone?
These distinctions are crucial. In one instance, it is anonymized metadata; in another, it is individual information. The ways that entities are allowed to use these different types of data perhaps ought to be different too.
Most people, and most articles in the press, approach the BigData issue from the negative: “if you only knew what they know about you!” But I believe that industry ought to be far more transparent, because if people did know, they’d probably be more impressed than alarmed. (It is a point that I made in my special report “The data deluge” in The Economist last year.) Specifically, what is so unsettling: what they collect, or what they know? On the surface, we bristle at both. But when we look deeper, it is fascinating to see what new things can be learned from a big body of data.
Hence, the transition from “Ick!” to “Wow!” But I think the failure of industry to be open about its practices will hold it back. Thus the public will cry: “Wait!”
People are antsy. Regulators are uneasy. And business is barricading itself. Amazon never discusses the subject. Google does, and thus invites abuse, alas. Apple is characteristically silent. “Google Inc. defended the way it collects location data from Android phones, while Apple Inc. remained silent for a third day,” the WSJ wrote in a separate article on April 23.
I think that with the right outreach, BigData firms can make the case for collecting and processing the information. That will shift the debate to the more essential questions: who owns the information, who gets to benefit from it, how it is valued, how it is protected and what the penalties are if this trust is abused.
This further establishes the distinction I identified at the outset: separating the process from the substance. In fact, we are talking about so many records (millions of people, zillions of data points on locations or calls) that the practical effect seems to be anonymization, even if it is not done effectively in practice.
To overcome the “Wait!,” I’d urge that we make rules. As for where to start in thinking about them, I’ll discuss that another time. For now, a look at what the WSJ piece did a nice job of highlighting. The large numbers associated with the research were interesting but actually unimportant: they’re just big numbers, part of the process. The actual output, what there is to be learned, is far more interesting. Specifically, cellphone BigData lets us:
– pinpoint “influencers,” the people most likely to make others change their minds
– forecast where people are likely to be at any given time in the future
– predict which people are most likely to defect to other cellphone carriers
– reveal subtle symptoms of mental illness
– foretell movements in the Dow Jones Industrial Average
– chart the spread of political ideas as they move through a community
– expose a cultural split that is driving a historic political crisis (in Belgium)
– deduce that two people were talking about politics
– detect flu symptoms before the students themselves realized they were getting sick
A final note: all of these insights were gleaned by parsing two types of data, location and the interconnections among users; that is, metadata. There is a lot more mobile data to collect; we’ve barely scratched the surface. Note, too, that none of the data relates to specific content from the phone or user. And it is not clear that the data collected can be traced back to a specific user, other than in cases of academic research in which consent was granted.
In some ways, the data collected looks like “pen register” information, which is less spooky than an outright wiretap: e.g., who calls whom and when, but not what they said. Law enforcement faces a lower legal standard to obtain it.
My point is this: the BigData issues we’re confronting now are the easy ones. So this is the moment to start debating them seriously and arriving at answers, as a precursor to the harder issues coming down the pike.