Data in antiquity — and the info all around us

December 29, 2011

The always insightful Pete Warden recently penned a blog post on “What the Sumerians can teach us about data.” There is much to praise and react to in his analysis. But I’m struck in particular by a semantic matter: does Pete really mean “data” or “information”? I usually hate this genre of challenge; it’s the most tedious in our business. But this time it deserves to be raised.

The reason is that the idea of quantification is really a phenomenon of the Middle Ages in Europe (laying to rest the old canard that they were “dark ages” devoid of progress). On the other hand, the period of antiquity is typified by man describing his world as one of qualities. (Remember Plato’s “forms”? And Aristotle’s taxonomy of just about everything?)

To be sure, in the area of money we can talk about quantification and thus data as we think about it today. But in many of Pete’s terrific examples of how the Sumerians recorded their world — in the “fixed media” of clay tablets and the like — I am unsure if the term data fits.

Ought “writing” be considered data? If so, how about caveman paintings? Surely the Egyptian hieroglyphs imparted information — but should we call it “data” per se? The only way to answer that question is to define data.

The word data is the plural of datum, neuter past participle of the Latin dare, “to give”, hence “something given,” instructs Wikipedia. “1. Facts and statistics collected together for reference or analysis. 2. The quantities, characters, or symbols on which operations are performed by a computer, being stored and transmitted in the form of…” reports a Google definition.

If data is something different from merely recorded information, at what point does something go from being simply info to data?

I have a few ideas on how to answer this — I am scribbling away on a large work that looks at this topic among others. But I’m not quite ready to share it with the world, since the thoughts are still fermenting. In the meantime, Pete’s post is a wonderful look at how an early society recorded and used information. Among my favorite points:

* “Written records remove the problem of fallible memories, but replaces it with a second-degree question of provenance. How do you know the data accurately reflects what happened?”

* “We still have a disturbing tendency to trust anything that’s recorded, without understanding the subjective process that went into creating the record.”

* “The main way Sumerians protected the integrity of their data was through curses. This may seem laughable to a modern audience, but I don’t think we’re so different. Do you expect the FBI to actually raid your house if you copy that VHS tape?”

* “In the absence of real answers, we’ll take bogus ones painted with a veneer of data, just like the Sumerians.”

* “If there’s any way you can, please think about how to open up data you control, it’s the best way to pass it on to posterity.”

Having pointed out what I enjoyed most, let me close on a final quibble. Pete writes:

“The Sumerians recorded everything on stone or clay tablets … This data exhaust gives a rich view into trade, worship, life, death, medicine and almost every other aspect of the Sumerian’s world.”

It is absolutely not “data exhaust” in the way that the term has come to be known (and how I helped popularize it in a report a few years ago). The idea was that when people interact with information, they generate further information as a byproduct, and that byproduct can itself be collected and analyzed. The simplest example is tracking readers’ activities to reveal to website visitors the most-read articles, as a simple heuristic to indicate what might interest them.
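For the technically inclined, that most-read heuristic can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `page_views` log, the article slugs, and the `most_read` helper are all hypothetical, not anything a particular website actually runs.

```python
from collections import Counter

# Hypothetical page-view log: one entry per article a reader opened.
# Nobody typed this in on purpose; it accumulates as readers browse.
page_views = [
    "sumerian-data",
    "guantanamo-policy",
    "sumerian-data",
    "pentagon-papers",
    "sumerian-data",
    "guantanamo-policy",
]


def most_read(views, top_n=3):
    """Return up to top_n article slugs, ranked by how often each was viewed."""
    return [slug for slug, _count in Counter(views).most_common(top_n)]


print(most_read(page_views, top_n=2))
# → ['sumerian-data', 'guantanamo-policy']
```

The point is that nobody set out to create the view counts. They fall out of readers simply reading, which is what makes them exhaust.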

What Pete describes, and what the Sumerians recorded, was information (or perhaps data) pure and simple. No “exhaust” about it — other than that the tablets had been thrown away by the Sumerians before modern archeologists dug them up.

But all this ranting is only meant to add momentum to my appreciation for Pete’s splendid work in this post and others!

From insanity to inanity

June 12, 2011

How should we craft rules for information access in a BigData world? It is a hard question. But how not to is far clearer.

According to a new US government policy, lawyers representing Guantanamo prisoners are allowed to read Wikileaks’ classified US documents — but not print or save them. The actual policy “guidance” is here (from Politico) and an analysis by Politico’s Josh Gerstein is here.

Are the US officials who devised this policy out of their minds? How could anyone rationally adopt such an inherently inconsistent policy?

If the lawyers cannot read the material, they are blocked from accessing pertinent information that is already in the public domain, which could help them prepare a defense. Allowing access is only sensible. To do otherwise would be to deny reality (that the material is widely available), and might deny justice too.

However, crippling that access by placing arbitrary restrictions on its use makes no sense whatsoever. Why? On what basis is one allowed to read but not print or save? Surely the US does not mean for the frailty of a person’s memory to govern how material is put to use. But that is the policy’s effect.

The irony is that the current policy is actually slightly more rational than the previous rules, which forbade any access at all. It underscores the fact that the government has no clue how to respond to the new world we’re in regarding BigData leaks.

And it is a longstanding problem. Just this month, the US officially released the trove of documents known as the Pentagon Papers, 40 years after they appeared in the New York Times. (The AP’s story is here.) The Economist, in an article about it last month (“The open society and its ostriches”), argued that the way to think about these cases is that “the illegal disclosures in effect declassify the information.”

When the contradiction between futile policies and the reality on the ground grows so wide as to be preposterous, as it has now, something has to give. It will be the rules, of course, that go. But with government, this takes a long time.