<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Kenneth Neil Cukier</title>
	<atom:link href="http://cukier.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://cukier.wordpress.com</link>
	<description>dabbling in big data, internet governance, asia, etc</description>
	<lastBuildDate>Fri, 10 Feb 2012 07:31:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='cukier.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Kenneth Neil Cukier</title>
		<link>http://cukier.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://cukier.wordpress.com/osd.xml" title="Kenneth Neil Cukier" />
	<atom:link rel='hub' href='http://cukier.wordpress.com/?pushpress=hub'/>
		<item>
		<title>What Facebook&#8217;s IPO reveals about big-data analytics</title>
		<link>http://cukier.wordpress.com/2012/02/10/facebooks-ipo-big-data-analytics/</link>
		<comments>http://cukier.wordpress.com/2012/02/10/facebooks-ipo-big-data-analytics/#comments</comments>
		<pubDate>Fri, 10 Feb 2012 07:28:43 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Facebook]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=107</guid>
		<description><![CDATA[Those obsessed with Mammon will read Facebook&#8217;s IPO prospectus for what it says about making money. Others of us with a more geeky bent will pore over what it reveals about how the company handles data. It starts with arresting stats: 845 million active monthly users; 100 billion friendships, and every day 250 million photos [...]]]></description>
			<content:encoded><![CDATA[<p>Those obsessed with Mammon will read <a href="http://sec.gov/Archives/edgar/data/1326801/000119312512034517/d287954ds1.htm">Facebook&#8217;s IPO prospectus</a> for what it says about making money. Others of us with a more geeky bent will pore over what it reveals about how the company handles data. It starts with arresting stats: 845 million monthly active users; 100 billion friendships; and, every day, 250 million photos uploaded and 2.7 billion likes or comments.</p>
<p>But that is just the eye candy. The substance is buried deep in the prose, under the heading &#8220;Data Management and Personalization Technologies.&#8221; Get a load of this:</p>
<blockquote><p>&#8220;loading a user’s home page typically requires accessing hundreds of servers, processing tens of thousands of individual pieces of data, and delivering the information selected in less than one second. In addition, the data relationships have grown exponentially and are constantly changing.&#8221;</p></blockquote>
<p>And then there is this:</p>
<blockquote><p>&#8220;We use a proprietary distributed system that is able to query thousands of pieces of content that may be of interest to an individual user to determine the most relevant and timely stories and deliver them to the user in milliseconds.&#8221;</p></blockquote>
<p>And this:</p>
<blockquote><p>&#8220;We store more than 100 petabytes (100 quadrillion bytes) of photos and videos.&#8221;</p></blockquote>
<p>And this:</p>
<blockquote><p>&#8220;We use an advanced click prediction system that weighs many real-time updated features using automated learning techniques. Our technology incorporates the estimated click-through rate with both the advertiser’s bid and a user relevancy signal to select the optimal ads to show.&#8221;</p></blockquote>
<p>But my favorite is this:</p>
<blockquote><p>&#8220;Our research and development expenses were $87 million, $144 million, and $388 million for 2009, 2010, and 2011, respectively.&#8221;</p></blockquote>
<p>So R&amp;D expenses more than quadrupled between 2009 and 2011. Considering Facebook made $1 billion in profit on $3.7 billion of revenue last year, the company&#8217;s research budget came to about 10% of sales. This is very healthy (albeit natural, perhaps, for a company boasting such hefty profit margins). According to the OECD, the top 100 R&amp;D-intensive companies in the IT and telecoms sectors spend an average of nearly 7% of revenue on R&amp;D.</p>
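<p>As a quick check on the arithmetic, here is a minimal Python sketch. It uses only the R&amp;D figures quoted from the prospectus and the $3.7 billion revenue figure given in this post, and the variable names are illustrative. It shows that the 2009-to-2011 increase is roughly 4.5-fold and that 2011 R&amp;D spending was about 10.5% of revenue.</p>

```python
# Research spending quoted from Facebook's S-1, in US$ millions.
rd = {2009: 87, 2010: 144, 2011: 388}
revenue_2011 = 3700  # $3.7 billion, the 2011 revenue figure cited in the post

growth = rd[2011] / rd[2009]        # multiple from 2009 to 2011
rd_share = rd[2011] / revenue_2011  # research spending as a fraction of 2011 revenue

print(f"Research spending grew {growth:.2f}x from 2009 to 2011")  # 4.46x
print(f"Research spending was {rd_share:.1%} of 2011 revenue")    # 10.5%
```

<p>The same two ratios underlie the comparison with the OECD's roughly 7% average for IT and telecoms firms.</p>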
<p>Most of the fruits of that R&amp;D are probably kept in-house and protected as trade secrets. But for that generous sum, the prospectus tells us:</p>
<blockquote><p>&#8220;As of December 31, 2011, we had 56 issued patents and 503 filed patent applications in the United States and 33 corresponding patents and 149 filed patent applications in foreign countries relating to social networking, web technologies and infrastructure, and related technologies. Our issued patents expire between May 2016 and June 2031.&#8221;</p></blockquote>
<p>But the most interesting thing is how much was <em>not</em> exposed in the prospectus. In a section where Facebook purported to explain its analytics, with an example of how it uses elements on a webpage to determine which ads to show (page 87), the example was so juvenile as to be meaningless.</p>
<p>It is actually funny how quiet Facebook keeps about analytics, considering that the word first appears on page 12, where Facebook cites it as one of the &#8220;risk factors&#8221; that could ruin the business:</p>
<blockquote><p>&#8220;our inability to improve our analytics and measurement solutions that demonstrate the value of our ads and other commercial content&#8221;</p></blockquote>
<p>Though it is loath to make too much of it, since analytics is its main source of value, Facebook is an analytics company before anything else. Google might have been the world&#8217;s first big-data IPO; Facebook may be the first analytics one. But you wouldn&#8217;t know it from its IPO prospectus.</p>
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2012/02/10/facebooks-ipo-big-data-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
		<item>
		<title>Data in antiquity &#8212; and the info all around us</title>
		<link>http://cukier.wordpress.com/2011/12/29/data-in-antiquity-info-around-us/</link>
		<comments>http://cukier.wordpress.com/2011/12/29/data-in-antiquity-info-around-us/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 04:13:26 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[historical]]></category>
		<category><![CDATA[textual info]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=99</guid>
		<description><![CDATA[The always insightful Pete Warden recently penned a blog post on &#8220;What the Sumerians can teach us about data.&#8221; There is much to praise and react to in his analysis. But I&#8217;m struck in particular by a semantic matter: does Pete really mean &#8220;data&#8221; or &#8220;information&#8221;? I usually hate this genre of challenge; it&#8217;s the [...]]]></description>
			<content:encoded><![CDATA[<p>The always insightful Pete Warden recently penned a blog post on &#8220;<a href="http://petewarden.typepad.com/searchbrowser/2011/12/why-the-sumerians-invented-data.html">What the Sumerians can teach us about data</a>.&#8221; There is much to praise and react to in his analysis. But I&#8217;m struck in particular by a semantic matter: does Pete really mean &#8220;data&#8221; or &#8220;information&#8221;? I usually hate this genre of challenge; it&#8217;s the most tedious in our business. But this time it deserves to be raised.</p>
<p>The reason is that the idea of quantification is really a phenomenon of the Middle Ages in Europe (laying to rest the old canard that they were &#8220;dark ages&#8221; devoid of progress). Antiquity, by contrast, is typified by man describing his world in terms of qualities. (Remember Plato&#8217;s &#8220;forms&#8221;? And Aristotle&#8217;s taxonomies of just about everything?)</p>
<p>To be sure, in the area of money we can talk about quantification and thus data as we think about it today. But in many of Pete&#8217;s terrific examples of how the Sumerians recorded their world &#8212; in the &#8220;fixed media&#8221; of clay tablets and the like &#8212; I am unsure if the term data fits.</p>
<p>Ought &#8220;writing&#8221; to be considered data? If so, how about cave paintings? Surely the Egyptian hieroglyphs imparted information &#8212; but should we call it &#8220;data&#8221; per se? The only way to answer that question is to define data.</p>
<p><a href="http://en.wikipedia.org/wiki/Data">The word data</a> is the plural of datum, neuter past participle of the Latin dare, &#8220;to give,&#8221; hence &#8220;something given,&#8221; instructs Wikipedia. &#8220;1. Facts and statistics collected together for reference or analysis. 2. The quantities, characters, or symbols on which operations are performed by a computer, being stored and transmitted in the form of&#8230;&#8221; <a href="http://www.google.com/search?client=safari&amp;rls=en&amp;q=define:data&amp;ie=UTF-8&amp;oe=UTF-8">reports a Google definition</a>.</p>
<p>Building on the idea that data may be something different from simply recorded information, at what point does something go from being mere info to being data?</p>
<p>I have a few ideas on how to answer this &#8212; I am scribbling away on a large work that looks at this topic among others. But I&#8217;m not quite ready to share it with the world, since the thoughts are still fermenting. In the meantime, Pete&#8217;s post is a wonderful look at how an early society recorded and used information. Among my favorite points:</p>
<p>* &#8220;Written records remove the problem of fallible memories, but replaces it with a second-degree question of provenance. How do you know the data accurately reflects what happened?&#8221;</p>
<p>* &#8220;We still have a disturbing tendency to trust anything that&#8217;s recorded, without understanding the subjective process that went into creating the record.&#8221;</p>
<p>* &#8220;The main way Sumerians protected the integrity of their data was through curses. This may seem laughable to a modern audience, but I don&#8217;t think we&#8217;re so different. Do you expect the FBI to actually raid your house if you copy that VHS tape?&#8221;</p>
<p>* &#8220;In the absence of real answers, we&#8217;ll take bogus ones painted with a veneer of data, just like the Sumerians.&#8221;</p>
<p>* &#8220;If there&#8217;s any way you can, please think about how to open up data you control, it&#8217;s the best way to pass it on to posterity.&#8221;</p>
<p>Having pointed out what I enjoyed most, let me close on a final quibble. Pete writes:</p>
<p>&#8220;The Sumerians recorded everything on stone or clay tablets &#8230; This data exhaust gives a rich view into trade, worship, life, death, medicine and almost every other aspect of the Sumerian&#8217;s world.&#8221;</p>
<p>It is absolutely not &#8220;data exhaust&#8221; in the sense the term has come to carry (a sense I helped popularize in a <a href="http://www.economist.com/node/15557443">report</a> a few years ago). The idea is that the information people generate as a byproduct of interacting with information can itself be collected and analyzed. The simplest example is tracking readers&#8217; activities to show website visitors the most-read articles, a simple heuristic for what might interest them.</p>
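<p>To make that example concrete, here is a minimal Python sketch of a most-read heuristic built from data exhaust. The reading log, visitor IDs and article slugs are hypothetical. The point is that a record of which pages people open, collected only as a side effect of their reading, becomes a signal in its own right. Counting distinct readers rather than raw page views is a design choice in this sketch, so that one visitor reloading an article does not inflate its rank.</p>

```python
from collections import Counter

# Hypothetical reading log: (visitor_id, article_slug) pairs recorded as a
# byproduct of people simply browsing the site.
read_log = [
    ("u1", "facebook-ipo"),
    ("u2", "facebook-ipo"),
    ("u3", "data-in-antiquity"),
    ("u1", "geolocation-in-crib"),
    ("u4", "facebook-ipo"),
    ("u2", "data-in-antiquity"),
]


def most_read(log, n=3):
    """Count distinct readers per article and return the top n as (slug, count)."""
    readers = {}
    for visitor, article in log:
        readers.setdefault(article, set()).add(visitor)
    counts = Counter({article: len(who) for article, who in readers.items()})
    return counts.most_common(n)


print(most_read(read_log, n=2))  # [('facebook-ipo', 3), ('data-in-antiquity', 2)]
```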
<p>What Pete describes, and what the Sumerians recorded, was information (or perhaps data) pure and simple. No &#8220;exhaust&#8221; about it &#8212; other than that the tablets had been thrown away by the Sumerians before modern archeologists dug them up.</p>
<p>But all this ranting is only meant to add momentum to my appreciation for Pete&#8217;s splendid work in this post and others!</p>
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2011/12/29/data-in-antiquity-info-around-us/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
		<item>
		<title>From insanity to inanity</title>
		<link>http://cukier.wordpress.com/2011/06/12/wikileaks-us-policy/</link>
		<comments>http://cukier.wordpress.com/2011/06/12/wikileaks-us-policy/#comments</comments>
		<pubDate>Sun, 12 Jun 2011 22:22:43 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[government policy]]></category>
		<category><![CDATA[textual info]]></category>
		<category><![CDATA[Wikileaks]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=85</guid>
		<description><![CDATA[How to craft rules in a BigData world for information access? It is a hard question. But how not to is far clearer. According to a new US government policy, lawyers representing Guantanamo prisoners are allowed to read Wikileaks&#8217; classified US documents &#8212; but not print or save them. The actual policy &#8220;guidance&#8221; is here [...]]]></description>
			<content:encoded><![CDATA[<p>How should rules for information access be crafted in a BigData world? It is a hard question. But how <em>not</em> to craft them is far clearer. </p>
<p>According to a new US government policy, lawyers representing Guantanamo prisoners are allowed to read Wikileaks&#8217; classified US documents &#8212; but not print or save them. The actual policy &#8220;guidance&#8221; is <a href="http://www.politico.com/static/PPM170_wikiguidance57.html">here</a> (from Politico) and an analysis by Politico&#8217;s Josh Gerstein is <a href="http://www.politico.com/blogs/joshgerstein/0611/Feds_policy_on_reading_Wikileaks_docs_incoherent_critics_say.html">here</a>.</p>
<p>Are the US officials that devised this policy out of their minds? How could anyone rationally adopt such an inherently inconsistent policy? </p>
<p>If the lawyers cannot read the material, they are blocked from accessing pertinent information that is already in the public domain, which could help them prepare a defense. Allowing access is only sensible. To do otherwise would be to deny reality (that the material is widely available), and might deny justice too. </p>
<p>However, crippling that access by placing arbitrary restrictions on its use makes no sense whatsoever. Why? On what basis is one allowed to read but not print or save? Surely the US does not mean for the frailty of a person&#8217;s memory to govern how material is put to use. But that is the policy&#8217;s effect.</p>
<p>The irony is that the current policy is actually slightly more rational than the previous rules, which forbade any access at all. It underscores the fact that the government has no clue how to respond to BigData leaks in the new world we&#8217;re in. </p>
<p>And it is a longstanding problem. Just this month, the US officially released the trove of documents known as the Pentagon Papers &#8212; 40 years after they appeared in the New York Times. (The AP&#8217;s story is <a href="http://news.yahoo.com/s/ap/20110613/ap_on_re_us/us_pentagon_papers%3b_ylt=AtDfOXh1TsqgyYxAMxKefVGs0NUE%3b_ylu=X3oDMTNqYmVnNzd2BGFzc2V0A2FwLzIwMTEwNjEzL3VzX3BlbnRhZ29uX3BhcGVycwRjY29kZQNtb3N0cG9wdWxhcgRjcG9zAzMEcG9zAzEzBHB0A2hvbWVfY29rZQRzZWMDeW5fdG9wX3N0b3J5BHNsawM0MHllYXJzYWZ0ZXI">here</a>.) The Economist, in an article about it last month (&#8220;<a href="http://www.economist.com/blogs/democracyinamerica/2011/05/wikileaks_and_pentagon_papers">The open society and its ostriches</a>&#8221;), argued that the way to think about these cases is that &#8220;the illegal disclosures in effect declassify the information.&#8221; </p>
<p>When the contradiction between futile policies and the reality on the ground grows so wide as to be preposterous &#8212; as it is now &#8212; something has to give. It will be the rules, of course, that go. But with government, this takes a long time. </p>
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2011/06/12/wikileaks-us-policy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
		<item>
		<title>Scott McNealy&#8217;s latest privacy top-ten</title>
		<link>http://cukier.wordpress.com/2011/05/21/mcnealys-privacy-top-ten/</link>
		<comments>http://cukier.wordpress.com/2011/05/21/mcnealys-privacy-top-ten/#comments</comments>
		<pubDate>Sat, 21 May 2011 23:24:22 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[privacy]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=68</guid>
		<description><![CDATA[Scott McNealy, the co-founder and long time boss of Sun Microsystems, was famous for his &#8220;top ten&#8221; riffs on tech trends. Today he&#8217;s recreated it on Twitter (follow @scottmcnealy), reprising his famous remark in 1999: &#8220;You have zero privacy anyway. Get over it.&#8221; Here&#8217;s a compilation of the tweets (followed by a quick analysis relating [...]]]></description>
			<content:encoded><![CDATA[<p>Scott McNealy, the co-founder and longtime boss of Sun Microsystems, was famous for his &#8220;top ten&#8221; riffs on tech trends. Today he revived the format on Twitter (follow @scottmcnealy), reprising his famous 1999 remark: &#8220;You have zero privacy anyway. Get over it.&#8221;</p>
<p>Here&#8217;s a compilation of the tweets (followed by a quick analysis relating them to Sony boss Howard Stringer&#8217;s remarks on security):</p>
<p>* * *</p>
<p>Top 10 signs you no longer have privacy and should get over it:<br />
10. The guy behind the McDonalds counter greets you with, &#8220;Would you like a salad to help you with your constipation?&#8221;<br />
9. A Google search on &#8220;white only clubs&#8221; has just one result: TaylorMade.<br />
8. Your soon to be ex-spouse produces your iPhone GPS database in settlement hearings.<br />
7. The TSA stops molesting and radiating your 82 year old mom because she is clearly not going to hijack that plane.<br />
6. 20 neighbors show up at same Groupon inspired Spearmint Rhino happy hour in Vegas.<br />
5. IRS starts auditing folks who don&#8217;t pay income taxes, not the folks who pay the most.<br />
4. Local police become largest purchaser of camera equipped UAV&#8217;s.<br />
3. Your parents require your Facebook, laptop, and phone passwords and actually review your online activity regularly. And you are 40.<br />
2. The UPS driver delivers your small package to your door and, with a smile and wink, asks if you would like batteries with that.<br />
1. Twitter starts suggesting Tweets for you, and they are perfect and better than your own.</p>
<p>* * *</p>
<p>As in 1999, McNealy is right on fact, wrong on what to do about it (as critics argued at the time). Not ensuring some protections is irrational. But whether he&#8217;s right or not is beside the point. It is refreshing when a top executive calls it as he sees it &#8212; and a bit silly when people quibble with the wording rather than the larger point itself.</p>
<p>Here, I&#8217;m thinking of Sony&#8217;s boss, Howard Stringer, who recently described the PlayStation Network hack in words that were sure to get him eviscerated by tech journos. “Nobody’s system is 100 percent secure,” he said in a conference call. “This is a hiccup in the road to a network future.” (<a href="http://www.bloomberg.com/news/2011-05-17/sony-chairman-stringer-calls-hacker-attack-hiccup-in-road.html">in Bloomberg&#8217;s piece</a>). &#8220;It&#8217;s not a brave new world; it&#8217;s a bad new world,&#8221; he said (<a href="http://online.wsj.com/article/SB10001424052748703421204576328982377107892.html#ixzz1N218Bizi">in the WSJ piece</a>).</p>
<p>Stringer has been pounced on by some in the press. He shouldn&#8217;t be. Though we have known the point he raises for a long time, it is still quite right.</p>
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2011/05/21/mcnealys-privacy-top-ten/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
		<item>
		<title>Killing geo-location in the crib</title>
		<link>http://cukier.wordpress.com/2011/05/14/geolocation-in-crib/</link>
		<comments>http://cukier.wordpress.com/2011/05/14/geolocation-in-crib/#comments</comments>
		<pubDate>Sat, 14 May 2011 00:51:14 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=58</guid>
		<description><![CDATA[Is this a return of the US-EU privacy wars of the 1990s, when Brussels&#8217; bureaucrats threatened to halt intercontinental online transactions? They may be coming back. This time it&#8217;s over location data. An upcoming EU report will say that &#8220;geo-location data has to be considered as personal data&#8230; The rules on personal data apply,&#8221; an [...]]]></description>
			<content:encoded><![CDATA[<p>Is this a return of the US-EU privacy wars of the 1990s, when Brussels&#8217; bureaucrats threatened to halt intercontinental online transactions? They may be coming back. This time it&#8217;s over location data. </p>
<p>An upcoming EU report will say that &#8220;geo-location data has to be considered as personal data&#8230; The rules on personal data apply,&#8221; an <a href="http://online.wsj.com/article/SB10001424052748704681904576319192502261716.html">EU official tells the Wall Street Journal</a>.<br />
The implication is that data collected by cellphones, Twitter, Facebook and others must be handled like names, birth dates and other personal information: requiring user consent, deletion after a certain period, and anonymization. </p>
<p>This is absolutely preposterous. Yes, rules &#8212; better, tougher rules &#8212; are desperately needed. But to simply drop the data into a pre-existing regulatory bucket (as the EU is doing by calling it &#8220;personal information,&#8221; which carries sweeping regulatory burdens) is asinine. It will hold back the amazing innovations and services that are just starting to emerge, and future ones that we can scarcely imagine today.</p>
<p>Calling something new (geo-location data) something old (personally identifiable information, or PII in the trade) is far too blunt a way to address legitimate public-interest concerns. It avoids the more humble &#8212; and probably more effective &#8212; task of trying to figure out the new properties of this type of data, and thus devising appropriate ways to balance personal privacy with innovative services. It&#8217;s harder to do this, but sounder. </p>
<p>This of course will happen, but over time, and probably in a different regulatory jurisdiction. Possibly America? Perhaps China? Maybe Brazil? But European geo-loco firms will suffer in the meantime, since they&#8217;ll be crammed into a regulatory straitjacket. And to be clear: this is not to say that better rules aren&#8217;t needed &#8212; they definitely are. But they ought to be sensible ones. </p>
<p>Failing to take a more cautious and reflective regulatory approach results in things like the EU&#8217;s privacy directive, which took effect in 1998. It did an excellent job of getting governments into the privacy arena, but it had lots of silly parts too. For one thing, it required an international &#8220;safe-harbor&#8221; provision in order to do innocuous things like allowing a US firm in France to send its payroll data to headquarters in Detroit. The rules are already out of date, and although the directive boasts strong enforcement provisions, they&#8217;ve barely ever been used. </p>
<p>In fairness, the US has miserable privacy legislation &#8212; no country does it well &#8212; but its piecemeal approach of building up a body of regulatory experience is looking like a better way forward. There is no &#8220;privacy kommissar&#8221; in America, but that hasn&#8217;t stopped the FTC from frequently taking serious action. </p>
<p>A far better way to proceed is the way the US is moving. <a href="http://franken.senate.gov/?p=news&amp;id=1495">Sen. Al Franken&#8217;s opening statement to hearings</a> on May 10th on cellphone privacy was a paragon of wise policymaking: he wants to find the right balance. He was scorching in his condemnation of current practices:</p>
<blockquote><p>&#8220;Once the maker of a mobile app, a company like Apple or Google, or even your wireless company gets your location information &#8230; these companies are free to disclose your location information and other sensitive information to almost anyone they please-without letting you know. And then the companies they share your information with can share and sell it to yet others-again, without letting you know. This is a problem. It&#8217;s a serious problem.&#8221;
</p></blockquote>
<p>But at the same time, he understood the risks of regulating too soon: </p>
<blockquote><p>&#8220;I just want to be clear that the answer to this problem is not ending location-based services. No one up here wants to stop Apple or Google from producing their products or doing the incredible things that you do. You guys are brilliant. When people think of the word &#8220;brilliant&#8221; they think of the people that founded and run your companies.&#8221;
</p></blockquote>
<p>If this gap between regulatory approaches is not closed, the result may well be another round of the privacy wars. Companies like Apple, Google, Facebook, Twitter, Foursquare and others will have to tailor their operations by jurisdiction, down to their very code base. The EU will argue that they have to do this anyway to accommodate different languages and laws. But this still fractures and debilitates the service. And it is hypocritical: the EU&#8217;s common market and common currency are premised on the gains from harmonization. </p>
<p>The best way to ward off bad public policy is with good case studies of excellent services. Industry basically has its head in the sand, hoping this issue will go away (it won&#8217;t), or is in hiding, hoping it doesn&#8217;t need to disclose how the services work (it does). This is shortsighted. Geo-location services are interesting and useful, and if people really knew what was happening, many would be fine with it, provided a backstop of basic protections exists. </p>
<p>The case must be made publicly. So what amazing new services are emerging that show why the EU&#8217;s approach is not quite right? Share your stories here. </p>
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2011/05/14/geolocation-in-crib/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
		<item>
		<title>Tautology, or something more?</title>
		<link>http://cukier.wordpress.com/2011/05/06/tautology-or-something-more/</link>
		<comments>http://cukier.wordpress.com/2011/05/06/tautology-or-something-more/#comments</comments>
		<pubDate>Fri, 06 May 2011 08:17:35 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[metadata]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=39</guid>
		<description><![CDATA[This might be utterly obvious, but let me posit that one of the most compelling features of the current information avalanche is that (if you will): &#8220;big-data solves the problem of big-data.&#8221; The problem is that the amount of information has expanded so much that it has become almost impossible to work with or comprehend [...]]]></description>
			<content:encoded><![CDATA[<p>This might be utterly obvious, but let me posit that one of the most compelling features of the current information avalanche is that (if you will): &#8220;big-data solves the problem of big-data.&#8221;</p>
<p>The problem is that the amount of information has expanded so much that it has become almost impossible to work with or comprehend in its totality. But new techniques that actually <em>rely</em> on the huge scale make that scale manageable and indeed useful. And though it is certainly an overstatement to say it &#8220;solves the problem,&#8221; I&#8217;d argue this is the right way to think about it.</p>
<p>Examples abound, if we look at things the right way. Computer translation, for instance, was hard until the &#8220;test data&#8221; went from billions of words to a trillion &#8212; and then the machines got talking (as Google&#8217;s <a href="http://www.google.com/research/pubs/author205.html">Peter Norvig</a> explains <a href="http://www.nypost.com/f/print/news/opinion/opedcolumnists/the_machine_age_tM7xPAv4pI4JslK0M1JtxI">here</a> and Steven Levy tells <a href="http://gizmo.do/k6m8gt">here</a>). Likewise, <a href="http://jeffjonas.typepad.com/jeff_jonas/">Jeff Jonas of IBM</a> recounts a situation years ago in which adding more information to a database of people actually shrank the number of records on individuals: the extra information let him identify and consolidate duplicate entries.</p>
<p>But the inspiration for these musings is an article in last week&#8217;s issue of The Economist, &#8220;<a href="http://www.economist.com/node/18618025?story_id=18618025">The science of science</a>: How to use the web to understand the way ideas evolve.&#8221; Researchers came up with a clever way to identify and classify texts by inferring meaning from their content, independent of how the authors felt they ought to be classified.</p>
<p>This lets machines parse huge volumes of text that people cannot, or cannot parse well. Academic authors label the subject areas of their papers, but sometimes use far too many labels as a way to trick people into reading the paper, or are limited to just five, which may be too few. Sometimes they are required to use predetermined labels from library science, which fail to account for emerging areas of scholarship. For example, Adam Smith never regarded himself as an economist &#8212; the term didn&#8217;t exist in that context &#8212; but rather as a moral philosopher. This system would place him alongside Malthus, a pastor by trade and a demographer by study, who incontestably wrote on economics.</p>
<p>Moreover, the system enables one to see how ideas molt and meld over time &#8212; just as Smith and Malthus seemed out of step with their &#8220;professions&#8221; in their time, yet were foundational for the new field of economics. And it bears repeating: the technology described in the article works because there is enough data to make inferences about meaning. As the article states:</p>
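<p>To make the Jonas example concrete, here is a minimal sketch of record consolidation in Python. It is not IBM&#8217;s or Jonas&#8217;s actual method. It assumes a simple, hypothetical rule that two records sharing an e-mail address or phone number describe the same person, and it merges such records with a union-find structure. The sample records are invented.</p>

```python
from collections import defaultdict


def count_entities(records, keys):
    """Count distinct people after merging records that share any value
    for the given identifying keys, using a simple union-find."""
    parent = list(range(len(records)))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Group record indices by each (key, value) identifier they carry.
    shared = defaultdict(list)
    for idx, record in enumerate(records):
        for key in keys:
            value = record.get(key)
            if value:
                shared[(key, value)].append(idx)

    # Hypothetical matching rule: records sharing an identifier are one person.
    for indices in shared.values():
        for other in indices[1:]:
            union(indices[0], other)

    return len({find(i) for i in range(len(records))})


# Invented sample records.
base = [
    {"name": "Jon Smith", "email": "js@example.com"},
    {"name": "Jonathan Smith", "phone": "555-0101"},
    {"name": "Ann Lee", "phone": "555-0199"},
]
# One extra record links the first two through a shared e-mail and phone.
bridged = base + [{"name": "J. Smith", "email": "js@example.com", "phone": "555-0101"}]

print(len(base), count_entities(base, ["email", "phone"]))        # 3 3
print(len(bridged), count_entities(bridged, ["email", "phone"]))  # 4 2
```

<p>Adding the fourth record raises the record count. Because it shares an e-mail address with one entry and a phone number with another, it also lowers the number of distinct people from three to two: more data, fewer individuals.</p>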
<blockquote><p>&#8220;Citation indices, which work only where publications refer to their sources explicitly, form a tiny nebula in the digital universe. News articles, blog posts and e-mails often lack a systematic reference list that could be used to make a citation index. Yet they, too, are part of what makes an idea influential.&#8221;</p></blockquote>
<p>This opens up new areas from which researchers can amass sources. For instance, the huge body of &#8220;gray literature&#8221; (as it&#8217;s called in library science) that sits slightly outside the mainstream publishing world becomes easier to retrieve and cite.</p>
<p>It also indirectly overcomes a shortcoming inherent in Google&#8217;s approach. Google&#8217;s PageRank algorithm, at its most basic level, counts inbound links, much as academic citations are counted, and presumes that a page with more of them (weighted by the importance of the pages they come from) is more relevant. But basing relevance on link structure invites imperfection, because ordinary people are themselves imperfect and may not link to the ideal content, producing suboptimal search results. The technique described in the article may help remedy this.</p>
<p>The upshot is that we are generally familiar with the idea that big-data seems to exhibit &#8220;inverse scaling features&#8221;: the more data you add, the better the system gets (rather than deteriorating, as most systems do under more load). But a step beyond this is that &#8220;big-data solves the problem of big-data.&#8221; With so much info around, the only way to tackle it is to use its huge size to sort itself. This idea sounds like a serpent eating its tail &#8212; but it may be more than that.</p>
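<p>For readers curious about the mechanics, the textbook version of PageRank can be sketched in a few lines of Python using power iteration. This is the published formulation with the commonly cited damping factor of 0.85. It is not Google&#8217;s production ranking, which uses many more signals, and the four-page link graph is hypothetical.</p>

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank computed by power iteration.

    links maps each page to the pages it links to. A page with no
    outbound links spreads its score evenly over every page.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outbound link passes on part of the linking page's own rank.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

<p>Page &#8220;c,&#8221; with three inbound links, ends up ranked highest, while pages nobody links to keep only the baseline share. The scores reflect only who links to whom, not whether those links point to the best content, which is the weakness described above.</p>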
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2011/05/06/tautology-or-something-more/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
		<item>
		<title>What donations tell us about &#8230; more donations</title>
		<link>http://cukier.wordpress.com/2011/04/26/what-donations-tell-us-about-more-donations/</link>
		<comments>http://cukier.wordpress.com/2011/04/26/what-donations-tell-us-about-more-donations/#comments</comments>
		<pubDate>Tue, 26 Apr 2011 01:30:37 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[NGOs]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=34</guid>
		<description><![CDATA[One of the most impressive trends over the past decade (and broadly, the past century) has been the rise of the NGO. In the 1990s they mushroomed like start-ups and attracted &#8220;social entrepreneurs.&#8221; The bigger shift today is that it&#8217;s no longer a person&#8217;s full-time job: now actual entrepreneurs toiling at start-ups have their own [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most impressive trends over the past decade (and broadly, the past century) has been the rise of the NGO. In the 1990s they mushroomed like start-ups and attracted &#8220;social entrepreneurs.&#8221; The bigger shift today is that it&#8217;s no longer a person&#8217;s full-time job: now actual entrepreneurs toiling at start-ups have their own philanthropic gig on the side. A computer went from a 2-ton, $2 million, room-sized machine to a pocket-sized thing. So did non-profit organizations.</p>
<p>I recently scribbled a few thoughts about the data dimensions of responding to Japan&#8217;s crisis for <em>The Economist&#8217;s</em> website: &#8220;<a href="http://www.economist.com/blogs/babbage/2011/04/dealing_japans_disaster">The information equation</a>&#8221; on April 24th. I was impressed that a private-sector company was playing the role that a governmental organization or NGO might play. (It&#8217;s a Google.org project, to be exact.)</p>
<p>Among the things I learned was that Google collected $5.5 million in donations through its <a href="http://www.google.com/crisisresponse/japanquake2011.html">crisis-response page</a>. A small but not insignificant haul. But it got me thinking. The world of BigData is about learning new things from information that is otherwise invisible to the naked eye. What could the donation data tell us about how to more effectively solicit charitable contributions? Specifically, as I wrote in the penultimate paragraph of the article:  </p>
<blockquote><p>The donation data may offer a chance to learn new things about how people contribute. For example, what is the average amount? Does it follow a standard normal deviation (ie, a &#8220;bell curve&#8221;) in which a few give a little and a lot, with the majority donating around $15? Or is it a power-law distribution, in which there are two or three extremely rich donors, a handful of generous ones, followed by a long tail of $2 contributions? Did they donate using PayPal or credit cards? What time of day do people give? Is it after they have read a news story or clicked a link within an e-mail? The information would help fundraisers tailor how to make their appeals. And the data can be broken down by country or even city via Internet Protocol addresses.
</p></blockquote>
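<p>The bell-curve-versus-power-law question can be checked with a few summary statistics. The Python sketch below is illustrative only. The donation figures are simulated rather than taken from Google&#8217;s data, and the 1% cutoff is an arbitrary choice. In a roughly normal distribution, the mean sits close to the median and the biggest donors contribute only a sliver of the total. In a heavy-tailed one, the mean is pulled well above the median and a small group of donors accounts for a large share.</p>

```python
import random
import statistics


def describe_donations(amounts, top_fraction=0.01):
    """Summarise donation amounts (assumed positive) to hint at their shape."""
    ordered = sorted(amounts, reverse=True)
    total = sum(ordered)
    # Number of donations in the top slice; always at least one.
    top_n = max(1, int(len(ordered) * top_fraction))
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "top_share": sum(ordered[:top_n]) / total,
    }


rng = random.Random(42)  # fixed seed so the simulation is repeatable

# Simulated bell curve: most gifts cluster around $15 (floored at $1).
bell = [max(1.0, rng.gauss(15, 4)) for _ in range(10_000)]

# Simulated power law: Pareto-distributed gifts with a $2 minimum.
heavy = [2 * rng.paretovariate(1.2) for _ in range(10_000)]

for label, sample in (("bell curve", bell), ("power law", heavy)):
    summary = describe_donations(sample)
    print(label, {k: round(v, 2) for k, v in summary.items()})
```

<p>In the simulated bell curve, the mean and median should nearly coincide. In the simulated power law, the top 1% of donors should supply a large fraction of the money. Applied to real donation records, the same three numbers would hint at which pattern actual giving follows.</p>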
<p>I&#8217;ve asked Google&#8217;s hyper-helpful PR team to run the idea past their number-crunchers and to share the findings so I can write a story about this. It&#8217;s sort of like Google Flu Trends, but for charities. The information would be highly valuable for NGOs &#8212; particularly one that is dear to my heart, <a href="http://www.ibj.org">International Bridges to Justice</a> (where I proudly serve on the board). </p>
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2011/04/26/what-donations-tell-us-about-more-donations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
		<item>
		<title>From &#8220;Ick!&#8221; to &#8220;Wow!&#8221; to &#8220;Wait!&#8221;</title>
		<link>http://cukier.wordpress.com/2011/04/24/from-ick-to-wow-to-wait/</link>
		<comments>http://cukier.wordpress.com/2011/04/24/from-ick-to-wow-to-wait/#comments</comments>
		<pubDate>Sun, 24 Apr 2011 14:03:41 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[cellphones]]></category>
		<category><![CDATA[privacy]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=27</guid>
		<description><![CDATA[What is the world of BigData? To define it is to limit it. But here&#8217;s my entry: it refers to things that we cannot see with the naked eye, but that are revealed to us only by a huge body of data. So, for example, we may not know what people think the best restaurant [...]]]></description>
			<content:encoded><![CDATA[<p>What is the world of BigData? To define it is to limit it. But here&#8217;s an entry in: it refers to things that we cannot see with the naked eye, but is only revealed to us from a huge body of data. So, for example, we may not know what people think the best restaurant is in terms of food and service and atmosphere and a good time. But being able to see the percentage left as tips could be a way of learning this, in a way we couldn&#8217;t know in the past &#8212; when we couldn&#8217;t get the data or process it.</p>
<p>This is a fictitious example (and one imagined by <a href="http://www.metamarketsgroup.com/team.php">Mike Driscoll of Metamarkets</a>). But it helps to concentrate the mind on what is new about the BigData revolution taking place, and how information cleverly reused can create new sources of economic value. And this leads me to thinking about the Wall Street Journal&#8217;s excellent article &#8220;<a href="http://on.wsj.com/fAbZid">The Really Smart Phone</a>&#8221; by Robert Lee Hotz, which is part of the paper&#8217;s impressive series &#8220;What They Know.&#8221;</p>
<p>There&#8217;s much to praise in the piece. Instead, I want to put forward some vital distinctions that industry needs to consider, when thinking about some of the trends happening.</p>
<p>First, we need to separate the process of BigData from its output. The article &#8212; like the industry &#8212; doesn&#8217;t really do this. For example, sometimes we talk about being able to track 100 million cellphone users (but don&#8217;t note the substance of what is being tracked: calls? location? bills?) And sometimes we talk about what we learn, such as a person&#8217;s susceptibility to obesity. But is it because location data shows they&#8217;ve been sedentary? Or because they bought lots of ice-cream from their iPhone?</p>
<p>These distinctions are crucial. In one instance, it is anonymized metadata, in another it is individual information. The ways that entities are allowed use these different types of data perhaps ought be different too. </p>
<p>Most people, and most articles in the press, approach the BigData issue from the negative: &#8220;if you only knew what they know about you!&#8221; But I believe that industry ought be far more transparent because if people did know, they&#8217;d probably be more impressed than alarmed. (It is a point that I made in <a href="http://www.economist.com/node/15557431">my special report &#8220;The data deluge&#8221; in The Economist</a> last year.) Specifically, what is so unsettling: what they collect? Or what they know? On the surface, we bristle at both. But when we look deeper, it gets fascinating to see what new things can be learned from a big body of data.</p>
<p>Hence, the transition from &#8220;Ick!&#8221; to &#8220;Wow!&#8221; But I think the failure of industry to be open about its practices will hold it back. Thus: the public will cry: &#8220;Wait!&#8221;</p>
<p>People are antsy. Regulators are uneasy. And business is barricading itself. Amazon never discusses it. Google does &#8212; and thus invites abuse, alas. Apple is characteristically silent. &#8220;Google Inc. defended the way it collects location data from Android phones, while Apple Inc. remained silent for a third day,&#8221; the <a href="http://online.wsj.com/article/SB10001424052748703387904576279451001593760.html#ixzz1KSTMah00">WSJ wrote in a separate article in April 23</a>.</p>
<p>I think with the right outreach, BigData firms can make the case for collecting and processing the information. It will change the debate to the more essential questions: who owns the information, who gets to benefit from it, how it is valued, how it is protected and what are the penalties if this trust is abused? </p>
<p>This further establishes the distinction I identified at the outset: separating the process from the substance. In fact, we are talking about so many records &#8212; millions of people, zillions of data-points of locations or calls &#8212; that the practical effect seems to be anonymization, even if is not done effectively in practice.</p>
<p>To overcome the &#8220;Wait!,&#8221; I&#8217;d urge that we make rules. As for a starting point to think about them, I&#8217;ll discuss another time. For now, a look at what the WSJ piece did a nice job of highlighting. The large numbers associated with the research was interesting, but actually unimportant &#8212; they&#8217;re just big numbers; the process. The actual output of what is to be learned is far most interesting. Specifically, cellphone BigData lets us:</p>
<p>- pinpoint &#8220;influencers,&#8221; the people most likely to make others change their minds</p>
<p>- forecast where people are likely to be at any given time in the future</p>
<p>- predict which people are most likely to defect to other cellphone carriers</p>
<p>- reveal subtle symptoms of mental illness</p>
<p>- foretell movements in the Dow Jones Industrial Average</p>
<p>- chart the spread of political ideas as they move through a community</p>
<p>- expose a cultural split that is driving a historic political crisis (in Belgium)</p>
<p>- deduce that two people were talking about politics</p>
<p>- detect flu symptoms in students before they themselves realized they were getting sick</p>
<p>A final note: all of these insights were gleaned by parsing just two types of data, location and interconnections among users &#8212; that is, metadata. There is a lot more mobile data to collect; we&#8217;ve barely scratched the surface. Also, nota bene: none of the data relates to specific content from the phone or user. And it is not clear that the data collected can be traced back to a specific user, other than in cases of academic research in which consent was granted.</p>
<p>In some ways, the data collected resembles &#8220;pen register&#8221; information, which is less spooky than an outright wiretap: eg, who calls whom and when, but not what they said. Law enforcement faces a lower legal standard to obtain it. </p>
<p>My point is this: the BigData issues we&#8217;re confronting now are the easy ones. So this is the moment to start debating them seriously and arriving at answers &#8212; as a precursor to the harder issues coming down the pike.</p>
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2011/04/24/from-ick-to-wow-to-wait/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
		<item>
		<title>Data; boring but&#8230;</title>
		<link>http://cukier.wordpress.com/2011/03/06/data-boring-but/</link>
		<comments>http://cukier.wordpress.com/2011/03/06/data-boring-but/#comments</comments>
		<pubDate>Sun, 06 Mar 2011 14:33:49 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Freedonia]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=19</guid>
		<description><![CDATA[I had a delightful dinner with a Canadian (Tory) politician this evening. Before we got around to talking about gun-control and &#8220;the situation in Freedonia,&#8221; the conversation fell upon data. &#8220;It probably sounds like the boring-est thing in the world,&#8221; I explained, &#8220;but it&#8217;s not: it is the most exciting, and arguably the most important in [...]]]></description>
			<content:encoded><![CDATA[<p>I had a delightful dinner with a Canadian (Tory) politician this evening. Before we got around to talking about gun-control and &#8220;the situation in Freedonia,&#8221; the conversation fell upon data. &#8220;It probably sounds like the boring-est thing in the world,&#8221; I explained, &#8220;but it&#8217;s not: it is the most exciting, and arguably the most important in your lifetime.&#8221;</p>
<p>As I said this, his look changed from polite indifference (and mild pity) to genuine curiosity. Now I had to explain myself. What I blurted out was this:</p>
<p>The world is becoming data-ized as digital information and numerical measurement are applied to all aspects of what people do, particularly things that couldn&#8217;t be measured before because doing so was impractical or impossible. (Think: using wireless and GPS in cars to base insurance premiums on where and when people actually drive, as has been possible since 2007.)</p>
<p>The impact will be as profound as that of the scientific method in the 18th century &#8212; which quickly moved past the sciences and left its mark on all areas of human endeavor. For instance, what is &#8220;quantitative decision making&#8221; in management, if not the scientific method applied to business? Likewise, the BigData revolution is plowing through the sciences, and has also jumped into mainstream areas, such as business and government.</p>
<p>My dinner companion got it.</p>
<p>It is not easy to describe what is happening and why it matters. But parallels with the past, albeit imperfect, are usually very useful. Hence, trotting out the scientific method to explain the here and now.</p>
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2011/03/06/data-boring-but/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
		<item>
		<title>Fighting fraud with data? Maybe&#8230;</title>
		<link>http://cukier.wordpress.com/2011/02/25/fighting-fraud-with-data-maybe/</link>
		<comments>http://cukier.wordpress.com/2011/02/25/fighting-fraud-with-data-maybe/#comments</comments>
		<pubDate>Fri, 25 Feb 2011 14:20:39 +0000</pubDate>
		<dc:creator>cukier</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[finance]]></category>

		<guid isPermaLink="false">http://cukier.wordpress.com/?p=13</guid>
		<description><![CDATA[A new report by O&#8217;Reilly and PayPal (available for free download), posits that (as the O&#8217;Reilly bloggers put it): &#8220;Big data thwarts fraud.&#8221; It is an intriguing idea. One possibility comes to mind: perhaps shopping patterns are so statistically consistent, routine and as personal as DNA, that information about a person&#8217;s previous purchases &#8212; or even non-shopping [...]]]></description>
			<content:encoded><![CDATA[<p>A new report by O&#8217;Reilly and PayPal (available for free <a href="https://www.x.com/community/ppx/devzone/research2010">download</a>) posits that, as the O&#8217;Reilly bloggers put it, &#8220;<a href="http://radar.oreilly.com/2011/02/big-data-fraud-protection-payment.html" target="_self">Big data thwarts fraud</a>.&#8221; It is an intriguing idea.</p>
<p>One possibility comes to mind: perhaps shopping patterns are so statistically consistent, routine and personal (as personal as DNA) that information about a person&#8217;s previous purchases &#8212; or even non-shopping activities &#8212; enables an algorithm to tell whether the customer is truly who he or she claims to be.</p>
<p>That would be interesting. But, alas, the report looks at more prosaic things, i.e.:</p>
<blockquote><p>PayPal, Amazon, and Google have all developed sophisticated analytical tools and infrastructure to identify patterns of fraudulent activity. Paypal, for example, has a series of Fraud Management Filters that screen payments and sort out transactions that warrant review because of their <strong>amount, their origin, or other factors</strong> that can be set by a merchant. [...] PayPal and Amazon have developed fraud detection tools that depend on massive datasets containing not only <strong>financial details for transactions, but IP addresses, browser information, and other technical data</strong> that will help these companies refine models to predict, identify, and prevent fraudulent activity. PayPal and Amazon have had years to amass databases of the transaction details for hundreds of millions of customers across thousands of merchants.</p></blockquote>
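<p>To make the prosaic approach concrete, here is a minimal sketch of the kind of merchant-configurable screening the report describes. It is a hypothetical illustration, not PayPal&#8217;s actual Fraud Management Filters. The rule names, the 1,000 amount limit and the placeholder country code &#8220;XX&#8221; are all assumptions. Each rule inspects one attribute of a payment: its amount, its origin, or whether its IP address matches the billing country. Any payment that trips a rule is held for review.</p>

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float    # payment amount in the merchant's currency
    origin: str      # hypothetical: country code of the buyer's billing address
    ip_country: str  # hypothetical: country the buyer's IP address geolocates to


def make_rules(max_amount, blocked_origins):
    """Build hypothetical merchant-set rules; each name maps to a check that is True if suspicious."""
    return {
        "amount_over_limit": lambda t: t.amount > max_amount,
        "blocked_origin": lambda t: t.origin in blocked_origins,
        "ip_mismatch": lambda t: t.ip_country != t.origin,
    }


def screen(txn, rules):
    """Return the sorted names of the rules a transaction trips; an empty list means no review."""
    return sorted(name for name, check in rules.items() if check(txn))


# Hypothetical merchant settings; "XX" is a placeholder country code.
rules = make_rules(max_amount=1000.0, blocked_origins={"XX"})
print(screen(Transaction(amount=1500.0, origin="US", ip_country="FR"), rules))
# ['amount_over_limit', 'ip_mismatch']
```

<p>Keeping each rule as a named check lets a merchant add or drop rules without touching the screening loop. The returned names also tell a reviewer why a payment was held.</p>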
<p>The sort of filtering and checking described above (bold emphasis mine) involves no conceptual shift in how to use data. All that is being described is applying the same intuitive techniques that one would long have used in a world of &#8220;small data.&#8221; The only thing &#8220;big&#8221; about it is that there&#8217;s a lot more data to sift through. But the firms are not <em>using</em> the size and depth of the data to do anything novel per se.</p>
<p>This is a pity. The revolution taking place in other parts of the Internet industry is that companies can do entirely new things with a big data set that they cannot do with a small one. A former top Google executive once told me that Google Checkout was created in part because the firm realized that knowing a customer&#8217;s shopping patterns would help it detect fraud, the key stumbling block for e-commerce.</p>
<p>Likewise, at the O&#8217;Reilly Strata conference in February, the hallway chit-chat was about how a financial-services firm might predict whether someone will repay a loan more accurately with Facebook&#8217;s social graph than with a FICO score, since the best predictor of whether a person will repay is whether his or her friends repay their loans. (Actually, the example was told to me as if it were already being done, though not with Facebook&#8217;s data.) Yet I think I&#8217;m safer considering it apocryphal until I hear it firsthand.</p>
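<p>The hallway claim can at least be stated precisely. Below is a minimal sketch that computes the share of a borrower&#8217;s friends who repaid their loans. It assumes a hypothetical social graph stored as a dictionary of friend sets and a hypothetical record of who repaid past loans. The 0.5 review threshold and the function names are also assumptions. It illustrates the idea only. It is not how any lender is known to score credit, and it says nothing about whether the signal actually beats a FICO score.</p>

```python
def friends_repayment_rate(person, friends, repaid):
    """Share of a person's friends (those with a known loan history) who repaid.

    friends: hypothetical social graph, mapping each person to a set of friends.
    repaid:  True/False per person for past loans; people with no history are absent.
    Returns None when no friend has a known history, i.e. the graph gives no signal.
    """
    known = [repaid[f] for f in friends.get(person, set()) if f in repaid]
    if not known:
        return None
    return sum(known) / len(known)


def needs_manual_review(person, friends, repaid, threshold=0.5):
    """Hypothetical decision rule: review if the graph gives no signal or the rate is low."""
    rate = friends_repayment_rate(person, friends, repaid)
    return rate is None or threshold > rate


friends = {"ana": {"ben", "cal", "dee"}, "eve": {"ana"}}
repaid = {"ben": True, "cal": True, "dee": False}
print(friends_repayment_rate("ana", friends, repaid))  # two of ana's three friends repaid
print(needs_manual_review("eve", friends, repaid))     # True: ana has no loan history
```

<p>The main design question is what to do when the graph says nothing. In this sketch, a borrower whose friends have no loan history is sent for review rather than being treated as either a good or a bad risk.</p>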
<p>Does anyone know of incredible stories of how &#8220;big data&#8221; is being used in new ways to reduce financial fraud? If so, comment here or email me directly.</p>
]]></content:encoded>
			<wfw:commentRss>http://cukier.wordpress.com/2011/02/25/fighting-fraud-with-data-maybe/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/283e1946353600e7bef2d9aca657f8fa?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Cukier</media:title>
		</media:content>
	</item>
	</channel>
</rss>
