Thursday, 22 June 2017

The Social and the Technological: Rethinking each in light of the other

In June of 2008, former Nature and Science editor Chris Anderson wrote a rather controversial article for Wired magazine. In the article, titled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”, Anderson argued that the “petabyte age” and the rise of big data had started to undermine the very foundations of today’s dominant paradigm of scientific research: with the massive datasets now in our hands, the traditional scientific method of sampling, hypothesis testing, and using theory to ground causal explanations is losing out to the power of pure correlation.


“Correlation is enough,” he declared. “We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.”

This is, in many ways, a very interesting idea to entertain. More relevant, however, is the tension in the science and technology literature that this idea points towards. To understand it better, let me bring up one of the more popular critiques of big data today: opacity.

While data evangelists tend to wax lyrical about how ever larger datasets and increasing access to data make the world more transparent (I chortle every time Mark Zuckerberg mentions “Facebook” and “open” in the same sentence, but that is beside the point), much of our behavior today is governed by the ways in which our data is used, and we don’t quite know what these ways are. Don’t get me wrong here: I’m not saying that this is a bad thing (nor that it isn’t); I’m merely observing how our behavior has adapted to, and continues to be mediated through, these data regimes.

Consider the simple case of Amazon recommendations. In the pre-Amazon age, retailers and advertisers simply didn’t have the means to advertise products to us based on who they knew we were. Sure, there were (and still are) strategically placed billboards that shout their ever-persuasive messages to all and sundry, and yes, we often found vouchers and discount coupons slid under our door. But the insight that someone buying a certain book might be interested in another particular book was the obvious psychological goldmine that Amazon tapped into. When you add Watchmen to your cart, Amazon can say with reasonable accuracy that you’ll probably like V for Vendetta too, and so it shows it to you. You probably never intended to buy V for Vendetta (at least not at that moment), but now, seeing it right there in front of you, you pause to think. The moment you take that second to think, let alone actually add it to your cart and buy it, is a win for Amazon. That is behavior you wouldn’t otherwise have shown had Amazon not targeted you in this manner. If you then go ahead and buy V for Vendetta, you’ve made a conscious decision to spend money and add a new book to your library because of the “hidden” ways in which Amazon used your data to advertise the book to you.

Let’s take a step back and assess the same scenario through Chris Anderson’s lens. Why do recommender systems work the way they do? Decades of psychology, communication, and human behavior research offer us a treasure trove of theories about why we tend to like similar things, but do any of those theories play a significant role in guiding Amazon’s recommender systems? No. Amazon’s algorithms couldn’t care less about why someone who liked Watchmen probably also likes V for Vendetta, as long as they do. And how does Amazon know that they do? Because: data. That’s right, Amazon simply knows that thousands of people who bought Watchmen also bought V for Vendetta, so plain conditional probability suggests that a person buying the former has a high propensity to buy the latter, whether or not they wanted it in the first place. As Steve Jobs famously said, “a lot of times, people don’t know what they want until you show it to them.” The fact that around 35% of Amazon’s sales come through recommendations shows just how brutally effective they are (source). Moreover, even Amazon’s brick-and-mortar bookstore in Seattle has embraced its recommender algorithm, hinting at how the same “hidden ways” are setting a precedent for what the bookstores of the future could look like.
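To make the co-purchase logic above a little more concrete, here is a minimal Python sketch of item-to-item recommendation by counting: for a book A, estimate P(B | A) as the fraction of carts containing A that also contain B, and recommend the books whose estimate clears a cut-off. This is an illustration under stated assumptions, not Amazon’s actual system; the shopping carts, the function names `co_purchase_counts` and `recommend`, and the `min_confidence` cut-off are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def co_purchase_counts(baskets):
    """Count how many carts contain each item and each ordered pair of items."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # count each item at most once per cart
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # both (A, B) and (B, A)
    return item_counts, pair_counts


def recommend(item, baskets, min_confidence=0.5, top_n=3):
    """Rank other items by the estimate P(other | item), keeping those above a cut-off."""
    item_counts, pair_counts = co_purchase_counts(baskets)
    if item_counts[item] == 0:
        return []
    scores = {
        other: count / item_counts[item]
        for (first, other), count in pair_counts.items()
        if first == item
    }
    # Highest estimate first; ties broken alphabetically so the output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(other, p) for other, p in ranked if p >= min_confidence][:top_n]


# Hypothetical shopping carts, invented purely for illustration.
baskets = [
    ["Watchmen", "V for Vendetta"],
    ["Watchmen", "V for Vendetta", "The Killing Joke"],
    ["Watchmen", "V for Vendetta"],
    ["Watchmen", "The Killing Joke"],
    ["The Killing Joke", "Batman: Year One"],
]

print(recommend("Watchmen", baskets))
# [('V for Vendetta', 0.75), ('The Killing Joke', 0.5)]
print(recommend("Watchmen", baskets, min_confidence=0.6))
# [('V for Vendetta', 0.75)]
```

Raising `min_confidence` from 0.5 to 0.6 quietly drops The Killing Joke from the list. Nothing in the output tells the shopper that such a cut-off exists, which is the kind of invisible threshold decision the next paragraph asks about.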


The problem with these “hidden ways” is just that: they’re hidden, and we don’t know exactly how they work, except that they generally seem to. The Watchmen and V for Vendetta example might seem simple enough for any quantitatively minded person who knows Bayesian statistics to grasp, but that explanation is mere speculation. Amazon doesn’t tell us why it recommended V for Vendetta. Surely that wasn’t the only book that others who bought Watchmen also bought? What threshold of sales prompted Amazon to recommend V for Vendetta and not, say, The Killing Joke? Did it censor some recommendations? Did it artificially add one? We will never know, and this matters because these infrastructures often spring surprises that raise several questions, many of them ethical. Why, for example, did the Android Marketplace once recommend a “Sex-Offender Search” app to people who installed the gay dating app Grindr? Why, again, was the recommendation (artificially?) pulled after the story went viral? How many such results are tweaked, whether by the algorithm or by people working above and beyond it, to shape what we finally end up seeing?

Consider Facebook. In an attempt to create a “safe online space” unlike Twitter, Facebook employs thousands of people whose sole job is to look at flagged posts and decide whether they are appropriate for the public to view. While many of the rules that these moderators must follow appear to be no-brainers (does a blanket ban on photos of naked children sound like a no-brainer to you? Think again.), others raise interesting ethical questions. Why, for example, are Holocaust-denial posts removed only in certain countries while being allowed everywhere else? How are these topics chosen? Who has the final say in what is allowed and what is not?
All these instances raise much larger questions: what is it that we really see, and how far is what we see from what is real?

The apparent neutrality of these platforms often misleads us, giving us a skewed sense of reality. What we perceive is in fact the result of careful, relentless, algorithmic, and largely agenda-driven curation. This in turn seeps out of our computer screens and affects, in many ways, how we behave and live in the real world. A particularly sinister example of how such curation can affect our real lives is the infamous Facebook emotion experiment, in which researchers tweaked the News Feeds of nearly 700,000 unsuspecting users and found that the number of positive or negative posts shown to a user can influence that user’s emotions. When the findings were published, the experiment caused a significant uproar. Privacy advocates slammed Facebook for playing with users’ emotions, raising concerns about the kinds of behavior such an experiment could have triggered. While Facebook eventually buckled under pressure and set up an in-house IRB, the experiment revealed just how whim-driven these infrastructures really are. All such instances, be it a sex-offender search app being suggested to Grindr users, the censorship of the “Napalm Girl” photograph, or experiments such as the one just described, are important because they let us peek into the inner workings of these infrastructures. They help us poke and prod, and perhaps uncover a tiny bit of what otherwise stays hidden before our eyes.

Spurious Correlations and Interpretability

Perhaps the biggest issue that platform opacity and approaches such as Chris Anderson’s raise is that of spurious correlations. We throw massive amounts of data into our large computing clusters without once pausing to ask where those datasets come from. Our obedient algorithms then whip up correlations and predictions that are used to drive commerce and industry around the modern world. It’s remarkable, and largely effective, but it makes us question a lot of things that we otherwise take for granted.

In 2016, two researchers in China published a paper titled “Automated Inference on Criminality using Face Images”. The paper put four different algorithms to work on facial images, a sizeable share of which belonged to convicted criminals. All four algorithms were able to predict whether a new face belonged to a criminal with impressive levels of accuracy.

Several questions were raised about this study, most notably regarding the interpretability of its results. What, really, were the algorithms picking up? What was it about facial features that set “criminals” apart from “non-criminals”? Was it something to do with the nature of the images instead: lighting, shadows, and so on? Was it the colour of the shirts they were wearing? If so, is it ethically okay to label a person a “criminal” because they were wearing a white shirt when their photograph was taken? Where, in fact, did the photos originate? Who collected them? What purpose did taking the photographs serve?

The authors published a response to the criticism, pointing out that their sole reason for doing the study was “... to push the envelope and extend the research on automated face recognition from the biometric dimension (e.g., determining the race, gender, age, facial expression, etc.) to the socio-psychological dimension.” and that they were “... merely interested in the distinct possibility of teaching machines to pass the Turing test on the task of duplicating humans in their first impressions (e.g., personality traits, mannerism, demeanor, etc.) of a stranger.” They also said that their algorithms controlled for the nature of the images and segmented out only the face from each photograph, so features like shirt colour, or the presence or absence of a collar, could not have determined the results. What they didn’t tell us, though, was how we should then interpret the results. How can we trust the algorithms to be fair, and not to discriminate against certain people just because they don’t satisfy the criteria the algorithms look for in a “non-criminal”? How can we be sure that the correlation between being a criminal and a particular facial feature doesn’t simply arise from sampling bias? And even if it doesn’t, how do we explain the correlation? At this point, we don’t know. And it’s highly likely that we never will.


Spurious Correlations is a very interesting website that gathers pairs of datasets which, at first glance, don’t seem to be related in any conceivable way, yet are surprisingly highly correlated over time. Take, for example, the strong correlation between the number of films Nicolas Cage appeared in and the number of people who drowned by falling into a pool.

Can we come up with a single credible explanation for why the correlation could mean something? (No, moviegoers getting increasingly depressed by Nicolas Cage’s recurring face and deciding to end it all by jumping into a pool doesn’t qualify, I’m afraid.) But (and I’m trying to keep a straight face here, purely for the sake of objectivity), doesn’t the correlation suggest that, at a given time, knowing the number of films Nicolas Cage appeared in is enough to make a roughly accurate estimate of the number of pool drownings in the US? Do we really need to know why they correlate, as long as they do? Of course, the ethical aspect kicks in when we go overboard with these correlations: when we decide to ban Nicolas Cage from appearing in Hollywood films in order to cut down on drownings, or to outlaw marriages in Kentucky in order to reduce the number of fishing boat accidents. But just for the sake of asking questions, let’s reframe our argument as follows. Are we, as human beings, even capable of interpreting certain correlations? Are we not limited by our own understanding of the social world when we try to infer a plausible causal explanation for a given correlation?
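One common way for two unrelated series to end up strongly correlated is that both simply drift in the same direction over time. The Python sketch below uses made-up numbers (not the data behind the pool-drowning example) for two series, each built from a time trend plus a small wobble, with neither computed from the other. Their levels correlate at roughly 0.99, but once each series is replaced by its year-to-year changes, the correlation drops to -0.5. The helper names `pearson` and `diffs` are hypothetical, chosen for this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def diffs(seq):
    """Year-over-year changes: seq[t] - seq[t - 1]."""
    return [b - a for a, b in zip(seq, seq[1:])]


# Synthetic series over ten "years": each is just a trend plus a small wobble.
years = range(10)
wobble_a = [0, 1, -1, 0, 1, -1, 0, 1, -1, 0]
wobble_b = [1, -1, 0, 1, -1, 0, 1, -1, 0, 1]
series_a = [2 * t + w for t, w in zip(years, wobble_a)]
series_b = [5 * t + w for t, w in zip(years, wobble_b)]

r_levels = pearson(series_a, series_b)                  # high: both trend upward
r_changes = pearson(diffs(series_a), diffs(series_b))   # the shared trend is removed
print(f"levels: {r_levels:.2f}, changes: {r_changes:.2f}")
# levels: 0.99, changes: -0.50
```

Differencing is only one simple check. A weak correlation between changes does not prove two series are unrelated, just as a strong correlation between levels does not prove they are related; it merely shows how much of the headline number can come from a shared trend.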

To push the case for interpretability further, let’s take a look at a popular machine learning algorithm used for classification today: the Support Vector Machine, or SVM. SVMs are powerful classifiers in part because they often use a kernel function, which implicitly maps data points into a higher-dimensional space before classifying them. The kernel function is a neat trick that makes it possible to separate data points that cannot be separated by a straight line (or, more generally, a flat hyperplane) in their original dimension. The question of interest, then, is: how does one interpret the use of a kernel function? How does one really make sense of transforming a regular feature, say, a person’s age, using a complex mathematical function, just because after the transformation the same age is supposedly a good predictor? Lastly, if such functions are not interpretable, should we rely on them to classify people, predict behavior, and facilitate our understanding of social phenomena?
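The idea behind a kernel can be sketched on a toy, one-dimensional problem in Python. The hypothetical data below place one class in the middle of the range and the other at both ends, so no single cut point on the raw value separates them. Lifting each value x to (x, x²) makes a straight-line rule work, and `kernel(x, y) = xy + (xy)²` returns the same dot product as the lifted points without ever building them, which is what lets an SVM work in the higher-dimensional space implicitly. The data, the feature map `phi`, and the weights `w` and `b` are illustrative assumptions: a real SVM would learn the weights from data by maximizing the margin, and widely used kernels such as the RBF kernel correspond to far less intuitive feature spaces.

```python
def phi(x):
    """Explicit feature map: lift a 1-D value x into 2-D as (x, x**2)."""
    return (x, x * x)


def kernel(x, y):
    """Equals dot(phi(x), phi(y)), computed without building the lifted points."""
    return x * y + (x * y) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def separable_by_threshold(xs, labels):
    """True if some single cut point on the raw 1-D values separates the two classes."""
    points = sorted(xs)
    cuts = [points[0] - 1] + [(a + b) / 2 for a, b in zip(points, points[1:])] + [points[-1] + 1]
    for cut in cuts:
        for sign in (1, -1):
            if all((sign if x > cut else -sign) == y for x, y in zip(xs, labels)):
                return True
    return False


# Hypothetical data: a rescaled feature (think "age, standardized") where the
# positive class sits in the middle and the negative class at both extremes.
xs = [-3, -2, -1, 0, 1, 2, 3]
labels = [-1, -1, 1, 1, 1, -1, -1]

# A hand-picked linear rule in the lifted space: predict +1 when 2.5 - x**2 > 0.
w, b = (0, -1), 2.5


def predict(x):
    return 1 if dot(w, phi(x)) + b > 0 else -1


print(separable_by_threshold(xs, labels))    # False: no single cut works on raw x
print([predict(x) for x in xs] == labels)    # True: a line works in (x, x**2) space
print(all(kernel(x, y) == dot(phi(x), phi(y)) for x in xs for y in xs))  # True
```

Even here the lifted rule, “predict +1 when 2.5 - x² > 0”, is easy to read. With richer kernels and many features, that readability is usually the first thing to go.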


I’d now like to push back on these fears and make a case for when statistical results can be more revealing than what our understanding of the social world can offer.

Robert Moses, hailed as “the father of New York City’s modern urban planning”, gave the Big Apple much of what makes it the shiny cosmopolitan capital of the free world today. In doing so, however, he let his own prejudices weave an enduring racism into much of what NYC looks like. Particularly notorious was his decision to build the overpasses on the Long Island parkways so low that buses could not pass beneath them, keeping black New Yorkers, who largely depended on public transport, from reaching the de facto whites-only beaches that he built. This example is revealing because it is an early instance of the “hidden ways” in which infrastructures have been shaping our behavior for decades. To an alien observer of New York City, oblivious to its history and unaware of the realities of racism in the United States, the high correlation between the height of overpasses and racial segregation doesn’t make much sense. They are likely to label it a “spurious correlation” and move on, but for informed observers of society and technology, it rings a bell. Surely that isn’t a “spurious correlation”. It is spurious only to someone unaware of the many hidden and abhorrent ways in which Black Americans have been marginalized in the US. For those who are aware, the causal mechanism is clear and makes a lot of sense. Going back to my question about the interpretability of algorithms, this example shows the other side of the coin: the risk of wrongly dismissing useful insight as a “spurious correlation”.


The ever-increasing integration of society and technology blurs our understanding of both the exact nature of how data-driven infrastructures work and the limitations of social theories when faced with such a data deluge. It is perhaps time to pause and think about what we value. Do we really need to understand why some things work (and why others don’t), as long as they do? Do we need to rethink our understanding of social phenomena in light of what these hidden infrastructures reveal? Or, on the other hand, do we really need to sacrifice what we’ve held dear for so long and give in to the hidden ways in which data-rich regimes are shaping us?

The answer, my friend, isn’t quite blowing in the wind.

