big data and the geoweb

I’m taking a moment before 4th of July festivities to think about one of the ‘hottest’ topics in geoweb studies right now: the ‘big data’ movement. To be sure, the interest and fascination, institutional frameworks, and social technology capabilities have been aligning over the last few years (so it’s not terribly new), and its impacts are *much* broader than the geoweb. In some ways, however, geoweb scholars are only now beginning to engage with this concept, and it is already having strong implications for our studies.

In case this term is new to you, “big data” (from here on I’ll drop the quotation marks) is basically the name given to the enormous volumes of data generated through social networking technologies. Think: Twitter posts (tweets), Foursquare and Facebook check-ins, and the many other digital spaces that produce data. I’m not sure where the term originally comes from, but it made its first big splash in the social sciences when an editor of Wired famously (infamously?) proclaimed that big data is challenging the utility of the scientific method [1].

Although the editorial has faced its fair share of trenchant critique from geographers as well as our academic “neighbors”, the big data concept is still having interesting effects in geoweb studies. The first, and I think the biggest, is that small-scale empirical research is now often defined in relation to big data. Geoweb scholars increasingly use the term “small data” to refer to these studies; in other words, such studies are situated by their “Otherness” from big data. This could have long-standing impacts on how we think about the geoweb and the analytical frameworks we employ to theorize it. Second, some geoweb studies restrict the questions they ask to those answerable through big data. Sometimes this manifests as analysis at a descriptive level (“who says what where?”), while at other times it implies broader representation than is reasonable (“we use a given corpus of geoweb data to understand society writ large”). Third, big data does a lot of work toward re-normalizing several concepts problematized by feminist, post-colonial, and post-structuralist geographers: accuracy, correlation/causation, democratic participation, and quantification, for instance. This is not to say that these terms are meaningless or somehow “wrong”, but rather that they are political and can take many different forms and connotations. Sparke [2] showed, for example, how a Native American map of a river was considered “inaccurate” because it represented a segment of the river as wider than it “really” was; this de-politicized interpretation failed to consider the multiple ways accuracy can be conceptualized, such as the possibility that the river was drawn wider because it takes longer to cross at that particular point.

I’ve said in conversation before that when we can answer all of our research questions with big data, we’ll know that we’ve stopped asking the right questions. I can think of several ways to justify this. First, only a small subsection of the global population produces the data most often associated with big data. There’s a large global North/global South division, but there are also divisions within these broad categories (to disrupt simple binaries, I call some of this the global South within the global North): more privileged places “produce” more data producers and tend to get mapped more thoroughly, men produce more than women, mega-tweeters (so to speak) produce legions more data than most other users, people with smartphones produce more than others, and so on (Muki Haklay has blogged about this). So how broadly you can speak to things like “culture”, “reality” taken in a holistic rather than situated sense, and “democratic” practice needs to be tempered by this acknowledgement. Second, the big, meaty, social-science-y questions get asked after we get the descriptive stuff out of the way. Once patterns (distributions, clusters, correlations, statistical trends) are detected, we must then ask “why?” and situate these observations in relation to extant theories and explanations. Lest this be mistaken for the scientific method, I should be clear that this can just as easily be done in a critical, reflexive, theory-driven manner. Third, and perhaps most importantly, ethnography, interviews, and other qualitative methods produce a very different kind of data that should not be undervalued. These methods are not necessarily opposed to big data or quantification, but they can give us a rich, contextualized understanding of a phenomenon that is often lost in big data.
For instance, I’ve been intrigued by an idea Joe Eckert mentioned casually to me: doing an ethnography of Twitter users, or interviewing the people who lead group pages on Facebook. In other words, by acknowledging that more data lies beneath the “big data”, we can see both the utility and the limitations of big data.

In summary, the proliferation of big data is a good thing for the limited sets of questions that can be answered with it. Right now there’s a lot of hype around it, and hopefully this post encourages us to slow down and critically reflect on its affordances as well as its limitations. Questioning its origins, effects, and associations helps us get there.

[1] Although titled “The End of Theory…”, the piece is really about the hypothetico-deductive model of science (in his words, “hypothesize, model, test”), not theory in the sense we use the term in critical geography. The editor conflated modeling with theorizing, partly to highlight the epistemological limitations of theorizing, but those limitations are treated much better by scholars in the History and Philosophy of Science and in Science & Technology Studies.
[2] Sparke, Matthew. 1998. “A map that roared and an original atlas: Canada, cartography, and the narration of nation”. Annals of the Association of American Geographers, 88(3): 463-495.