The Importance of Geographic Specificity in Research

I defended my dissertation a couple of weeks ago, and in the defense my committee raised some concerns that had already crossed my mind. (Before defenses of all sorts the candidate entertains every possible - even unlikely - scenario, question, and debacle that could occur.) My doctoral research was largely an international study looking at broad enterprises (humanitarianism and its digital “counterpart”), meaning I didn’t investigate the form digital humanitarianism takes in specific cases, places, and crises. It was, rather, an exploration of large-scale shifts in humanitarianism as a whole. My committee members asked how my study’s results might have shifted if I’d taken a specific case to study, or looked at digital humanitarianism’s impacts in a particular location. I answered the best I could, by drawing on dominant approaches to knowledge production: we don’t even really know what “digital humanitarianism” is yet, and even less do we understand its origins and social/political impacts, so my study begins to lay some of that groundwork. Very productive future work could look at its variegated manifestations in specific cases, many of which may very well exist in a contradictory relationship to my study’s results. However, despite the fact that my dissertation provides this valuable foundation, it is geographically ambiguous: digital humanitarianism would probably look very different in the Kensington neighborhood of Philadelphia than in, say, lower Manhattan or Jakarta.

The potential “problem” my committee recognized often shows up in my GIS students’ work, where one form it takes is the “Modifiable Areal Unit Problem” (MAUP). The MAUP arises when you have a collection of discrete data points - individual addresses and incomes, for instance, or observations of crime - and aggregate the individual points into regions like Census tracts, political districts, states, or neighborhoods. The “problem” is that the results of your analysis will shift depending on which regions and which scale you choose. For example, aggregating crime data into Census tracts may produce very different geographic patterns than aggregating into neighborhoods or city boundaries. It’s very important as researchers not just to acknowledge this problem and the fact that most regions are chosen without a clear rationale, but to actually choose these regions wisely and communicate your rationale carefully. In some cases it may even be necessary to run analyses at multiple scales in order to gauge the degree to which your primary analysis is limited in its applicability.
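To make the MAUP a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the five observations are made up, the square grid cells stand in for real regions like Census tracts, and the names `aggregate` and `hotspot` are hypothetical helpers, not part of any GIS library. It sums the same point observations into cells of two different sizes and reports which cell comes out as the "hotspot" at each scale.

```python
from collections import defaultdict

# Made-up point observations: (x_km, y_km, incident_count).
# These are illustrative values only, not real crime data.
observations = [
    (0.5, 0.5, 1),
    (0.5, 1.5, 5),
    (2.5, 2.5, 3),
    (3.5, 2.5, 3),
    (2.5, 3.5, 3),
]


def aggregate(points, cell_size):
    """Sum incident counts into square grid cells of side `cell_size` (km)."""
    totals = defaultdict(int)
    for x, y, count in points:
        # Identify the cell by its (column, row) index in the grid.
        cell = (int(x // cell_size), int(y // cell_size))
        totals[cell] += count
    return dict(totals)


def hotspot(totals):
    """Return the cell with the largest aggregated count."""
    return max(totals, key=totals.get)


# The same points, aggregated at two scales.
fine = aggregate(observations, 1)    # 1 km cells
coarse = aggregate(observations, 2)  # 2 km cells

print("1 km hotspot:", hotspot(fine), "with", fine[hotspot(fine)], "incidents")
print("2 km hotspot:", hotspot(coarse), "with", coarse[hotspot(coarse)], "incidents")
```

With 1 km cells, the hotspot is the single cell in the southwest holding the five-incident observation. With 2 km cells, the three moderate observations in the northeast pool into one region that outweighs it, so the hotspot moves to the opposite corner of the study area. Nothing about the underlying points changed, only the units used to aggregate them, which is why running an analysis at more than one scale can be a useful check.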

This long build-up finally arrives at the purpose of this blog post. This summer I am working as a fellow in the “Data Science for Social Good” program of the eScience Institute at UW. The project I’m working on asks whether and how social media data can be used to measure and improve the well-being of urban communities. This involves taking data from multiple sources - Instagram, Zillow, Twitter, and FourSquare, among others - and trying to evaluate how “well” a community is doing as reflected in these data sources. Some early research suggests this is possible, but more tests are certainly needed, especially with regard to the specific kinds of meaningful information researchers might be able to glean from these sources.

But we’re running into a significant challenge in the early stage: the need for geographic specificity. I mean two things when I say that. First, there’s a question of the on-the-ground geometries that should be used for this analysis - the spatial units, in other words. I wonder if the results of our research would turn out differently depending on which neighborhoods we look at. That is, are social media data very useful for gauging well-being in some neighborhoods but meaningless in others? Do some dimensions of well-being (as defined in the literature, such as safety, thriving local businesses, etc.) come out strongly in social media in some places, but not others? Would it be more fruitful to look at the social media within neighborhood boundaries, or blocks, or certain streets, or clusters of populations within neighborhoods? These are all questions about what geographers call “absolute geographic space” - the on-the-ground geographies that you see in most maps.

Second, there’s a question of the social dimensions of the questions we’re asking. Are we actually looking at communities, or neighborhoods? Is there a reason to assume there’s a relation between these two very different concepts? And what is that relationship? This matters because if we use neighborhood boundaries to assess community well-being, we are suggesting very specific linkages between the two, both in terms of how we see the world, and in terms of how we say a researcher can produce knowledge about one or the other. Then there are other social constructs we may actually be looking at, that are neither of the other two: social and interpersonal networks, engagement with the (or, a) public sphere, individuals’ commitments (or lack thereof) to geographic spaces (like, “Why should I care about the neighborhood I live in, if I plan to move in a year or two?”). Even further, there are very compelling disagreements among social scientists about the benefits of online communities and political organization. Are social networks and online communities strong ties? Weak ties? Does it matter for social and political movements?

Depending on your intellectual and political commitments you may be more or less comfortable with my assertion that there are no easy answers to these provocations. There may not be answers at all, just tentative claims that require significant justification and evidence. Regardless, this kind of work depends fundamentally on specificity in language and geography. The questions I’ve posed show why concepts and terms like “community” and “neighborhood” cannot be used interchangeably. They also show why it’s crucial for researchers to be specific in the terms and geographies they use to produce knowledge about the world. Researchers can’t simply use terms, ideas, and concepts without clearly defining them, and bonus points if there are some empirics to back up the way you use that language.