Posts Tagged ‘data’
Update II – Some Brief Reflections on Crowdsourcing Research (9/27/10)
After hosting this crowdsourcing site for a little more than three months, Jonathan and I have decided to shut down the experiment. The paper has been accepted at M4D 2010 in Uganda, so we want to stabilize the data and take this chance to provide some brief reflections on the experience.
We began this small research project curious about what we feel is the next stage in connectivity for the developing world: prepaid mobile Internet. Business models, amongst other subtle factors, will have influential roles in how this next stage unfolds. In exploring the availability of prepaid mobile Internet, we discovered that the African landscape is mottled, with different operators in different countries at various stages of offering the mobile Internet. But given the nature of the desk survey we conducted, we were unable to conclusively determine the offerings in every country, so we decided to open up the data collection to outsiders. The experience was a mixed one, but here are some thoughts:
Make Sure the Technology Fits the Project
Although we initially planned to use a publicly editable Google Map, we quickly switched to an Ushahidi instance because it was designed for mapping crowdsourced data. We found, though, that a tool's original purpose can make it less flexible than two researchers with limited coding capability might hope. Ushahidi is great for mapping crisis information, but that means its categories and language can make collecting information on mobile phone plans a bit confusing. For example, the hard-coded “Report an Incident” wording on Ushahidi is appropriate and understandable for crisis mapping, but confusing for our needs. This slight barrier may make people less likely to participate.
The Importance of Personal Networks
During this experiment, we uncovered some great insights from individuals – far more nuanced than the simple “yes mobile Internet is prepay here” that we initially sought. In particular, the complicated nature of African mobile Internet was explained by Linda Raftree from her personal experience. Katrin Verclas, of Mobile Active, provided the initial introduction, but via email, not the platform we expected. These types of personal, pre-existing contacts are likely to be very helpful for efforts like this.
Unclear Nature of Selection Bias
Researchers take great pains to make sure they study replicable and fairly sampled groups. Because we were using personal networks and open crowdsourcing, we had to consider what, if any, bias this exploratory study had. Using the convenience of these tools, are there data being systematically excluded? Given that we had a well-defined population (African mobile network operators), this was less of a problem, but for researchers navigating new methodologies, it is worth considering. More specifically, Jonathan and I operated within the English-language blog and Twitter networks; did we miss Francophone Africa through crowdsourcing?
Summary
Putting the experience together, we might propose that crowdsourcing success is a function of (a) the ease of the task, (b) the distribution of knowledge, (c) the accessibility of the population with knowledge, and (d) the willingness and capacity of researchers to drive the message and effort. If any of these falls short, the critical mass may never be reached.
Update – We are crowdsourcing the data on a website here.
Mobile Internet is going global and the newest chapter is taking place in developing countries. In order to better understand the paths towards widespread adoption of mobile Internet, Jonathan Donner and I have written a brief note on the availability of prepaid mobile Internet in Africa. Because prepaid models are more appropriate for poorer consumers, we argue that the availability of prepaid mobile data will be a key driver of inclusive mobile Internet usage.
Starting in late 2009, we collected data on the availability of prepaid mobile data in all 53 African countries. Unfortunately, without the budget to travel the continent, we have been unable to conclusively determine the presence of this form of connectivity in every country. So, we’re asking you to help build the database of prepaid mobile data in Africa.
Available here (in PDF) is a draft version of our paper where you can get a sense of our project.
For the crowdsourcing, we’ve created an editable Google Map with entries for each country. Green indicates existing knowledge that prepaid data is offered by at least one provider. Yellow means we have been unable to determine the presence of prepaid mobile data. And Red suggests confidence that it is not available in that country (though if you know otherwise, please do correct us!).
If you know for certain that prepaid data is available for mobile phones from at least one network provider in one of the countries marked Yellow, you can either get in touch directly or log in with a Google account to the African Prepaid Mobile Data map and, in the upper right of the left-hand sidebar, click “Edit” to create your citation (the more supporting evidence, including links or your name/affiliation, the better).
This is a small-scale experiment in crowdsourcing data for use in an academic paper, so we’re not sure how much detail we will be able to gather, or what end-state the map will be, but we’re grateful for your help. Thanks!
Sometime last century, previously qualitative subjects were injected with hearty doses of empiricism. The advances from these new approaches swept across disciplines as diverse as finance and media studies. Today, quantitative grounding is considered a requisite for academic acceptability.
But what are the implications of this empiricism?
In his great history of academic economics and finance, The Myth of the Rational Market, Justin Fox follows how those disciplines became enamored with quantifying everything. Formulas, models, data, math. These were the approaches and tools taken seriously. In the process, phenomena that weren’t quantifiable got tossed out – humans became rational actors, and it took monumental efforts by the behavioral economists to begin to re-imagine man in a more accurate, nuanced light. Yet, the damage was done; unintentionally, and not without adding great insights, but nonetheless done.
I wonder if the current trend towards huge data sets and massive computational power will have similar unintended consequences. There’s no lack of pessimists who think the Internet is ruining human society, but people like Jaron Lanier rarely hit the target and tend to be sensationalists trying to sell books. Of course, they’re up against plenty of Pollyannas who are selling this all as the greatest thing since sliced bread.
I think that whatever is happening, and whatever negatives there may be, is far less exciting than cover stories for The Atlantic or new business opportunities for Silicon Valley.
In a new special report on the data deluge, The Economist generally misses this theme. This isn’t to say the report isn’t quite good (it is) or that I would expect them to cover this (I don’t). However, one of the articles does get close when examining how to handle the extraordinary amount of information and the infrastructure needed to deal with it:
The cornucopia of data now available is a resource, similar to other resources in the world and even to technology itself. On their own, resources and technologies are neither good nor bad; it depends on how they are used. In the age of big data, computers will be monitoring more things, making more decisions and even automatically improving their own processes—and man will be left with the same challenges he has always faced. As T.S. Eliot asked: “Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?”
Two months ago, I would have agreed that “technologies are neither good nor bad,” but a course on infrastructure studies has definitely made me question that. Eliot’s quote, though, is the right question to be asking.
Update: I forgot to include another relevant bit:
Processing data is another concern. Ian Ayres, an economist and lawyer at Yale University and the author of “Super-Crunchers”, a book about computer algorithms replacing human intuition, frets about the legal implications of using statistical correlations. Rebecca Goldin, a mathematician at George Mason University, goes further: she worries about the “ethics of super-crunching”. For example, racial discrimination against an applicant for a bank loan is illegal. But what if a computer model factors in the educational level of the applicant’s mother, which in America is strongly correlated with race? And what if computers, just as they can predict an individual’s susceptibility to a disease from other bits of information, can predict his predisposition to committing a crime?
Update 2: I have previously written a bit more extensively about my reservations about quantification here.
I remember the first time I saw Hans Rosling present his Gapminder project. In a TED video he wowed the crowd by bringing boring tables of statistics to life through stunning animations. His innovative presentation of data dispelled myths and misconceptions. By making statistics something we could visualize, Rosling showed me the power of a good visualization.
[youtube=http://www.youtube.com/watch?v=hVimVzgtD6w]
That is why a new project called GeoCommons is so exciting. GeoCommons is the consumer product that includes “Finder!” and “Maker!”. They allow, as you might expect, anyone to find or make stunning maps of geo-coded data. Data sets are easily exportable to mapping services like Google Earth or Microsoft Virtual Earth. When they unveil Maker! in the coming weeks, expect it to do for geo-visualization what the Google Maps API did for geo-mashups. Take, for example, the map below.
According to TechCrunch, which covered GeoCommons today, the orange circles represent carbon emissions while the darker shaded regions show heavier population densities. Because of this map, the amorphous issue of air quality in China is reified. The possibilities are endless and hundreds are already available for examination.
I am incredibly excited to see what geographical data people are able to make concrete. Data is only as good as it is understandable, and tools like GeoCommons and Gapminder make data understandable at a glance. They reveal the truth more than a spreadsheet ever could.

