Monday, December 17, 2007

Using the internet as a broker between data processing projects and volunteer resources

Consider this situation. An institution has a worthwhile project it wants to undertake, but the project has huge data-processing requirements, and the institution simply doesn't have the resources to meet them. Perhaps it would like to simulate protein folding for medical research.

In a growing number of cases like this, the internet is being used as a broker to enable such projects by providing a pool of volunteer resources: either people's computers, or people's own time to perform manual data processing. There are huge pools of resources out there: a whole world of home computers and potential volunteers.

So far, this strategy has been quite successful, as reported in a recent article in The Economist.

Here's a summary of the details:

Automated data-processing

Using the spare processing cycles of people’s home computers (and other devices, such as PlayStation 3 consoles).

Folding@home - simulating protein folding and misfolding (misfolding is a cause of diseases such as Alzheimer's).
In September, had a combined computing capacity of one petaflop (a quadrillion mathematical operations per second), something supercomputer designers have dreamed of for several years.

SETI@home - analysing data for signs of the existence of extra-terrestrial civilisations
The BOINC platform (Berkeley Open Infrastructure for Network Computing) has been developed to support such processing.

Manual data-processing ("distributed thinking")
Galaxy Zoo - volunteers help astronomers to classify the shapes of galaxies from powerful telescope images. "Thanks to the exquisite pattern-recognition capabilities of the human brain, amateurs with just a little training can distinguish between different types of galaxy far more efficiently than computers can."
Had more than 100,000 volunteers classify over 1 million galaxies in a few months.

Stardust@home - volunteers spot the tell-tale tracks left by microscopic interstellar dust grains in tiles of porous aerogel from a probe sent into space.
Enlisted some 24,000 volunteers, who in less than a year performed more than 40 million searches (about 1,700 searches per person).

Herbaria@home - volunteers document plant specimens from images drawn from the dusty 19th-century archives of British collections. Already, some 12,000 specimens have been documented.

Africa@home - volunteers extract useful cartographic information (the positions of roads, villages, fields and so on) from satellite images of regions in Africa where maps either do not exist or are hopelessly out of date. This will help regional planning authorities, aid workers and scientists documenting the effects of climate change.

Distributed Proofreaders - volunteers help to proofread OCR'd scans of pages from old books (not mentioned in the Economist article, but in a Slashdot article that referred to it).
The BOSSA platform (Berkeley Open System for Skill Aggregation) has been developed to support such processing.

What if the non-expert volunteers do a poor job at the data processing? This isn't actually a problem, because redundancy is used to ensure quality. For example, each image used by Galaxy Zoo is classified by thirty different people, and it turns out that this is enough to get a highly accurate answer.
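Why redundancy helps can be illustrated with a short Python sketch. It assumes, purely for illustration, that the answers for an image are combined by a simple vote and that volunteers make mistakes independently of one another; the article doesn't say exactly how Galaxy Zoo combines its thirty classifications, and the function names and the 70% per-volunteer accuracy below are hypothetical.

```python
from collections import Counter
from math import comb

# Hypothetical helpers illustrating redundancy through voting.
# Assumption: answers are combined by vote, and volunteers
# make mistakes independently of one another.

def consensus_label(votes):
    """Return the most common label and the share of votes it received."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

def majority_accuracy(n, p):
    """Probability that a strict majority of n independent volunteers,
    each correct with probability p, gives the right answer.
    Assumes a two-way choice; ties count as failures."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

if __name__ == "__main__":
    # Thirty hypothetical classifications of one galaxy image.
    votes = ["spiral"] * 20 + ["elliptical"] * 7 + ["merger"] * 3
    print(consensus_label(votes))
    # Compare one volunteer with pools of 5 and 30 (p = 0.7 is made up).
    for n in (1, 5, 30):
        print(n, round(majority_accuracy(n, 0.7), 3))
```

Under these assumptions, as long as each volunteer is right more often than not, the chance that the majority is right climbs well above any single volunteer's accuracy as more people look at the same image. Real projects may also weight volunteers by their track record, which this sketch ignores.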

Some thoughts

Some ideas not touched on in the Economist article.

Will there always be a need for such volunteer resources? Or might Moore's Law make computing power so cheap and abundant that even the most processing-hungry project could easily satisfy its own needs?

Second, I wonder if there might come a time when such projects are well known to the public, and people think of contributing to them the way they now think of volunteering for a community group or giving money to a charity.

Lastly, might the manual data-processing projects be a good way for school children, or members of the general public, to get a gentle introduction to the world of science, and to learn a bit more about how science is done and how the scientific community works? I'm not saying that volunteering on such projects is an education in these things, but it might still provide a little feel for them and some familiarity.
