Here are some notes about parallel computing, which is going in two directions at the same time. In one direction you have tools like MapReduce or Hadoop managing the work of parceling a problem into many pieces and then coordinating the collection of the results. In the other direction you have graphics chips being turned into general-purpose parallel computing chips that can do some specific operations very fast.
These notes are mostly in the form of clippings, in part because I don't have a whole story yet, just the fragments.
from P16: Practical Progress - Supercomputing for the masses
One thing CUDA doesn't provide is a way to manage and process massive datasets. The cards have somewhat limited memory, and you have to write an app that runs on the host and feeds the card with data. For search applications like topic clustering -- which I'd like to use this for -- CUDA alone doesn't provide an answer.
Perhaps it would make sense eventually to use Hadoop plus CUDA -- write your map/reduce tasks in CUDA, and rely on Hadoop to distribute data around a cluster of Nvidia-accelerated boxes?
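A rough sketch of that "host feeds the card" pattern, in plain Python rather than CUDA so it runs anywhere. Everything here is hypothetical: `capacity` stands in for the card's limited memory, and `device_count_terms` stands in for whatever kernel would actually run on the GPU. The point is only the shape: the host streams the dataset through in batches that fit, and merges the partial results.

```python
from collections import Counter
from itertools import islice

def chunks(records, capacity):
    """Yield lists of at most `capacity` records (what fits on the card at once)."""
    it = iter(records)
    while True:
        batch = list(islice(it, capacity))
        if not batch:
            return
        yield batch

def device_count_terms(batch):
    """Hypothetical stand-in for the kernel: count terms in one batch."""
    counts = Counter()
    for doc in batch:
        counts.update(doc.lower().split())
    return counts

def host_term_counts(records, capacity=2):
    """Host loop: feed one batch at a time and merge the partial counts."""
    total = Counter()
    for batch in chunks(records, capacity):
        total += device_count_terms(batch)
    return total

docs = ["gpu map reduce", "hadoop map reduce", "gpu cluster"]
result = host_term_counts(docs, capacity=2)
```

In the Hadoop-plus-CUDA version of this idea, Hadoop would presumably take over the job of deciding which machine sees which records, and a loop like `host_term_counts` would run inside each task.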
find23.net - Map Reduce on GPUs
A fellow at the German Hadoop user meeting (thanks to Isabel, who organized it again) pointed out to me that the GPUs on graphics cards basically work like server grids.
He mentioned there are some research papers in this field. I spent some time reading through what I could find, and it was quite interesting.
Let me cite some of the facts from the two most interesting papers:
+ “A Map Reduce Framework for Programming Graphics Processors” by Bryan Catanzaro, Narayanan Sundaram and Kurt Keutzer, UC Berkeley
+ “Mars: A MapReduce Framework on Graphics Processors” by Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govindaraju, Tuyong Wang
And, hiding at the end of a long set of searches, is Cloudera, a company offering help with Hadoop:
Hadoop is the popular open source implementation of MapReduce, a powerful tool designed for the detailed analysis and transformation of very large data sets. Hadoop enables you to explore complex data in its native form, using custom analyses tailored to the information and questions you have.
Cloudera can help you install, configure and run Hadoop for large-scale data processing and analysis.
The vision is not a rack of ordinary CPUs with Hadoop managing them; it is a rack of CPU+GPU combinations where you can take advantage of parallelism both between machines and within each machine.
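To make the two levels concrete, here is a toy sketch in Python. It only simulates the structure: thread pools stand in for both the cluster and the on-machine parallel hardware, and the names (`split`, `on_machine`, `across_machines`, `lanes`, `machines`) are made up for illustration and come from neither Hadoop nor CUDA. The outer level hands one shard to each "machine"; the inner level maps over that shard in parallel and reduces locally before the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split(items, parts):
    """Split a list into `parts` nearly equal contiguous slices."""
    size, extra = divmod(len(items), parts)
    out, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        out.append(items[start:end])
        start = end
    return out

def on_machine(shard, lanes=4):
    """Inner level: map over one shard with a worker pool, then reduce locally."""
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        mapped = pool.map(lambda x: x * x, shard)
        return reduce(lambda a, b: a + b, mapped, 0)

def across_machines(data, machines=3):
    """Outer level: one shard per machine, then a final reduce of the partial sums."""
    shards = split(data, machines)
    with ThreadPoolExecutor(max_workers=machines) as cluster:
        partials = list(cluster.map(on_machine, shards))
    return sum(partials)

total = across_machines(list(range(10)))  # sum of squares of 0..9
```

Python threads will not actually speed this up; the sketch is about where the two map and reduce steps sit, not about performance.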
