Sunday, May 6, 2012

MapReduce & CouchDB

Just a rhetorical post.

Some time ago, while looking for more information about NoSQL, I came across Apache CouchDB and found it more interesting for my purposes than MongoDB.

At first, I missed permission settings for read operations (see my answered question on Stack Overflow). But I went further and created a simple TODO list, and later a more useful but still simple notebook, Li.Couch. I built them to get more experience with NoSQL and Knockout, and also to use them myself.

Recently I got to know MapReduce more closely while trying to implement a simple search for the notebook. That is where I missed a more capable Reduce in CouchDB. "A common mistake new CouchDB users make is attempting to construct complex aggregate values with a reduce function. Full reductions should result in a scalar value, like 5, and not, for instance, a JSON hash with a set of unique keys and the count of each" [CouchDB The Definitive Guide: Finding Your Data with Views]. And I made exactly this mistake.
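To make the quoted advice concrete, here is a small Python simulation. It is only a sketch under assumptions: real CouchDB views are written in JavaScript, and CouchDB's B-tree storage of reductions is more involved than the fixed-size chunking used here. The point it illustrates is that CouchDB may call a reduce function on the emitted rows and later again on its own earlier results (the `rereduce` flag). The names `count_reduce`, `histogram_reduce`, `simulate_btree_reduce`, and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[str, str, Any]  # (emitted key, document id, emitted value)


def count_reduce(keys: Optional[List[Tuple[str, str]]], values: List[Any], rereduce: bool) -> int:
    # A scalar reduce: the result stays one number no matter how many rows exist.
    if rereduce:
        return sum(values)  # values are partial counts from earlier calls
    return len(values)      # values are raw emitted values; count the rows


def histogram_reduce(keys: Optional[List[Tuple[str, str]]], values: List[Any], rereduce: bool) -> Dict[str, int]:
    # The kind of reduce the guide warns against: a hash that grows with the
    # number of distinct keys instead of shrinking to a scalar.
    result: Dict[str, int] = {}
    if rereduce:
        for partial in values:
            for word, n in partial.items():
                result[word] = result.get(word, 0) + n
    else:
        for word, _doc_id in keys:
            result[word] = result.get(word, 0) + 1
    return result


def simulate_btree_reduce(rows: List[Row], reduce_fn, chunk_size: int = 2):
    # Reduce rows in small chunks, then rereduce the partial results,
    # loosely mimicking how CouchDB combines reductions stored in tree nodes.
    partials = []
    for i in range(0, len(rows), chunk_size):
        chunk = rows[i:i + chunk_size]
        keys = [(key, doc_id) for key, doc_id, _ in chunk]
        values = [value for _, _, value in chunk]
        partials.append(reduce_fn(keys, values, False))
    return reduce_fn(None, partials, True)


rows = [
    ("couchdb", "doc1", 1), ("mapreduce", "doc1", 1),
    ("couchdb", "doc2", 1), ("knockout", "doc3", 1), ("nosql", "doc3", 1),
]
print(simulate_btree_reduce(rows, count_reduce))      # 5
print(simulate_btree_reduce(rows, histogram_reduce))  # {'couchdb': 2, 'mapreduce': 1, 'knockout': 1, 'nosql': 1}
```

Both functions give a correct answer here, but only the first keeps its result small as the database grows. That size argument is the reason the guide recommends scalar full reductions.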

I couldn't find the "original" description of MapReduce from Google, but if Wikipedia can be considered a source of truth, the reduce part of MapReduce is not limited to a scalar result (see the Wikipedia article MapReduce).

I also found a very nice paper, Experiments with MapReduce in Erlang, which demonstrates what I would like to have in CouchDB:

map: (K1, V1) → List[(K2, V2)]
reduce: (K2, List[V2]) → List[V2]
mapreduce: (List[(K1, V1)], map, reduce) → Map[K2, List[V2]]
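These signatures can be sketched in Python as a single-process, in-memory model. This is an illustration of the paper's model, not CouchDB's API. The function names `mapreduce`, `index_words`, and `unique_ids`, as well as the sample notes, are hypothetical. The example is a tiny word index of the kind a notebook search would need, where reduce returns a list of note ids per word rather than a scalar.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2", bound=Hashable)
V2 = TypeVar("V2")


def mapreduce(
    inputs: List[Tuple[K1, V1]],
    map_fn: Callable[[K1, V1], List[Tuple[K2, V2]]],
    reduce_fn: Callable[[K2, List[V2]], List[V2]],
) -> Dict[K2, List[V2]]:
    # Map phase: each input pair may emit any number of (K2, V2) pairs,
    # which are grouped by K2.
    groups: Dict[K2, List[V2]] = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    # Reduce phase: each key's values are reduced to a list, not a scalar.
    return {k2: reduce_fn(k2, vs) for k2, vs in groups.items()}


def index_words(note_id: str, text: str) -> List[Tuple[str, str]]:
    # Hypothetical search map: emit (word, note_id) for every word in a note.
    return [(word.lower(), note_id) for word in text.split()]


def unique_ids(word: str, note_ids: List[str]) -> List[str]:
    # Reduce to a list: the distinct notes containing the word.
    return sorted(set(note_ids))


notes = [("n1", "CouchDB views"), ("n2", "MapReduce in CouchDB"), ("n3", "Knockout views")]
index = mapreduce(notes, index_words, unique_ids)
print(index["couchdb"])  # ['n1', 'n2']
print(index["views"])    # ['n1', 'n3']
```

Here the reduce output per key is a list whose length depends on the data. That is natural in this model, but it is exactly the shape of result the CouchDB guide discourages in a view's reduce function.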
Why doesn't CouchDB allow emitting data in a reduce function, the way map does, to obtain a reduced list?