Lars Marius Garshol
_From [[email protected]](https://code.google.com/u/106380900043315593284/) on November 04, 2011 03:15:28_ **Labels:** Component-Core
_From [[email protected]](https://code.google.com/u/117666216113858450647/) on June 05, 2013 08:45:47_ Any idea how you would map the functionality to the MapReduce programming model? Other than this I can see a big...
_From [[email protected]](https://code.google.com/u/106380900043315593284/) on June 05, 2013 08:52:23_ Basically, what you'd have to do is to use a blocking scheme. That is, create a key from each record such that similar...
No, not at the moment. It should be pretty easy to do, though. Mainly I need a data set big enough to require MapReduce. Without that there's not much point...
For smaller dedup tasks: no. I have seen a paper that claims it doesn't scale so well with M/R, but I was deeply unconvinced by that paper. Having said that,...
No problem there. Just use the blocking functions already used by the MapDB and other blocking backends. Then the map step is record -> blocking key, and finally the reduce...
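For readers unfamiliar with blocking, here is a minimal single-process sketch of the idea in Python. It is not Duke's code or API: the `blocking_key` function, the `name` and `id` fields, the `difflib`-based similarity, and the 0.85 threshold are all hypothetical choices made for illustration. The comment above is cut off before it says what the reduce step does, so the pairwise comparison inside each block is an assumption.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    # Hypothetical blocking function: first three letters of the
    # lowercased name, so similar names tend to share a key.
    return record["name"].strip().lower()[:3]


def map_step(records):
    # Map: record -> (blocking key, record).
    for record in records:
        yield blocking_key(record), record


def shuffle(pairs):
    # Stand-in for the framework's shuffle phase: group records by key.
    blocks = defaultdict(list)
    for key, record in pairs:
        blocks[key].append(record)
    return blocks


def similarity(a, b):
    # Hypothetical comparator: character-level similarity of the names.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def reduce_step(block, threshold=0.85):
    # Reduce (assumed): compare every pair of records within one block
    # and emit the ID pairs that score at or above the threshold.
    for a, b in combinations(block, 2):
        if similarity(a, b) >= threshold:
            yield a["id"], b["id"]


def find_duplicates(records, threshold=0.85):
    matches = []
    for block in shuffle(map_step(records)).values():
        matches.extend(reduce_step(block, threshold))
    return matches


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "Jonathan Smith"},
        {"id": 2, "name": "Jonathon Smith"},
        {"id": 3, "name": "Maria Garcia"},
    ]
    print(find_duplicates(sample))
```

The point of the key is that records are only ever compared with others in the same block, so the expensive pairwise work is spread across reducers. The trade-off is that two true duplicates whose keys differ are never compared, which is why the choice of blocking function matters.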
Now we have a user who is confused because he's linking two files, the second of which has no IDs, and the link file winds up empty. Fix this at the...
Commit dda6390 fixes the first issue.
_From [[email protected]](https://code.google.com/u/106380900043315593284/) on April 03, 2013 11:44:08_ Yeah, this is a good point. I'll try to explain why we are where we are now. (1) I hate dependencies. I really...
_From [[email protected]](https://code.google.com/u/101965401751673942722/) on April 12, 2013 03:55:48_ Hi Lars, I'll take a look at the logger interface and try to give you my opinion...