datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
Several configuration options are available for calculating NDCG. You can specify positional values per range, use a standard logarithmic discounting function, or use a custom function.
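To make the three discounting options concrete, here is a minimal Python sketch. It is not DataFu's API: the names `dcg`, `ndcg`, `log_discount`, and `positional_discount` are hypothetical, and the sketch assumes zero-based positions, linear gain (the raw relevance score), and a discount of zero beyond the last configured range.

```python
import math
from typing import Callable, Sequence, Tuple

Discount = Callable[[int], float]  # maps a zero-based position to a weight


def dcg(relevances: Sequence[float], discount: Discount) -> float:
    """Discounted cumulative gain: sum of relevance times positional weight."""
    return sum(rel * discount(pos) for pos, rel in enumerate(relevances))


def ndcg(relevances: Sequence[float], discount: Discount) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), discount)
    return dcg(relevances, discount) / ideal if ideal > 0 else 0.0


def log_discount(position: int) -> float:
    """Standard logarithmic discount: 1 / log2(rank + 1), rank = position + 1."""
    return 1.0 / math.log2(position + 2)


def positional_discount(ranges: Sequence[Tuple[int, float]]) -> Discount:
    """Build a discount from (exclusive end position, weight) pairs.

    Assumption: positions past the last range get weight 0.
    """
    def discount(position: int) -> float:
        for end, weight in ranges:
            if position < end:
                return weight
        return 0.0
    return discount


# Option 1: positional values per range (position 0 -> 1.0, positions 1-2 -> 0.5)
by_range = positional_discount([(1, 1.0), (3, 0.5)])
# Option 2: standard logarithmic discounting
by_log = log_discount
# Option 3: a custom function, here a simple reciprocal-rank weight
by_custom = lambda position: 1.0 / (position + 1)

scores = [0, 1, 1]  # relevance of the results in ranked order
results = {name: ndcg(scores, fn) for name, fn in
           [("range", by_range), ("log", by_log), ("custom", by_custom)]}
```

All three options reduce to the same thing, a function from position to weight, so the normalization code does not need to know which one is in use.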
Thanks for the fix to the SampleByKey issue. Please let us know when we can expect the release that contains this fix. Or if the build instructions are documented somewhere I...
It would be great to be able to get back a long instead of a GUID. Ended up writing my own UDF to do this :/
I have 35M pairs of links from 117K nodes and ran a PageRank job on a 3-node m2.2xlarge EMR cluster. Initially I got an out-of-memory error in the...