dumbo

Python module that allows one to easily write and run Hadoop programs.

Results: 29 dumbo issues

Whenever a dumbo job is run, this warning appears: `11/07/07 13:23:25 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.`

bug
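For illustration only, the change the warning asks for can be sketched as a rewrite of the streaming argument list: each `-jobconf key=value` pair becomes `-D key=value`, and the `-D` pairs are moved to the front, since Hadoop streaming expects generic options before streaming-specific ones. The helper name `jobconf_to_generic` is hypothetical and is not part of dumbo; how dumbo actually builds its command line is not shown in this issue.

```python
def jobconf_to_generic(args):
    """Rewrite '-jobconf key=value' pairs as '-D key=value'.

    Hypothetical helper, not dumbo code. The '-D' pairs are placed
    first because generic options must precede streaming options.
    """
    generic, rest = [], []
    it = iter(args)
    for arg in it:
        if arg == "-jobconf":
            value = next(it, None)
            if value is None:
                raise ValueError("-jobconf without a key=value argument")
            generic += ["-D", value]
        else:
            rest.append(arg)
    return generic + rest


if __name__ == "__main__":
    old = ["-input", "in", "-jobconf", "mapred.reduce.tasks=2", "-output", "out"]
    print(jobconf_to_generic(old))
```

All other arguments keep their relative order, so only the deprecated option is affected.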

Running `python setup.py install` gave the following error: Searching for typedbytes Reading https://pypi.python.org/simple/typedbytes/ No local packages or download links found for typedbytes error: Could not find suitable distribution for Requirement.parse('typedbytes') So...

Wiki and project links updated (issue #90).

I tried the links from the README file and both of them seem to be dead.

I tried to run with the "-fake yes" option, but the job was launched nevertheless. I was using dumbo.Job and looking at the code I don't see where...

I am using Hadoop streaming with -io typedbytes and set mapred.reduce.tasks=2, but I ended up with only one output file. If I set mapred.reduce.tasks=0, I get many output files....

I wrote this backend to enable local dumbo jobs to leverage multiple processor cores. Minimal usage example, which will run 4 mappers in parallel and then run 4 reducers: dumbo...

To get the source path from the mapper routine, just add **kwargs to the argument list. Here are some examples. ``` @dumbo.decor.primary def map_primary(key, value, **kwargs): key, value = value.strip().split('\t')...