dumbo
Python module that allows one to easily write and run Hadoop programs.
In some cases it could be useful to store all command-line -param arguments in a global variable such as dumbo.params. Yes, I know that self.params provides such functionality. But if I want...
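The request above can be pictured with a small standalone sketch. It assumes parameters arrive as `-param name=value` pairs on the command line; `parse_params` and `PARAMS` are hypothetical names used only for illustration and are not part of dumbo's API.

```python
def parse_params(argv):
    """Collect every `-param name=value` pair from an argument list."""
    params = {}
    # Stop one short of the end: a trailing "-param" has no value to read.
    for i, arg in enumerate(argv[:-1]):
        if arg == "-param":
            name, sep, value = argv[i + 1].partition("=")
            if sep:  # ignore malformed entries that lack "="
                params[name] = value
    return params


# Hypothetical module-level holder, filled once at start-up so that plain
# functions (not only mapper/reducer classes) can read the parameters.
PARAMS = parse_params(["-param", "threshold=5", "-input", "input1"])
```

Code that has no `self` available could then read `PARAMS["threshold"]` directly; values stay strings, so any type conversion is left to the caller.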
Currently MultiMapper has no access to the file paths of the input lines. This is because the current implementation of the MultiMapper.__call__*key functions uses file paths to distribute input lines between (sub)mappers and then implicitly...
I was running a job that wrote its output to 'twoo/flowanalysis/2012/09/*', but this causes problems: when dumbo runs the HDFS (re)move operations (with overwrite="yes", for instance), it doesn't escape the path properly...
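A minimal sketch of the kind of escaping the report asks for, assuming the remove step is assembled as a shell command string (an assumption; the excerpt does not show how dumbo builds it). `build_rm_command` is a hypothetical helper, and `shlex.quote` only protects the path from the local shell.

```python
import shlex


def build_rm_command(hadoop_bin, path):
    """Return a shell command string with both arguments safely quoted."""
    # shlex.quote wraps any string containing shell metacharacters
    # (such as "*") in single quotes, so the local shell passes it on verbatim.
    return "%s fs -rmr %s" % (shlex.quote(hadoop_bin), shlex.quote(path))


command = build_rm_command("hadoop", "twoo/flowanalysis/2012/09/*")
```

Passing the arguments as a list to `subprocess.call` would avoid the local shell altogether, which is usually the simpler choice when no shell features are needed.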
[zhouhh@Hadoop48 examples]$ dumbo start wordcount.py -hadoop /home/zhouhh/hadoop-1.0.3 -input input1 -output output1
zhh parse argv: ['/usr/local/bin/dumbo', 'start', 'wordcount.py', '-hadoop', '/home/zhouhh/hadoop-1.0.3', '-input', 'input1', '-output', 'output1']
zhh sysargv: ['wordcount.py', '-prog', 'wordcount.py', '-input', 'input1',...
Hey, I really love the job management stuff in dumbo. However, it seems like the inner core of hadoopy is more highly optimized. (I get a factor of 2 better performance...
(Used to be "parser attribute on a single mapper gets applied to others in same MultiMapper".)
One of the frustrating problems I've been running into is that if I have print statements in code called by my mapper/reducer, they break the pipe used by my...
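One possible workaround for the problem described above, sketched under the assumption that the mapper's records travel over standard output (as they do with Hadoop Streaming): redirect stray prints to standard error while the noisy code runs. `noisy_helper` and this `mapper` are hypothetical stand-ins, not code from the report.

```python
import contextlib
import sys


def noisy_helper(value):
    """Third-party-style code that prints while it works."""
    print("debug:", value)  # would otherwise land in the data stream
    return value * 2


def mapper(key, value):
    # Anything printed inside this block goes to stderr instead of stdout,
    # so the stdout pipe carries only the emitted records.
    with contextlib.redirect_stdout(sys.stderr):
        doubled = noisy_helper(value)
    yield key, doubled
```

`contextlib.redirect_stdout` needs Python 3.4 or later; on older interpreters the same effect can be had by temporarily assigning `sys.stdout = sys.stderr` and restoring it in a `finally` clause.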
_As originally [reported](http://dumbo.assembla.com/spaces/dumbo/tickets/61) by Elias Pampalk:_ The following scripts demonstrate a failure to fail when executed on a hadoop cluster (fails fine if executed locally): ``` import dumbo def mapper(k,...
See the traceback from the logs below.
Traceback (most recent call last):
  File "/usr/lib/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
    exec code...