dumbo
Permit stdout redirection to avoid broken pipes
One of the frustrating problems I've been running into is that print statements in code called by my mapper/reducer write to stdout, which breaks the pipe used by my streaming job.
It seems like a simple change to dumbo can fix this. In core.py, change `typedbytes.PairedOutput(sys.stdout).writes(outputs)`
to `typedbytes.PairedOutput(sys.__stdout__).writes(outputs)`
This way, all we have to do is redirect stdout to stderr (`sys.stdout = sys.stderr`), and extraneous print statements will no longer cause problems.
I've tried this out and it seems to work for me.
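Here is a minimal sketch of the same pattern, outside dumbo. It assumes a plain tab-separated text stream instead of dumbo's typedbytes output. `mapper` and `run_job` are hypothetical names for illustration, not part of dumbo's API. The job keeps a handle on the real stdout (`sys.__stdout__` by default) and redirects `sys.stdout` to `sys.stderr` while user code runs:

```python
import sys
from contextlib import redirect_stdout


def mapper(key, value):
    # Hypothetical chatty mapper: without redirection this print would land
    # on stdout and corrupt the key/value stream the framework reads.
    print("debug: processing", key)
    yield key, len(value)


def run_job(records, out=None):
    # Hold on to the real stdout before redirecting; sys.__stdout__ is the
    # stream the interpreter started with, unaffected by sys.stdout changes.
    out = out if out is not None else sys.__stdout__
    # While user code runs, anything printed goes to stderr instead.
    with redirect_stdout(sys.stderr):
        for key, value in records:
            for k, v in mapper(key, value):
                # Only the job's own writes reach the output stream
                # (tab-separated text here, as a stand-in for typedbytes).
                out.write(f"{k}\t{v}\n")
```

This works because `print` looks up `sys.stdout` at call time, so after the redirect only writes that go explicitly through the saved handle end up in the job's output.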
I apologize for the cross-post, but this is how I fixed this problem in Hadoopy: http://bwhite.github.com/hadoopy/#pipe-hopping-using-stdout-stderr-in-hadoopy-jobs