
Custom Input File Formats

Open sv2000 opened this issue 10 years ago • 0 comments

Does Dumbo support custom input file formats, e.g. WholeFileInputFormat.class, which treats the entire contents of a file as a single record? I compiled WholeFileInputFormat.java (from Hadoop: The Definitive Guide) and built a custom streaming jar that contains WholeFileInputFormat.class alongside the other class files from hadoop-streaming.jar. I then ran the wordcount.py example in Dumbo with the -inputformat option set to WholeFileInputFormat, but it fails with the following error: "java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 255"

Is additional work needed to get custom input formats working in Dumbo?
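
To make the intended behaviour concrete, here is a minimal sketch of a word-count mapper and reducer written in the style of the wordcount example, assuming the input format hands the mapper one record per file with the whole contents as the value. The (filename, text) record shape and the `run_locally` driver are assumptions for illustration only; they are not part of Dumbo and do not reproduce the failing job.

```python
from collections import defaultdict


def mapper(key, value):
    # key: assumed to identify the file (ignored here)
    # value: assumed to be the entire file contents as text
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # Sum the per-word counts emitted by the mapper.
    yield key, sum(values)


def run_locally(records):
    """Hypothetical stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    result = {}
    for out_key in sorted(grouped):
        for word, count in reducer(out_key, grouped[out_key]):
            result[word] = count
    return result


if __name__ == "__main__":
    # Each record stands for one whole file, not one line.
    records = [("a.txt", "one two\ntwo three"), ("b.txt", "three three")]
    print(run_locally(records))
```

With a whole-file input format the mapper would be called once per file rather than once per line, which is the behaviour I am after.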

sv2000, May 26 '14 03:05