dumbo
Custom Input File Formats
Does Dumbo support custom input file formats, e.g. WholeFileInputFormat.class, which treats the entire contents of a file as a single record? I compiled WholeFileInputFormat.java (from Hadoop: The Definitive Guide) and built a custom streaming jar containing WholeFileInputFormat.class alongside the other class files from hadoop-streaming.jar. I then ran the wordcount.py example in Dumbo with the -inputformat option set to WholeFileInputFormat, but it failed with the following error: "java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 255"
Is there some more work that needs to be done to get custom input formats working in Dumbo?
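For reference, here is a minimal sketch of what I would expect the word count to look like if each record were a whole file. It assumes the mapper receives the file contents as a single string in `value`. How Dumbo actually hands over a record produced by WholeFileInputFormat is exactly what I am unsure about, so that is an assumption. `run_local` is a hypothetical helper for a dry run without Hadoop, not part of Dumbo.

```python
from collections import defaultdict


def mapper(key, value):
    # Assumption: `value` is the full text of one input file as a string.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # Sum the per-word counts emitted by the mapper.
    yield key, sum(values)


def run_local(records):
    # Hypothetical local driver: group mapper output by key, then reduce.
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    results = []
    for out_key in sorted(grouped):
        results.extend(reducer(out_key, grouped[out_key]))
    return results


if __name__ == "__main__":
    # One record per file; under Dumbo the script would instead end with
    # dumbo.run(mapper, reducer).
    for word, count in run_local([(None, "a b a\nb c")]):
        print(word, count)
```

The mapper and reducer run fine locally like this, which makes me think the failure has to do with how the record reaches the Python subprocess rather than with the word count logic itself.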