dumbo
Setting the number of reducers fails
I am using Hadoop streaming with -io typedbytes and have set mapred.reduce.tasks=2, but I end up with only one output file. If I set mapred.reduce.tasks=0 instead, I get many output files. I am very confused.
So my question is: how do I make the setting mapred.reduce.tasks=num (num > 1) take effect when I use -io typedbytes in streaming?
PS: my mapper's output is (key: Python string, value: NumPy array).
Here is my .sh file:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.2.1.jar \
    -D mapred.reduce.tasks=2 \
    -fs local \
    -jt local \
    -io typedbytes \
    -inputformat org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat \
    -input FFT_SequenceFile \
    -output pinvoutput \
    -mapper 'pinvmap.py' \
    -file pinvmap.py
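
To make "how many output files" concrete, here is a minimal sketch of one way to count them. It assumes the output directory is on the local filesystem (as with -fs local above) and that the job writes the usual part-* files, one per reduce task, or one per map task when there are no reducers. The helper name count_parts is hypothetical and is not part of Hadoop or dumbo.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or dumbo): count the part-* files
# in a job output directory on the local filesystem.
count_parts() {
    dir="$1"
    count=0
    for f in "$dir"/part-*; do
        # If nothing matches, the glob stays unexpanded; -e filters that out.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

After the job finishes, running `count_parts pinvoutput` prints the number of part files. Hidden checksum files such as .part-00000.crc and the _SUCCESS marker are not counted, because the part-* glob does not match them.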