dumbo
" -file option is deprecated, please use generic option -files instead."
Hello! I am trying to run a job for our data team and we are getting errors using dumbo. We are using the latest version of Dumbo and Cloudera.
Command used to run the job:
"[benjamin@arya dedup]$ dumbo start jaccard.py -input products -output products-output13 -hadoop /usr/ -hadooplib /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/"
Stacktrace:
13/10/30 13:05:32 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
13/10/30 13:05:32 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
packageJobJar: [/home/benjamin/mapreduce/jobs/dedup/typedbytes.pyc, /home/benjamin/mapreduce/jobs/dedup/jaccard.py, /home/benjamin/mapreduce/jobs/dedup/dumbo/backends/common.pyc] [] /tmp/streamjob5478521893861821465.jar tmpDir=null
13/10/30 13:05:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/30 13:05:34 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/30 13:05:35 INFO mapred.JobClient: Running job: job_201310231818_0015
13/10/30 13:05:36 INFO mapred.JobClient: map 0% reduce 0%
13/10/30 13:05:47 INFO mapred.JobClient: Task Id : attempt_201310231818_0015_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.AutoInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1649)
    at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:620)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.AutoInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1617)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1641)
Any help would be greatly appreciated!
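Seems like hadoop-streaming*.jar is missing on your nodes. The class in the stack trace, org.apache.hadoop.streaming.AutoInputFormat, lives in the streaming jar, so the task cannot load it when that jar is not on its classpath. Check whether your environment points to the correct HADOOP_* paths, or try adding hadoop-streaming*.jar with the -libjar option.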
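If it helps, here is a rough sketch for checking whether a streaming jar is actually present under the directory you pass as -hadooplib. The function name find_streaming_jars is made up for illustration, and the default directory is simply the -hadooplib value from the question. On a CDH parcel install the jar may well sit in a different directory, so treat the result as a hint rather than proof.

```python
import fnmatch
import os
import sys


def find_streaming_jars(lib_dir):
    """Return a sorted list of hadoop-streaming*.jar files found under lib_dir.

    Hypothetical helper for illustration only; it is not part of dumbo or Hadoop.
    """
    matches = []
    # os.walk yields nothing for a missing directory, so the result is simply empty.
    for dirpath, _dirnames, filenames in os.walk(lib_dir):
        for name in fnmatch.filter(filenames, "hadoop-streaming*.jar"):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Assumption: default to the -hadooplib path used in the question.
    default_dir = "/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/"
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else default_dir
    jars = find_streaming_jars(lib_dir)
    if jars:
        for jar in jars:
            print(jar)
    else:
        print("no hadoop-streaming*.jar found under " + lib_dir)
```

Run it on the machine you submit from and on a worker node. If it prints nothing useful for the -hadooplib directory, point it at the other Hadoop directories in the parcel until you find where the jar lives, then use that path.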