mrjob error due to unrecognized option: -ex
I am running on Windows 10 with the latest mrjob version from conda-forge and Hadoop 2.8.0. I have a problem running a MapReduce job from mrjob against the Hadoop server.
trial.py
from mrjob.job import MRJob
from mrjob.step import MRStep
import re

WORD_RE = re.compile(r"[\w']+")

class Balance(MRJob):
    def mapper(self, _, line):
        teks = line.split(" ")
        nama = teks[0]
        value = int(teks[3])
        if teks[1] in ["withdraw", "transfer-out"]:
            value *= -1
        yield (nama, value)

    def combiner(self, word, counts):
        yield (word, sum(counts))

    def reducer(self, word, counts):
        yield word, sum(counts)

if __name__ == '__main__':
    Balance.run()
trial.py works when run with mrjob's built-in local runner on a test case from the home directory, i.e. python trial.py input.in
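For reference, the mapper assumes whitespace-separated records of at least four fields: the account name first, the transaction type second, and the amount fourth. The sample line below is an invented illustration (the real testcase.txt format is assumed to match the indices teks[0], teks[1], teks[3] the mapper uses):

```python
# Invented sample record; the real testcase.txt format is an assumption.
line = "alice withdraw 2019-05-28 250"

teks = line.split(" ")
nama, kind, value = teks[0], teks[1], int(teks[3])
if kind in ["withdraw", "transfer-out"]:
    value *= -1  # withdrawals and outgoing transfers reduce the balance

print((nama, value))  # ('alice', -250)
```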
trialrunner.py
from trial import Balance

mr_job = Balance(args=['hdfs:///usr/local/testcase.txt', '-v', '-r', 'hadoop',
                       '--hadoop-streaming-jar=hadoop-streaming-2.8.0.jar'])
# mr_job = Balance(args=['testcase.txt'])

with mr_job.make_runner() as runner:
    runner.run()
    for line in runner.stream_output():
        key, value = mr_job.parse_output_line(line)
I have verified that testcase.txt is inside the HDFS directory, and hadoop-streaming-2.8.0.jar is in the same directory as trialrunner.py. I have hadoop-streaming-2.8.0.jar in both the Hadoop directory and the local directory, but mrjob failed to detect it, so I had to add it manually.
However, running the file is unsuccessful:
(base) D:\gdp>python trialrunner.py
No configs specified for hadoop runner
Can't fetch history log; missing job ID
No counters found
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Traceback (most recent call last):
  File "trialrunner.py", line 8, in <module>
    runner.run()
  File "C:\Users\User\Anaconda3\lib\site-packages\mrjob\runner.py", line 510, in run
    self._run()
  File "C:\Users\User\Anaconda3\lib\site-packages\mrjob\hadoop.py", line 355, in _run
    self._run_job_in_hadoop()
  File "C:\Users\User\Anaconda3\lib\site-packages\mrjob\hadoop.py", line 485, in _run_job_in_hadoop
    num_steps=self._num_steps())
mrjob.step.StepFailedException: Step 1 of 1 failed: Command '['C:\\hadoop-2.8.0\\bin\\hadoop.CMD', 'jar', 'hadoop-streaming-2.8.0.jar', '-files', 'hdfs:///user/User/tmp/mrjob/trial.User.20190529.040926.484502/files/mrjob.zip#mrjob.zip,hdfs:///user/User/tmp/mrjob/trial.User.20190529.040926.484502/files/setup-wrapper.sh#setup-wrapper.sh,hdfs:///user/User/tmp/mrjob/trial.User.20190529.040926.484502/files/trial.py#trial.py', '-input', 'hdfs:///usr/local/testcase.txt', '-output', 'hdfs:///user/User/tmp/mrjob/trial.User.20190529.040926.484502/output', '-mapper', '/bin/sh -ex setup-wrapper.sh python3 trial.py --step-num=0 --mapper', '-combiner', '/bin/sh -ex setup-wrapper.sh python3 trial.py --step-num=0 --combiner', '-reducer', '/bin/sh -ex setup-wrapper.sh python3 trial.py --step-num=0 --reducer']' returned non-zero exit status 1.
I decided to run the command above manually, and got the following error.
(base) D:\gdp>C:\\hadoop-2.8.0\\bin\\hadoop.CMD jar hadoop-streaming-2.8.0.jar -files hdfs:///user/User/tmp/mrjob/trial.User.20190528.113822.387782/files/mrjob.zip#mrjob.zip,hdfs:///user/User/tmp/mrjob/trial.User.20190528.113822.387782/files/setup-wrapper.sh#setup-wrapper.sh,hdfs:///user/User/tmp/mrjob/trial.User.20190528.113822.387782/files/trial.py#trial.py -input hdfs:///usr/local/testcase.txt -output hdfs:///user/User/tmp/mrjob/trial.User.20190528.113822.387782/output -mapper /bin/sh -ex setup-wrapper.sh python3 trial.py --step-num=0 --mapper -combiner /bin/sh -ex setup-wrapper.sh python3 trial.py --step-num=0 --combiner -reducer /bin/sh -ex setup-wrapper.sh python3 trial.py --step-num=0 --reducer
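One thing worth noting about the manual re-run: in the traceback, mrjob passes the whole mapper command as a single argv element (note the quotes around the '-mapper' value in the Python list), but the retyped command line lets the shell split that string into separate tokens, so Hadoop's option parser sees -ex as a standalone flag and rejects it. A small sketch of the difference, using POSIX-style tokenization (cmd.exe quoting differs, so this is an approximation of the mechanism):

```python
import shlex

# The -mapper value mrjob builds is ONE argument. Retyped without quotes,
# the shell splits it into many, and '-ex' becomes a token of its own.
mapper = "/bin/sh -ex setup-wrapper.sh python3 trial.py --step-num=0 --mapper"

quoted = shlex.split("-mapper '%s'" % mapper)   # how mrjob passes it
unquoted = shlex.split("-mapper %s" % mapper)   # how it was retyped

print(len(quoted))    # 2: '-mapper' plus the whole command string
print(len(unquoted))  # 8: '-ex' is now a separate token Hadoop rejects
```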
<bunch of warnings about illegal factoring>
19/05/29 11:11:33 ERROR streaming.StreamJob: Unrecognized option: -ex
Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]
Options:
-input <path> DFS input file(s) for the Map step.
-output <path> DFS output directory for the Reduce step.
-mapper <cmd|JavaClassName> Optional. Command to be run as mapper.
-combiner <cmd|JavaClassName> Optional. Command to be run as combiner.
-reducer <cmd|JavaClassName> Optional. Command to be run as reducer.
-file <file> Optional. File/dir to be shipped in the Job jar file.
Deprecated. Use generic option "-files" instead.
-inputformat <TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName>
Optional. The input format class.
-outputformat <TextOutputFormat(default)|JavaClassName>
Optional. The output format class.
-partitioner <JavaClassName> Optional. The partitioner class.
-numReduceTasks <num> Optional. Number of reduce tasks.
-inputreader <spec> Optional. Input recordreader spec.
-cmdenv <n>=<v> Optional. Pass env.var to streaming commands.
-mapdebug <cmd> Optional. To run this script when a map task fails.
-reducedebug <cmd> Optional. To run this script when a reduce task fails.
-io <identifier> Optional. Format to use for input to and output
from mapper/reducer commands
-lazyOutput Optional. Lazily create Output.
-background Optional. Submit the job and don't wait till it completes.
-verbose Optional. Print verbose output.
-info Optional. Print detailed usage.
-help Optional. Print help message.
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
command [genericOptions] [commandOptions]
For more details about these options:
Use $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar -info
Try -help for more information
Streaming Command Failed!
I forgot to mention that I downloaded the jar from mvnrepository.com, in case that helps.
I just tested with streaming jar 3.1.2 from both Apache and Maven on Ubuntu. Both return the same error.
I have the same problem with mrjob and Hadoop 2.8.0. The program works fine locally (plain Python), but raises the following error when running on Hadoop. PLEASE HELP!
""" C:\hadoop\bin>python C:/MapReduce.py -r hadoop --hadoop-streaming-jar C:/hadoop-streaming.jar C:/words.txt No configs found; falling back on auto-configuration No configs specified for hadoop runner Looking for hadoop binary in C:\hadoop\bin\bin... Found hadoop binary: .\hadoop.CMD Using Hadoop version 2.8.0 Creating temp directory C:\Users\I\AppData\Local\Temp\MapReduce.I.20220226.181157.856795 uploading working dir files to hdfs:///user/I/tmp/mrjob/MapReduce.I.20220226.181157.856795/files/wd... Copying other local files to hdfs:///user/I/tmp/mrjob/MapReduce.I.20220226.181157.856795/files/ Running step 1 of 1... Found 2 unexpected arguments on the command line [hdfs:///user/I/tmp/mrjob/MapReduce.I.20220226.181157.856795/files/wd/mrjob.zip#mrjob.zip, hdfs:///user/I/tmp/mrjob/MapReduce.I.20220226.181157.856795/files/wd/setup-wrapper.sh#setup-wrapper.sh] Try -help for more information Streaming Command Failed! Attempting to fetch counters from logs... Can't fetch history log; missing job ID No counters found Scanning logs for probable cause of failure... Can't fetch history log; missing job ID Can't fetch task logs; missing application ID Step 1 of 1 failed: Command '['.\hadoop.CMD', 'jar', 'C:/hadoop-streaming.jar', '-files', 'hdfs:///user/I/tmp/mrjob/MapReduce.I.20220226.181157.856795/files/wd/MapReduce.py#MapReduce.py,hdfs:///user/I/tmp/mrjob/MapReduce.I.20220226.181157.856795/files/wd/mrjob.zip#mrjob.zip,hdfs:///user/I/tmp/mrjob/MapReduce.I.20220226.181157.856795/files/wd/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/I/tmp/mrjob/MapReduce.I.20220226.181157.856795/files/words.txt', '-output', 'hdfs:///user/I/tmp/mrjob/MapReduce.I.20220226.181157.856795/output', '-mapper', '/bin/sh -ex setup-wrapper.sh python3 MapReduce.py --step-num=0 --mapper', '-reducer', '/bin/sh -ex setup-wrapper.sh python3 MapReduce.py --step-num=0 --reducer']' returned non-zero exit status 1. """