Cannot input hdfs path to mrjob on local Hadoop
I think this is a straightforward application of the mrjob library, but I am not able to use an HDFS file as input:
Using configs in /Users/akshanshgupta/.mrjob.conf
Creating temp directory /var/folders/xz/g4k09fps1bd1vm9r4mj8dgnm0000gn/T/word_count_classic.akshanshgupta.20180830.115026.352538
Archiving /Users/akshanshgupta/Workspace/cc-mrjob/packages -> /var/folders/xz/g4k09fps1bd1vm9r4mj8dgnm0000gn/T/word_count_classic.akshanshgupta.20180830.115026.352538/archives/packages.tar.gz
Looking for hadoop binary in $PATH...
Found hadoop binary: /usr/local/bin/hadoop
STDERR: 2018-08-30 17:20:40,123 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
STDERR: ls: `hdfs:///workspace/pg2701.txt': No such file or directory
Traceback (most recent call last):
  File "cc-mrjob/word_count_classic.py", line 31, in <module>
    MRWordFreqCount.run()
  File "/Users/akshanshgupta/mrjob/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/Users/akshanshgupta/mrjob/lib/python2.7/site-packages/mrjob/job.py", line 457, in execute
    super(MRJob, self).execute()
  File "/Users/akshanshgupta/mrjob/lib/python2.7/site-packages/mrjob/launch.py", line 187, in execute
    self.run_job()
  File "/Users/akshanshgupta/mrjob/lib/python2.7/site-packages/mrjob/launch.py", line 235, in run_job
    runner.run()
  File "/Users/akshanshgupta/mrjob/lib/python2.7/site-packages/mrjob/runner.py", line 515, in run
    self._check_input_paths()
  File "/Users/akshanshgupta/mrjob/lib/python2.7/site-packages/mrjob/runner.py", line 1007, in _check_input_paths
    if not self.fs.exists(path):
  File "/Users/akshanshgupta/mrjob/lib/python2.7/site-packages/mrjob/fs/composite.py", line 84, in exists
    return self._do_action('exists', path_glob)
  File "/Users/akshanshgupta/mrjob/lib/python2.7/site-packages/mrjob/fs/composite.py", line 68, in _do_action
    raise first_exception
IOError: Could not check path hdfs:///workspace/pg2701.txt
Of course, the file is present in that directory:
(mrjob) Workspace $ hadoop fs -ls workspace
2018-08-30 17:25:50,991 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 2 akshanshgupta supergroup 1257296 2018-08-29 22:28 workspace/pg2701.txt
drwxr-xr-x - akshanshgupta supergroup 0 2018-08-29 22:57 workspace/warc
Huh, that's pretty much what mrjob is doing. If you run your batch with -v, it'll show you the hadoop fs command it's running.
What happens if you hadoop fs -ls hdfs:///workspace/pg2701.txt?
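One thing worth checking while comparing the two commands (a guess, not something confirmed in this thread): the listing above uses the relative path workspace, which HDFS resolves against the user's home directory, while hdfs:///workspace/pg2701.txt is an absolute path from the filesystem root. The sketch below only illustrates that resolution rule in plain Python. The function resolve_hdfs_path is hypothetical (it is not part of mrjob or Hadoop), and the /user home prefix is an assumption based on the usual Hadoop default.

```python
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def resolve_hdfs_path(path, user, home_prefix="/user"):
    """Return the absolute HDFS path that a path string would point to.

    Hypothetical helper for illustration only. Assumes the HDFS home
    directory is <home_prefix>/<user>, which is the usual default.
    """
    parsed = urlparse(path)
    if parsed.scheme:
        # hdfs:///a/b has an empty authority, so the path part is /a/b
        return posixpath.normpath(parsed.path)
    if path.startswith("/"):
        # Already absolute
        return posixpath.normpath(path)
    # Relative paths are resolved against the user's home directory
    return posixpath.normpath(posixpath.join(home_prefix, user, path))


if __name__ == "__main__":
    user = "akshanshgupta"
    # Path passed to mrjob
    print(resolve_hdfs_path("hdfs:///workspace/pg2701.txt", user))
    # Path implied by `hadoop fs -ls workspace`
    print(resolve_hdfs_path("workspace/pg2701.txt", user))
```

If the two printed paths differ on your cluster as well, passing the home-directory form of the path to the job would be the next thing to try.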