mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
I want to join two files, but I get the error _NameError: name 'names' is not defined_ when running `!python job.py data.txt --database item.txt` ``` from mrjob.job import MRJob from mrjob.step...
`_parse_progress_from_resource_manager()` assumes that there will be at most one job running on a cluster at the same time, which is wrong now that clusters can run steps concurrently. If we...
We've made a tremendous effort to reduce the number of API calls used by cluster pooling, but we still sometimes describe a cluster without saving information about it to our...
The mrjob README is pretty dated; it tries to sell mrjob as "the Python Hadoop streaming library" and doesn't talk about Spark features at all. We should highlight things like:...
Currently the Spark runner expects that the Spark master will be passed with the `--spark-master` option. However, it also takes arbitrary Spark configuration properties in the form `--jobconf PROP=VALUE`. Spark...
When we set up an SSH tunnel to the resource manager, we use the tunnel to check the job's progress and log/print it to the user. Now that we're checking...
Can't find any trace of this in the docs. Can I run a Python script with mrjob on my laptop, and have it connect to a remote Hadoop cluster over...
Hello, when executing the script, the error below is generated: `Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1` My environment is a cluster with 5 servers, 1 with the...
I wrote the following code to do a word-sorting task ```Python #!/usr/bin/python # -*- coding: utf-8 -*- from mrjob.job import MRJob import re class MRwordCount(MRJob): def mapper(self, in_key, in_value): bins...
I'm having trouble running an example with EMR on AWS. It generates the following error: ``` Using configs in /home/ciceromoura/.mrjob.conf Creating temp directory /tmp/MR-DataMining-3.ciceromoura.20200606.202114.850991 writing master bootstrap script to /tmp/MR-DataMining-3.ciceromoura.20200606.202114.850991/b.sh uploading...