mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

Results 100 mrjob issues

I want to join two files, but I get the error _NameError: name 'names' is not defined_. !python job.py data.txt --database item.txt ``` from mrjob.job import MRJob from mrjob.step...
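
The snippet above is truncated, so the actual cause of the `NameError` is not visible. As a hedged illustration only, a map-side join usually loads the side file once and keeps it on the instance instead of in a bare global. The sketch below uses plain Python; the `JoinMapper` class, the `load_lookup` helper, and the tab-separated `id<TAB>name` layout are all hypothetical. In an mrjob job, the loading step would typically live in `mapper_init`.

```python
import csv


def load_lookup(lines):
    """Build an id -> name dict from tab-separated 'id<TAB>name' lines.

    The file layout is an assumption made for this sketch.
    """
    lookup = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2:
            lookup[row[0]] = row[1]
    return lookup


class JoinMapper:
    """Hypothetical stand-in for a job class; not part of mrjob."""

    def __init__(self, lookup_lines):
        # Store the table on the instance so mapper() can always reach it
        # as self.names, instead of relying on a global called `names`.
        self.names = load_lookup(lookup_lines)

    def mapper(self, _key, line):
        # Split 'id<TAB>rest' and replace the id with its looked-up name.
        item_id, _sep, rest = line.partition("\t")
        yield self.names.get(item_id, "UNKNOWN"), rest
```

Keeping the table in `self.names` means every reference goes through the instance, which avoids the undefined-global failure mode regardless of where the loading code runs.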

`_parse_progress_from_resource_manager()` assumes that there will be at most one job running on a cluster at the same time, which is wrong now that clusters can run steps concurrently. If we...

Bug
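
One way to avoid the single-job assumption is to key progress by application ID. The sketch below is not mrjob's implementation: it assumes the data has the JSON shape returned by the YARN ResourceManager REST API's cluster applications listing (`{"apps": {"app": [...]}}`), and the function names `progress_by_app` and `progress_for` are hypothetical.

```python
def progress_by_app(apps_response):
    """Map each RUNNING application ID to its progress percentage.

    `apps_response` is assumed to be the decoded JSON of a YARN
    cluster-applications listing; "apps" may be null when nothing is listed.
    """
    apps = (apps_response.get("apps") or {}).get("app") or []
    return {
        app["id"]: float(app.get("progress", 0.0))
        for app in apps
        if app.get("state") == "RUNNING"
    }


def progress_for(apps_response, app_id):
    """Return progress for one application, or None if it is not running."""
    return progress_by_app(apps_response).get(app_id)
```

Looking up a specific ID means concurrent steps on the same cluster cannot be mistaken for one another, at the cost of having to know the application ID of the step being tracked.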

We've made a tremendous effort to reduce the number of API calls used by cluster pooling, but we still sometimes describe a cluster without saving information about it to our...

Cleanup

The mrjob README is pretty dated; it tries to sell mrjob as "the Python Hadoop streaming library" and doesn't talk about Spark features at all. We should highlight things like:...

Docs

Currently the Spark runner expects that the Spark master will be passed with the `--spark-master` option. However, it also takes arbitrary Spark configuration properties in the form `--jobconf PROP=VALUE`. Spark...

Feature
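
The issue text is truncated, so the intended behavior is not stated. As a sketch under assumptions, the two sources could be reconciled by letting an explicit `--spark-master` value win and falling back to the `spark.master` property from `--jobconf`. The function `resolve_spark_master`, its precedence order, and the `local[*]` default are assumptions for illustration, not mrjob's documented behavior.

```python
def resolve_spark_master(spark_master=None, jobconf=None, default="local[*]"):
    """Pick the Spark master from an explicit option or from jobconf.

    Precedence (assumed): explicit option, then jobconf's spark.master,
    then the default.
    """
    jobconf = jobconf or {}
    if spark_master:
        return spark_master
    return jobconf.get("spark.master", default)
```

For example, `resolve_spark_master(None, {"spark.master": "yarn"})` returns `"yarn"`, while passing both an explicit master and a jobconf entry returns the explicit one.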

When we set up an SSH tunnel to the resource manager, we use the tunnel to check the job's progress and log/print it to the user. Now that we're checking...

Cleanup

Can't find any trace of this in the docs. Can I run a Python script with mrjob on my laptop, and have it connect to a remote Hadoop cluster over...

Hello, when executing the script, the error below is generated: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 My environment is a cluster with 5 servers, 1 with the...

I wrote the following code to do a word sort task ```Python #!/usr/bin/python # -*- coding: utf-8 -*- from mrjob.job import MRJob import re class MRwordCount(MRJob): def mapper(self, in_key, in_value): bins...

I'm having trouble running an example with EMR on AWS. It generates the following error: ``` Using configs in /home/ciceromoura/.mrjob.conf Creating temp directory /tmp/MR-DataMining-3.ciceromoura.20200606.202114.850991 writing master bootstrap script to /tmp/MR-DataMining-3.ciceromoura.20200606.202114.850991/b.sh uploading...