mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

Results 100 mrjob issues

On Python 3.12 I get `ModuleNotFoundError: No module named 'distutils'`. See https://peps.python.org/pep-0632/.
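A minimal sketch of one possible workaround, assuming the failing import is something like `distutils.spawn.find_executable` (the report does not say which `distutils` name is involved, so that choice is only an illustration). PEP 632 points to `shutil.which` as the standard-library replacement for that helper:

```python
import shutil

try:
    # distutils was removed from the standard library in Python 3.12 (PEP 632).
    # Which distutils helper is actually needed is an assumption here.
    from distutils.spawn import find_executable
except ModuleNotFoundError:
    def find_executable(name):
        """Fallback built on shutil.which; returns a path or None."""
        return shutil.which(name)
```

Either branch returns `None` when the command cannot be found, so calling code does not need to know which one was taken.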

We're running into an issue with `str` vs `bytes` on Python 3 related to commit https://github.com/Yelp/mrjob/commit/0f0297b372fe9d5875915f7c3782b168543dd390 which changes `sys.stderr` from a `TextIOWrapper` in `'w'` mode to a `BufferedWriter` in `'wb'`...
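The mismatch described above can be illustrated with a small helper that accepts either kind of stream. This is only a sketch: `write_text` is a hypothetical name, not part of mrjob, and the UTF-8 default is an assumption.

```python
import io


def write_text(stream, text, encoding="utf-8"):
    """Write a str to a stream that may be text-mode or binary-mode."""
    try:
        # Text streams (e.g. TextIOWrapper, StringIO) accept str directly.
        stream.write(text)
    except TypeError:
        # Binary streams (e.g. BufferedWriter, BytesIO) require bytes.
        stream.write(text.encode(encoding))


text_stream = io.StringIO()
binary_stream = io.BytesIO()
write_text(text_stream, "warning\n")
write_text(binary_stream, "warning\n")
```

Code that writes `str` to `sys.stderr` would need something along these lines, or an explicit `.encode()`, once the stream is opened in `'wb'` mode.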

# Patching CVE-2007-4559 Hi, we are security researchers from the Advanced Research Center at [Trellix](https://www.trellix.com). We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a...

I have a dataset with 42 columns. How do I read a specific column from a CSV file using mrjob steps?
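One way to approach this, sketched with the standard `csv` module only: with mrjob's default input protocol, a mapper written as `mapper(self, _, line)` receives each input line as a string, so the column can be picked out per line. The helper name `extract_column` and the column index are hypothetical.

```python
import csv


def extract_column(line, index, delimiter=","):
    """Return the field at position `index` from one CSV line, or None."""
    # csv.reader handles quoted fields that contain the delimiter.
    row = next(csv.reader([line], delimiter=delimiter))
    return row[index] if index < len(row) else None


# Inside an MRJob subclass this could be used roughly as:
#     def mapper(self, _, line):
#         value = extract_column(line, 3)  # column index 3 is an assumption
#         if value is not None:
#             yield value, 1
```

Because input is split by line, this sketch does not handle quoted fields that span several lines, and a header row would need to be skipped separately.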

There are small typos in: - docs/whats-new.rst - mrjob/examples/mr_text_classifier.py - mrjob/sim.py Fixes: - Should read `refers` rather than `referse`. - Should read `consistently` rather than `consistenly`. - Should read `because`...

How do I use total sort on Hadoop with the `PARTITIONER` attribute?

I set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable properly and am running a simple mrjob job with the `-r dataproc` option. However, it fails with `google.api_core.exceptions.Unknown: None Stream removed` while calling `self.cluster_client.get_cluster()`...

We are using mrjob to process WARC files, in a similar manner to [this example given in the Writing Jobs guide](https://github.com/Yelp/mrjob/blob/master/docs/guides/writing-mrjobs.rst#passing-entire-files-to-the-mapper). For our use case, it is crucial that the `.gz`...

I am running on Windows 10 using the latest mrjob version on conda-forge. I am using Hadoop 2.8.0. I have a problem running a MapReduce job from mrjob using the...

Bug

Fix docs to prevent user getting: "OSError: Input path hdfs://my_home/input.txtdoes not exist!"