mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
On Python 3.12 I get `ModuleNotFoundError: No module named 'distutils'`. See https://peps.python.org/pep-0632/.
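The snippet does not say which `distutils` module is imported. A minimal sketch, assuming the failing import is a `LooseVersion`-style version comparison: the helper names `version_key` and `version_gte` are hypothetical, and the code uses only the standard library, which is unaffected by the removal. If a third-party dependency is acceptable, `packaging.version.Version` is the usual replacement.

```python
import re


def version_key(version):
    """Turn '2.8.0' into (2, 8, 0).

    Anything after the leading dotted numbers (e.g. 'rc1') is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version in {version!r}")
    return tuple(int(n) for n in match.group(0).split("."))


def version_gte(version, cmp_version):
    """Return True if version >= cmp_version, comparing numeric components."""
    # Tuple comparison is numeric per component, so '2.10' > '2.9'
    # (a plain string comparison would get this wrong).
    return version_key(version) >= version_key(cmp_version)
```

Comparing integer tuples avoids the `int`-vs-`str` comparison errors that `LooseVersion` could hit on Python 3. The trade-off is that pre-release suffixes are simply dropped.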
We're running into an issue with `str` vs `bytes` on Python 3 related to commit https://github.com/Yelp/mrjob/commit/0f0297b372fe9d5875915f7c3782b168543dd390 which changes `sys.stderr` from a `TextIOWrapper` in `'w'` mode to a `BufferedWriter` in `'wb'`...
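A minimal sketch of a writer that tolerates both kinds of stream the report describes: a text stream (like the default `sys.stderr`, a `TextIOWrapper` in `'w'` mode) and a binary one (like a `BufferedWriter` in `'wb'` mode). The name `write_message` is hypothetical and not part of mrjob, and encoding as UTF-8 is an assumption.

```python
import io


def write_message(stream, message):
    """Write the str `message` to `stream`, encoding it if the stream is binary."""
    if isinstance(stream, (io.RawIOBase, io.BufferedIOBase)):
        # Binary streams such as BufferedWriter ('wb') or BytesIO need bytes.
        stream.write(message.encode("utf-8"))
    else:
        # Text streams such as TextIOWrapper ('w') or StringIO take str directly.
        stream.write(message)
```

For example, `write_message(sys.stderr, "mapper started\n")` works whether or not `sys.stderr` has been swapped for a binary stream.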
# Patching CVE-2007-4559 Hi, we are security researchers from the Advanced Research Center at [Trellix](https://www.trellix.com). We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a...
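CVE-2007-4559 is a path traversal in Python's `tarfile`: `extractall()` will write members named like `../../etc/passwd` outside the destination directory. A minimal sketch of the usual mitigation, assuming the code extracts archives with `tarfile.extractall()`. The helper `safe_extract` is hypothetical, and it conservatively refuses symbolic and hard links altogether rather than validating their targets. On Python 3.12 and later, passing `filter="data"` to `extractall()` is a built-in alternative.

```python
import os
import tarfile


def _is_within_directory(directory, target):
    """Return True if `target` resolves to a path inside `directory`."""
    directory = os.path.realpath(directory)
    target = os.path.realpath(target)
    return os.path.commonpath([directory, target]) == directory


def safe_extract(tar, path="."):
    """Extract `tar` into `path`, refusing members that could escape it."""
    for member in tar.getmembers():
        if member.issym() or member.islnk():
            # Conservative policy: link targets are not validated, so reject them.
            raise ValueError(f"refusing link member: {member.name!r}")
        member_path = os.path.join(path, member.name)
        if not _is_within_directory(path, member_path):
            raise ValueError(f"blocked path traversal in member: {member.name!r}")
    # Only reached if every member stays inside `path`.
    tar.extractall(path)
```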
I have a dataset with 42 columns. How do I read a specific column from a CSV file using mrjob steps?
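A minimal sketch, assuming one CSV record per line (no newlines inside quoted fields) and that the wanted column is known by its zero-based position. The function has the shape of an mrjob mapper (input key ignored, raw line as value) so it could be called from `mapper(self, _, line)` in an `MRJob` subclass. The names `column_mapper` and `COLUMN_INDEX` are hypothetical.

```python
import csv

COLUMN_INDEX = 5  # hypothetical: zero-based position of the column to extract


def column_mapper(_, line, column_index=COLUMN_INDEX):
    """Yield (value, 1) for the chosen column of one CSV line."""
    # csv.reader handles quoted fields containing commas, unlike line.split(",").
    row = next(csv.reader([line]))
    if len(row) > column_index:
        yield row[column_index], 1
```

Inside an `MRJob` subclass, `def mapper(self, _, line): yield from column_mapper(_, line)` would emit the column values for a reducer to count or aggregate. A header row is treated like any other line unless it is filtered out first.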
There are small typos in: - docs/whats-new.rst - mrjob/examples/mr_text_classifier.py - mrjob/sim.py Fixes: - Should read `refers` rather than `referse`. - Should read `consistently` rather than `consistenly`. - Should read `because`...
How do I use total sort on Hadoop with the `PARTITIONER` attribute?
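Conceptually, a total sort needs a partitioner that sends each key range to its own reducer, so that concatenating the sorted reducer outputs in order yields one globally sorted result. Hadoop's `TotalOrderPartitioner` works this way using precomputed split points. A minimal pure-Python sketch of that idea, assuming the split points are already known (sampling them is out of scope). The function names are hypothetical, and this is not how mrjob's `PARTITIONER` attribute itself is configured.

```python
import bisect


def total_order_partition(key, split_points):
    """Return the reducer index for `key` given sorted split points.

    With n split points there are n + 1 reducers; reducer i receives keys k
    with split_points[i-1] <= k < split_points[i].
    """
    return bisect.bisect_right(split_points, key)


def sorted_by_partition(keys, split_points):
    """Simulate reducers: bucket keys, sort each bucket, then concatenate."""
    buckets = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        buckets[total_order_partition(key, split_points)].append(key)
    # Each "reducer" sorts only its own keys; range ordering makes the
    # concatenation globally sorted.
    return [key for bucket in buckets for key in sorted(bucket)]
```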
I set the `GOOGLE_APPLICATION_CREDENTIALS` env variable properly and am running a simple mrjob with the `-r dataproc` option. However, it says `google.api_core.exceptions.Unknown: None Stream removed` while calling `self.cluster_client.get_cluster()`...
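The error text alone says little, so a reasonable first check is that `GOOGLE_APPLICATION_CREDENTIALS` really points at a readable JSON key file in the environment that launches the job. A minimal stdlib-only preflight sketch. The function name `check_credentials_env` is hypothetical, and requiring a `type` field is an assumption based on the usual layout of Google credential files (for example `"service_account"`).

```python
import json
import os


def check_credentials_env(env=None):
    """Return the credential 'type' named by GOOGLE_APPLICATION_CREDENTIALS, or raise."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"credentials file not found: {path}")
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # raises ValueError if the file is not valid JSON
    if "type" not in data:
        raise RuntimeError(f"{path} has no 'type' field; is it a credentials file?")
    return data["type"]
```

Calling `check_credentials_env()` just before launching the `-r dataproc` job rules out a missing or malformed key file. It does not check permissions or network access.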
We are using mrjob to process WARC files in a similar manner to [this example given in the Writing Jobs guide](https://github.com/Yelp/mrjob/blob/master/docs/guides/writing-mrjobs.rst#passing-entire-files-to-the-mapper). For our use case, it is crucial that the `.gz`...
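A minimal sketch of reading one whole input file inside the mapper. It assumes the job receives a local file path (as with `mapper_raw()` in the linked guide) and that `.gz` files should be decompressed by the job itself. The helper `iter_raw_lines` is hypothetical, and real WARC record parsing would need a dedicated library, which this sketch does not use.

```python
import gzip


def iter_raw_lines(input_path):
    """Yield the lines of `input_path` as bytes, decompressing .gz files."""
    # gzip.open also reads multi-member files, which is the layout WARC.gz
    # files typically use (one gzip member per record).
    opener = gzip.open if input_path.endswith(".gz") else open
    with opener(input_path, "rb") as f:
        yield from f
```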
I am running on Windows 10 using the latest mrjob version from conda-forge, with Hadoop 2.8.0. I have a problem running a MapReduce job from mrjob using the...
Fix docs to prevent users from getting: "OSError: Input path hdfs://my_home/input.txtdoes not exist!"
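One plausible cause, offered as an assumption rather than a diagnosis: in a URI such as `hdfs://my_home/input.txt`, the part after `//` is the authority (host), not a directory. So `my_home` is read as a NameNode host name and the path is only `/input.txt`. The standard library's URI parser shows the split. With three slashes (`hdfs:///...`) the host is empty, and Hadoop typically resolves it against the cluster's default filesystem. The helper name `split_hdfs_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_hdfs_uri(uri):
    """Return (host, path) as a generic URI parser sees an hdfs:// URI."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path
```

For example, `split_hdfs_uri("hdfs://my_home/input.txt")` gives `("my_home", "/input.txt")`, while `split_hdfs_uri("hdfs:///user/me/input.txt")` gives `("", "/user/me/input.txt")`.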