mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

Results 100 mrjob issues

AWS's own tags start with `aws:`. We should do the same with mrjob tag names, e.g. `mrjob:version` rather than `__mrjob_version` and `mrjob:pool:name` rather than `__mrjob_pool_name`. Old tag names used by...
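A minimal sketch of what the rename could look like, using only the two tag names mentioned above. `LEGACY_TAG_NAMES` and `rename_legacy_tags` are hypothetical names made up for illustration and are not part of mrjob; how old tags on existing clusters should be handled is not covered here.

```python
# Hypothetical mapping from the old double-underscore tag names to
# namespaced ones; only the two names quoted in the issue are listed.
LEGACY_TAG_NAMES = {
    "__mrjob_version": "mrjob:version",
    "__mrjob_pool_name": "mrjob:pool:name",
}


def rename_legacy_tags(tags):
    """Return a copy of `tags` with known legacy names replaced.

    Tags that are not in LEGACY_TAG_NAMES (e.g. user-defined tags)
    are passed through unchanged.
    """
    return {LEGACY_TAG_NAMES.get(name, name): value
            for name, value in tags.items()}


if __name__ == "__main__":
    # Example values are placeholders, not real cluster metadata.
    print(rename_legacy_tags({"__mrjob_pool_name": "default", "team": "data"}))
```

An explicit mapping is used rather than a string substitution because the two examples do not follow a single mechanical rule (`pool_name` becomes `pool:name`).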

Cleanup

Python 3.8 was released last October. Currently, tests do not pass with Python 3.8. The problem seems to be with `pyspark`.

Feature

Am I doing something wrong or is the `--conf-path` / `-c` option not supported when trying to run a PySpark script using `mrjob spark-submit`? For example, I have an EMR...

Deprecation warnings are raised due to invalid escape sequences in Python 3.8. Below is a log of the warnings raised while compiling all the Python files. Using raw strings...
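For anyone wanting to reproduce this kind of warning in isolation, here is a small sketch (not taken from the mrjob code base). It compiles two one-line snippets, one with an invalid escape in a normal string and one with a raw string, and records any warnings. `compile_warnings` is a made-up helper name. The warning category is `DeprecationWarning` on Python 3.8 and, as far as I know, `SyntaxWarning` on 3.12 and later, so the sketch only reports whatever category it sees.

```python
import warnings


def compile_warnings(source):
    """Compile `source` and return the names of any warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide anything
        compile(source, "<example>", "exec")
    return [w.category.__name__ for w in caught]


# The source text below contains a backslash followed by d inside a
# normal (non-raw) string literal, which is an invalid escape sequence.
bad_source = 'pattern = "\\d+"'

# Same pattern written as a raw string literal, which has no escape processing.
good_source = 'pattern = r"\\d+"'

if __name__ == "__main__":
    print("non-raw:", compile_warnings(bad_source))
    print("raw:    ", compile_warnings(good_source))
```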

Sometimes it is useful not to wait for an mrjob to complete. An important use case is to spin up an EMR job from, e.g., an AWS Lambda function. In...

Feature

I have a mrjob:

```
from mrjob.job import MRJob
from mrjob.step import MRStep
from mrjob.protocol import JSONProtocol, RawValueProtocol, PickleProtocol, JSONValueProtocol
import pandas

class WordCount(MRJob):
    INPUT_PROTOCOL = RawValueProtocol
    INTERNAL_PROTOCOL = PickleProtocol...
```

If you run `python -m mrjob.examples.mr_wc -r dataproc --image-version 1.4`, it fails with:

```
Traceback (most recent call last):
  File "mr_wc.py", line 18, in <module>
    from mrjob.job import MRJob
ImportError: No module...
```

Bug

```
from mrjob.job import MRJob
from mrjob.step import MRStep
import re
import numpy as np

WORD_RE = re.compile(r"[\w']+")

class WordCount(MRJob):
    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(),...
```

If an `MRJob` has a very large number of entries associated with the same reducer key, it can be difficult to run through the Spark runner, because all the entries...

Feature