mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

100 mrjob issues, sorted by most recently updated

I tried the wordcount job on Hadoop (EMR on AWS). Running `python3 job.py -r hadoop hdfs:///user/hadoop/input.txt` gives the following error:

[hadoop@ip-172-31-36-214 wordcount]$ python3 job.py -r hadoop hdfs:///user/hadoop/input.txt
No configs found;...

Our fs interface now has `put()` but no `get()`. It would be relatively trivial to add one: just open a local file and `cat` the remote file's chunks into it.
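A minimal sketch of such a `get()`, assuming the filesystem exposes a `cat()`-style method that yields a file's contents as chunks of bytes. Here that method is passed in as a plain callable (`cat`); `get()`, `fake_cat()` and the example path are hypothetical, not mrjob's actual API:

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator


def get(cat: Callable[[str], Iterable[bytes]], path: str, local_path: str) -> None:
    """Copy the file at *path* to *local_path*, one chunk at a time.

    *cat* stands in for the filesystem's chunked read method (an assumption);
    streaming chunk by chunk avoids loading the whole file into memory.
    """
    with open(local_path, "wb") as dest:
        for chunk in cat(path):
            dest.write(chunk)


# Usage with a fake cat() that yields two chunks:
def fake_cat(path: str) -> Iterator[bytes]:
    yield b"hello "
    yield b"world\n"


with tempfile.TemporaryDirectory() as tmp:
    dest_path = os.path.join(tmp, "out.txt")
    get(fake_cat, "hdfs:///user/hadoop/out.txt", dest_path)
    with open(dest_path, "rb") as f:
        copied = f.read()

print(copied)  # b'hello world\n'
```

Taking the chunk source as a callable keeps the sketch independent of any one backend (HDFS, S3, GCS, local), which mirrors how a shared `put()` and `get()` pair would sit on a common fs interface.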

Feature

When a Spark job on Dataproc fails, we should be able to parse the logs and find the cause of the error.

Feature

Should add support for running Spark jobs.

Feature

This wouldn't be very difficult; you basically run `ssh ... cat ` and pipe in the file contents. But it's also not especially useful.
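A minimal sketch of the idea, assuming the goal is to copy a local file to a path on a remote host by piping it into `ssh HOST 'cat > PATH'`. The helper names `ssh_put_argv()` and `ssh_put()` are hypothetical, and the sketch omits the key file and `-o` options a real runner would likely pass to `ssh`:

```python
import shlex
import subprocess
from typing import List


def ssh_put_argv(host: str, remote_path: str) -> List[str]:
    """Build an argv that writes stdin to *remote_path* on *host*.

    ssh hands the command string to the remote shell, so the path is
    shell-quoted to survive spaces and special characters.
    """
    return ["ssh", host, "cat > " + shlex.quote(remote_path)]


def ssh_put(host: str, local_path: str, remote_path: str) -> None:
    """Pipe the contents of *local_path* into the remote `cat`."""
    with open(local_path, "rb") as src:
        # check=True raises CalledProcessError if ssh or cat fails.
        subprocess.run(ssh_put_argv(host, remote_path), stdin=src, check=True)
```

Building the argv separately keeps the quoting testable without a live SSH connection: `ssh_put_argv("hadoop@master", "/tmp/my file.txt")` returns `['ssh', 'hadoop@master', "cat > '/tmp/my file.txt'"]`.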

Feature

These would save the user from having to import `pyspark`, and could also set up `SparkConf` for you. Probably mostly matters for the inline runner (see #1965).

Feature

For complete support of `MRJob`, it would be helpful for the Spark harness to be able to implement methods such as `mapper_cmd()` and `mapper_pre_filter()`. This isn't actually that difficult to...
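One way a harness might support a `mapper_cmd()`-style command is to pipe each partition's lines through it with `subprocess`. This is an assumption, not how mrjob's Spark harness works; `run_cmd_on_partition()` and `upper_cmd` are hypothetical, and the command is split with `shlex` rather than run through a shell. The same helper could apply a `mapper_pre_filter()` command before the mapper:

```python
import shlex
import subprocess
import sys
from typing import Iterable, Iterator


def run_cmd_on_partition(cmd: str, lines: Iterable[str]) -> Iterator[str]:
    """Pipe one partition's lines through *cmd* and yield its output lines.

    Input lines are newline-terminated on stdin; output lines are yielded
    without trailing newlines. The whole partition is buffered in memory.
    """
    data = "".join(line + "\n" for line in lines)
    result = subprocess.run(
        shlex.split(cmd), input=data, capture_output=True, text=True, check=True
    )
    yield from result.stdout.splitlines()


# Usage: a tiny Python "command" that upper-cases its input.
upper_cmd = " ".join(
    shlex.quote(arg)
    for arg in [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)
print(list(run_cmd_on_partition(upper_cmd, ["foo bar", "baz"])))  # ['FOO BAR', 'BAZ']
```

Inside a Spark job this could be wired in with something like `rdd.mapPartitions(lambda part: run_cmd_on_partition(cmd, part))`. Buffering the partition keeps the sketch short but would not suit very large partitions; streaming through `subprocess.Popen` with a separate writer thread would avoid that.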

Feature

I am trying to find keywords in the CommonCrawl archive. When I run it on a single wet.gz file, it works fine. But if I try to run our script with...

We may want to consider eventually dropping support for Spark 1. At the very least, the Spark harness, which allows running MRJobs on the Spark runner (see #1838), will be...

Cleanup

mockhadoop tests are slow and hard to debug. mockhadoop doesn't support generic options (e.g. `-D`, `-jobconf`).
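A minimal sketch of how a mock `hadoop` command could accept the generic options mentioned above, pulling `-D key=value` (also written `-Dkey=value`) and `-jobconf key=value` pairs out of its arguments. `parse_generic_options()` is a hypothetical helper, and it does not check where in the argument list these options appear:

```python
from typing import Dict, List, Tuple


def parse_generic_options(argv: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split argv into (jobconf dict, remaining args)."""
    jobconf: Dict[str, str] = {}
    rest: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-D", "-jobconf"):
            # Separate form: the key=value pair is the next argument.
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} requires a key=value argument")
            pair = argv[i + 1]
            i += 2
        elif arg.startswith("-D") and len(arg) > 2:
            # Attached form: -Dkey=value
            pair = arg[2:]
            i += 1
        else:
            rest.append(arg)
            i += 1
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        jobconf[key] = value
    return jobconf, rest


conf, rest = parse_generic_options(
    ["jar", "hadoop-streaming.jar", "-D", "mapreduce.job.reduces=2",
     "-Dfoo=bar", "-jobconf", "x=1", "-input", "in"]
)
print(conf)  # {'mapreduce.job.reduces': '2', 'foo': 'bar', 'x': '1'}
print(rest)  # ['jar', 'hadoop-streaming.jar', '-input', 'in']
```

Returning the leftover arguments unchanged lets the rest of a mock command keep its existing argument handling while tests assert on the collected jobconf.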

Testing