mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

Results 100 mrjob issues

If you specify an invalid JAR on EMR, it actually fails inside the controller, creating a "controller" log but no syslog or stderr log. I think the logic for the...

Feature

The way we patch mrjob.conf in tests is cumbersome and gets underfoot. The point was not to have the user's default mrjob.conf or `$MRJOB_CONF` affect tests. But the framework also...

Cleanup

Just noticed, in https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties that you can allow zero core instances by setting the property `dataproc:dataproc.allow.zero.workers` to `true`. Currently the assumption that at least 2 workers are required is hard-coded...

Feature
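As a rough illustration of the zero-worker idea, the sketch below builds a Dataproc-style cluster config dictionary and sets the property only when no core instances are requested. The function name `build_cluster_config` is hypothetical, and the field layout (`workerConfig.numInstances`, `softwareConfig.properties`) is an assumption based on the Dataproc REST cluster resource; this is not mrjob's actual code.

```python
def build_cluster_config(num_core_instances):
    """Return a Dataproc-style cluster config dict (hypothetical helper).

    Assumption: properties live under softwareConfig.properties and use
    the "prefix:key" naming shown in the Dataproc documentation.
    """
    if num_core_instances < 0:
        raise ValueError("num_core_instances must be non-negative")

    config = {
        "masterConfig": {"numInstances": 1},
        "workerConfig": {"numInstances": num_core_instances},
    }

    # Only opt in to zero workers when the caller actually asks for none.
    if num_core_instances == 0:
        config["softwareConfig"] = {
            "properties": {"dataproc:dataproc.allow.zero.workers": "true"},
        }

    return config
```

Keeping the property conditional means clusters with workers are configured exactly as before.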

It looks like this is directly supported by the API; we just have to add our libjars to the `jarFileUris` field of the [HadoopJob](https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#HadoopJob) when we submit a job.

Feature
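A minimal sketch of what the job submission body might look like with libjars attached. The helper name `build_hadoop_job` and its parameters are hypothetical; the keys `placement`, `hadoopJob`, `mainJarFileUri`, `args`, and `jarFileUris` are assumed to match the Dataproc REST job resource linked above, and the sketch does not show how mrjob actually submits jobs.

```python
def build_hadoop_job(cluster_name, main_jar_uri, args, libjar_uris=()):
    """Build a Dataproc job request body as a plain dict (hypothetical helper)."""
    hadoop_job = {
        "mainJarFileUri": main_jar_uri,
        "args": list(args),
    }

    # Only include jarFileUris when there are libjars to pass along.
    if libjar_uris:
        hadoop_job["jarFileUris"] = list(libjar_uris)

    return {
        "placement": {"clusterName": cluster_name},
        "hadoopJob": hadoop_job,
    }
```

Usage, with made-up URIs: `build_hadoop_job("my-cluster", "gs://bucket/streaming.jar", ["-input", "gs://bucket/in"], ["gs://bucket/lib.jar"])`.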

Since we're reading logs from Stackdriver, the cluster no longer has to exist. Targeting a particular step gets a little tricky, though we could filter by application_id.

Feature

Currently, we just stream raw lines from the driver output, rather than tagging them with path and line_num. This is a fairly easy fix; just not so important because we...

Feature
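One way the tagging could look, as a sketch only: wrap the raw line stream in a generator that attaches `path` and `line_num` to each line. The function name `tag_lines`, the dict shape, and the choice of 0-based numbering are all assumptions, not mrjob's actual log-parsing interface.

```python
def tag_lines(lines, path):
    """Yield each raw line with its source path and line number attached.

    Assumption: line numbers are 0-based; trailing newlines are stripped.
    """
    for line_num, line in enumerate(lines):
        yield {
            "path": path,
            "line_num": line_num,
            "line": line.rstrip("\r\n"),
        }
```

Because this is a generator, lines are still streamed one at a time rather than buffered.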

My jobs frequently run out of disk space on EMR. I'm using an outdated AMI (`2.4.11`) but `mrjob-5.10`. I'm leveraging the great pooling features in `mrjob`...

Bug

This is relevant to #754, which I'm in the process of testing. I'd like to use `gsutil` to download input files from Google Cloud Storage to Hadoop nodes, rather than...

Cleanup

If you give EMR an S3 output path that's in another region, your job fails with this error: ``` Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Moved Permanently (Service: Amazon S3; Status...

Cleanup

Protocols should be allowed to have `HADOOP_*_FORMAT` and `JOBCONF` fields, as well as `hadoop_*_format()` and `jobconf()` methods, which supply defaults if something is not already specified for that step. That...

Feature
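The proposal for protocol-supplied defaults could be sketched roughly as follows. Everything here is illustrative: the class `SequenceFileProtocolSketch`, the helper `effective_step_settings`, and the jobconf key are hypothetical, and only the input-format case is shown. The point is the precedence rule: the protocol's values apply only where the step has not already specified something.

```python
class SequenceFileProtocolSketch:
    """Hypothetical protocol that carries Hadoop defaults with it."""

    HADOOP_INPUT_FORMAT = "org.apache.hadoop.mapred.SequenceFileAsTextInputFormat"
    JOBCONF = {"example.protocol.setting": "true"}  # illustrative key only


def effective_step_settings(protocol, step_input_format=None, step_jobconf=None):
    """Combine step settings with protocol defaults; the step always wins."""
    input_format = step_input_format or getattr(
        protocol, "HADOOP_INPUT_FORMAT", None)

    # Start from the protocol's jobconf, then let the step override keys.
    jobconf = dict(getattr(protocol, "JOBCONF", {}))
    jobconf.update(step_jobconf or {})

    return input_format, jobconf
```

Using `getattr` with a default keeps protocols that define neither field working unchanged.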