mrjob issues

tag StepFailedException with cluster ID?

Sometimes mrjob is being run in an environment where only the `StepFailedException` gets through. It might be helpful to be able to tag `StepFailedException` with arbitrary information (e.g. `cluster_id`).

coyotemarin

Feature
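
One way to picture the request above is an exception that carries arbitrary key/value tags and includes them in its message, so the information survives even when only the exception text gets through. This is a minimal sketch, not mrjob's implementation: `TaggedStepFailedException` and its `tags` attribute are hypothetical names, and the cluster ID shown is a placeholder.

```python
class TaggedStepFailedException(Exception):
    """Hypothetical stand-in for a step-failure exception with extra tags.

    This is NOT mrjob's StepFailedException; it only illustrates the idea
    of attaching arbitrary information (e.g. cluster_id) to the exception.
    """

    def __init__(self, message, **tags):
        super().__init__(message)
        # Copy the tags so later changes by the caller don't affect us.
        self.tags = dict(tags)

    def __str__(self):
        base = super().__str__()
        if not self.tags:
            return base
        # Sort keys so the rendered message is deterministic.
        tag_text = ", ".join(
            "{}={}".format(key, value)
            for key, value in sorted(self.tags.items())
        )
        return "{} ({})".format(base, tag_text)


if __name__ == "__main__":
    try:
        # "j-EXAMPLE" is a placeholder, not a real cluster ID.
        raise TaggedStepFailedException(
            "Step 1 of 2 failed", cluster_id="j-EXAMPLE")
    except TaggedStepFailedException as ex:
        print(ex)          # message with tags appended
        print(ex.tags)     # tags also available programmatically
```

Keeping the tags both in the message and as an attribute means callers that only log `str(ex)` and callers that inspect the exception object can each recover the cluster ID.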

Question: Step is failing. How do I debug?

I am trying to run a sample mrjob job on an EMR cluster. I created the EMR cluster manually in the AWS dashboard and started mrjob as follows: `python keywords.py -r emr s3://commoncrawl/crawl-data/CC-MAIN-2018-34/wet.paths.gz --cluster-id...`

aaqibjavith

auto-create AMI snapshots based on bootstrapping

16

While updating mrjob to support custom AMIs (#1805) is a good start, it's still significant work to roll your own AMI and keep it up-to-date. Instead, mrjob should look to...

coyotemarin

Feature

use moto instead of tests/mock_boto3 for tests

mrjob has historically mocked out various AWS services. Currently this code lives in `tests/mock_boto3`. The [moto](https://github.com/spulec/moto) library does basically the same thing. mrjob should probably try to move to moto,...

coyotemarin

Testing

handle local input dirs like Hadoop

6

If you pass Hadoop a directory as input, it reads all non-"hidden" files (files whose names don't start with `_` or `.`) in that directory, but doesn't recurse into subdirectories...

coyotemarin

Feature
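
The directory-handling rule described in this issue (read every non-hidden file directly inside the directory, skip names starting with `_` or `.`, and do not descend into subdirectories) can be sketched with the standard library alone. The function name `hadoop_style_input_files` is hypothetical and is not part of mrjob; sorting the result is an assumption made here for determinism, not something the issue specifies.

```python
import os


def hadoop_style_input_files(path):
    """List the files under *path* following the rule described above.

    Hypothetical helper, not part of mrjob. A plain file path is returned
    as a one-element list; a directory yields its direct, non-hidden files.
    """
    if not os.path.isdir(path):
        return [path]

    files = []
    for name in sorted(os.listdir(path)):
        # Names starting with "_" or "." count as hidden (e.g. _SUCCESS).
        if name.startswith(("_", ".")):
            continue
        full_path = os.path.join(path, name)
        # Only regular files are kept; subdirectories are not recursed into.
        if os.path.isfile(full_path):
            files.append(full_path)
    return files


if __name__ == "__main__":
    for input_file in hadoop_style_input_files("."):
        print(input_file)
```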

Promote MRJobLauncher.arg_parser to API

1

It's undocumented, but it's valid to write command line options that aren't passthrough. See my comment on #198 to implement an `--add-libjar` option.

irskep

Docs

ssh tunnels without job runners

1

When people use the same job flow for several jobs, they like to be able to just leave the same SSH tunnel open. Currently, ssh tunnels are tied to runners,...

coyotemarin

Feature

BotoCore Timeouts

4

This could perhaps be solved another way, but I was wondering whether it would make sense to add connect and read timeout parameters to mrjob.conf, since...

jroakes

Feature

EMR: master node setup script

1

Would be nice to have a way to run a script on the master node before running our job. Example applications: - copying jars to the local filesystem to support...

coyotemarin

Feature

auto-create EC2 key pair

2

Looks like we should be able to automatically [create key pairs through the EC2 API](http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateKeyPair.html) so that SSH will always work. Some things to consider: - should be a way...

coyotemarin

Feature
