Coyote Codornices Marin

82 comments by Coyote Codornices Marin

Did #1922 without switching to a pull model, instead de-coupling management of the working directory from uploading.

I didn't write the Docker wrapper that Yelp uses, so apologies if this is somewhat vague! The basic idea is to set `python_bin` to something that runs `docker run`. For...

Shoot, currently the script doesn't distinguish time spent provisioning the cluster (`STARTING` state) from time bootstrapping it. This isn't available from `DescribeClusters` — maybe there's some other way to get...

`ListInstances` shows the same `ReadyDateTime` as the cluster.

Okay, it looks like you can call `ListInstances`, then the EC2 API's `DescribeInstances`, and look at the `LaunchTime` for each instance in the cluster. It's probably close enough to consider billing...
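
A rough sketch of that `ListInstances` then `DescribeInstances` lookup, assuming boto3. The function names `launch_times` and `cluster_launch_times` are made up for illustration and aren't part of mrjob; the optional client arguments exist only so the lookup can be exercised with stand-ins.

```python
def launch_times(describe_instances_response):
    """Map EC2 instance ID -> LaunchTime from a DescribeInstances response."""
    times = {}
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            times[instance["InstanceId"]] = instance["LaunchTime"]
    return times


def cluster_launch_times(cluster_id, emr_client=None, ec2_client=None):
    """Hypothetical helper: launch time of each EC2 instance in an EMR cluster."""
    if emr_client is None or ec2_client is None:
        import boto3  # only needed when real clients have to be created
    emr = emr_client if emr_client is not None else boto3.client("emr")
    ec2 = ec2_client if ec2_client is not None else boto3.client("ec2")

    # ListInstances is paginated, so collect IDs across every page
    instance_ids = []
    paginator = emr.get_paginator("list_instances")
    for page in paginator.paginate(ClusterId=cluster_id):
        for instance in page["Instances"]:
            if instance.get("Ec2InstanceId"):
                instance_ids.append(instance["Ec2InstanceId"])

    if not instance_ids:
        return {}

    # DescribeInstances reports a LaunchTime for each instance
    return launch_times(ec2.describe_instances(InstanceIds=instance_ids))
```

Instances that were terminated a while ago may no longer be visible to `DescribeInstances`, so this is probably only reliable for clusters that are running or recently finished.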

It's now `self.arg_parser` as of mrjob v0.6.0. We probably should document it.

Huh! Yeah, from a use case standpoint, this is a bug, but there's not a way to reach HDFS through the EMR API, only via SSH. The Hadoop runner would...

Oh yeah, that'd totally work. Thanks! If we want to condition it on the job being successful (it's useful to have intermediate data from failed jobs), that's just a short...

Can't find job flow ID (so far), but I figured out [how to get the EC2 instance ID](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AESDG-chapter-instancedata.html#instancedata-data-retrieval) (basically wget/curl `http://169.254.169.254/latest/meta-data/instance-id`). We can then match this up with `MasterInstanceId` from...
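
For reference, a minimal Python version of that wget/curl call, using only the standard library. The name `get_instance_id` and its `opener` parameter are hypothetical (the parameter is there so the function can be tried off-cluster), and the sketch assumes the instance allows plain, token-less metadata requests.

```python
from urllib.request import urlopen

# Instance metadata endpoint from the linked EC2 docs
INSTANCE_ID_URL = "http://169.254.169.254/latest/meta-data/instance-id"


def get_instance_id(opener=urlopen, timeout=2):
    """Return the EC2 instance ID of the machine this runs on.

    `opener` is a hypothetical hook that defaults to urllib's urlopen;
    off EC2 the real request will simply time out or fail.
    """
    with opener(INSTANCE_ID_URL, timeout=timeout) as response:
        return response.read().decode("ascii").strip()
```

The returned ID can then be compared against the cluster's `MasterInstanceId`.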

The main issue here is that EMR clusters generally aren't run with the right IAM permissions to terminate EMR clusters (including themselves).