Coyote Codornices Marin
The cluster ID is in `/var/aws/emr/userData.json`. You can run something like this:

```
$ aws emr terminate-clusters --cluster-ids $(jq -r .clusterId /var/aws/emr/userData.json)
```

to grab the cluster ID and ask...
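If the script would rather not depend on `jq`, the same lookup could be sketched in Python. This is only a rough sketch: the function names `read_cluster_id` and `terminate_command` are made up for illustration, and it assumes the file is plain JSON with a top-level `clusterId` key, as the `jq` filter implies.

```python
import json


def read_cluster_id(path="/var/aws/emr/userData.json"):
    """Return the clusterId field from the EMR user-data file.

    Assumes the file is JSON with a top-level "clusterId" key.
    """
    with open(path) as f:
        return json.load(f)["clusterId"]


def terminate_command(cluster_id):
    """Build the AWS CLI argv that asks EMR to terminate one cluster."""
    return ["aws", "emr", "terminate-clusters", "--cluster-ids", cluster_id]
```

On the cluster, the resulting list could be handed to `subprocess.check_call(terminate_command(read_cluster_id()))`, which raises if the CLI exits non-zero (for example, when the instance role lacks permission).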
mrjob creates its own IAM instance and service profile roles, so we could conceivably add `elasticmapreduce:TerminateJobFlows`. The idle timeout script could attempt to run `aws emr terminate-clusters` and fall back...
Haven't tried it, but the [Hadoop Streaming docs](http://hadoop.apache.org/mapreduce/docs/current/streaming.html#Specifying+a+Java+Class+as+the+Mapper%2FReducer) have an example of mixing a Python script with `aggregate` that seems pretty straightforward. It looks like keys can be whatever, and...
Reopening this, because I don't think we ever did this (though it looks like you can accomplish the same thing using `*_cmd` with the class name as your "command"). I'd...
Wow, this issue has been around a while! Should be pretty straightforward.
It's not clear to me if anything other than "aggregate" works here. Might not be worth it, since this is already covered by `reducer_cmd`.
Pretty sure this can be handled with `reducer_cmd()`, but it would be good to have a working example. A description of how to use the aggregate package is here: https://hadoop.apache.org/docs/current/hadoop-streaming/HadoopStreaming.html#Hadoop_Aggregate_Package
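As a starting point for such an example, here is a rough sketch of the mapper side only. The `LongValueSum:<key>\t<value>` line format is taken from the aggregate example in the Hadoop Streaming docs. The word-count task and the name `aggregate_mapper` are just illustrative, and wiring this up through `reducer_cmd()` is untested.

```python
def aggregate_mapper(lines):
    """Yield one aggregate-package record per word in the input lines.

    Each record has the form "LongValueSum:<word>\t1", which asks the
    aggregate reducer to sum the values for that key as longs.
    """
    for line in lines:
        for word in line.split():
            yield "LongValueSum:%s\t1" % word
```

A standalone mapper script would print each yielded line to stdout while reading from `sys.stdin`, with Hadoop Streaming given `aggregate` as the reducer.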
Maybe as part of the `mrjob ssh` subcommand? (see #1113) mrjob at least attempts to use the same port number on any given cluster by using the cluster ID as...
Having poked at this a bit, it seems to be an issue that exists in the 4.x AMIs but not the 5.x ones (so it won't happen by default). I...