Coyote Codornices Marin
If you pass Hadoop a directory as input, it reads all non-"hidden" files (files whose names don't start with `_` or `.`) in that directory, but doesn't recurse into subdirectories...
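The rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior (skip names starting with `_` or `.`, don't descend into subdirectories); the function name `list_input_files` is hypothetical and is not part of Hadoop or of this project.

```python
import os


def list_input_files(directory):
    """Return the files in `directory` that match the rule described above.

    Hypothetical helper: keeps regular files whose names don't start
    with '_' or '.', and does not recurse into subdirectories.
    """
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        # skip "hidden" names such as _SUCCESS or .part-00000.crc
        if not name.startswith(('_', '.'))
        # subdirectories are skipped rather than walked
        and os.path.isfile(os.path.join(directory, name))
    )
```

A recursive variant would swap `os.listdir()` for `os.walk()` and apply the same name filter at each level.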
When people use the same job flow for several jobs, they like to be able to just leave the same SSH tunnel open. Currently, SSH tunnels are tied to runners,...
Would be nice to have a way to run a script on the master node before running our job. Example applications: - copying jars to the local filesystem to support...
Looks like we should be able to automatically [create key pairs through the EC2 API](http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateKeyPair.html) so that SSH will always work. Some things to consider: - should be a way...
EMR's 3.x AMIs include Mahout 0.8. Would be great to have an awesome demo that uses Mahout, that anyone can run on EMR.
It would be nice to be able to use some of the built-in mappers/reducers from Java for efficiency reasons (e.g. org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer). It would probably look something like this: ``` def steps(self):...
Currently there seem to be no tests of what happens when a job run by the sim runners throws an exception. We need to test: - [ ] in inline...
We should enable the ability to use a custom machine image (see #1805) on Dataproc.
Now that Amazon bills by the second rather than the full hour, cluster pooling is not usually a good way to save money. However, it does save you from having...
This seems to be an issue specific to jobs and clusters that are currently running. Possibly we're using different values of "now", and the script is running a long time?
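If the cause really is different values of "now", one way to rule it out is to read the clock once and pass that value everywhere. The sketch below only illustrates that idea; `cluster_ages` and its arguments are hypothetical and not taken from the script in question.

```python
from datetime import datetime, timezone


def cluster_ages(start_times, now=None):
    """Map each cluster ID to its age, all measured against one `now`.

    Hypothetical helper: `start_times` maps cluster ID to a
    timezone-aware start datetime.
    """
    if now is None:
        # read the clock once, so a long-running loop can't drift
        now = datetime.now(timezone.utc)
    return {
        cluster_id: now - started
        for cluster_id, started in start_times.items()
    }
```

Taking `now` as a parameter also lets tests pin the clock, which makes this kind of discrepancy easier to reproduce.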