Coyote Codornices Marin

82 comments by Coyote Codornices Marin

The EMR does have the concept of a master node setup script, which it currently uses to make `libjars` work. It might make sense to expose this functionality. I've certainly...
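For illustration only, here is a minimal sketch of the kind of master node setup script that could make `libjars` work: copy each jar from S3 onto the master node, then pass the local paths to Hadoop's `-libjars` option (which takes a comma-separated list). The function name `build_master_setup_script` and the default destination directory are hypothetical; this is not mrjob's actual implementation.

```python
import posixpath
import shlex


def build_master_setup_script(jar_uris, dest_dir="/home/hadoop/libjars"):
    """Return (script_text, libjars_arg) for a hypothetical master setup script.

    Each jar URI is copied to dest_dir on the master node with
    `hadoop fs -copyToLocal`; the local paths are then joined with commas,
    which is the form Hadoop's generic -libjars option expects.
    """
    lines = ["#!/bin/sh", "set -e", "mkdir -p " + shlex.quote(dest_dir)]
    local_paths = []
    for uri in jar_uris:
        # Keep the jar's basename; assumes basenames are unique.
        local_path = posixpath.join(dest_dir, posixpath.basename(uri))
        lines.append("hadoop fs -copyToLocal %s %s"
                     % (shlex.quote(uri), shlex.quote(local_path)))
        local_paths.append(local_path)
    return "\n".join(lines) + "\n", ",".join(local_paths)
```

Exposing something like this would mean letting users append their own commands to the script rather than only the jar-copying lines.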

Basically, we want to run something like `hadoop jar /home/hadoop/mahout/mahout-core--job.jar org.apache.mahout.clustering.kmeans.KMeansDriver `. We should be able to get the version of Mahout from the API.
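A rough sketch of assembling that command once we know the Mahout version. It assumes (not confirmed above) that the version fills the gap in `mahout-core--job.jar`, giving `mahout-core-<version>-job.jar`; how the version is fetched from the API is left out, and `mahout_kmeans_command` is a hypothetical name.

```python
import posixpath


def mahout_kmeans_command(version, args, mahout_dir="/home/hadoop/mahout"):
    """Build a hypothetical `hadoop jar` command line for Mahout's KMeansDriver.

    Assumes the job jar is named mahout-core-<version>-job.jar.
    """
    jar = posixpath.join(mahout_dir, "mahout-core-%s-job.jar" % version)
    main_class = "org.apache.mahout.clustering.kmeans.KMeansDriver"
    # Any remaining arguments are passed straight through to the driver.
    return ["hadoop", "jar", jar, main_class] + list(args)
```

Returning a list (rather than a string) avoids shell-quoting issues if the command is later handed to `subprocess`.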

Also, Mahout appears to use sequence files. There seem to be input formats that convert sequence files to and from text; here's an example on Stack Overflow: http://stackoverflow.com/questions/5060967/how-to-use-hadoop-streaming-with-lzo-compressed-sequence-files/6364689#6364689
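As a sketch of how a streaming step might read or write sequence files, these are the Hadoop Streaming `-inputformat`/`-outputformat` arguments using Hadoop's old-API classes. `SequenceFileAsTextInputFormat` converts each key and value to a string via `toString()` before the mapper sees it. The helper name is hypothetical, and this may not be what the linked answer does.

```python
# Fully qualified Hadoop (old mapred API) format classes for sequence files.
SEQ_AS_TEXT_INPUT = "org.apache.hadoop.mapred.SequenceFileAsTextInputFormat"
SEQ_OUTPUT = "org.apache.hadoop.mapred.SequenceFileOutputFormat"


def sequence_file_streaming_args(read_sequence_files=True,
                                 write_sequence_files=False):
    """Return extra Hadoop Streaming arguments for sequence-file I/O."""
    args = []
    if read_sequence_files:
        # Keys/values reach the streaming mapper as text (via toString()).
        args += ["-inputformat", SEQ_AS_TEXT_INPUT]
    if write_sequence_files:
        # Writes the streaming output (Text keys/values) as a sequence file.
        args += ["-outputformat", SEQ_OUTPUT]
    return args
```

In mrjob, the same class names could presumably go in a job's `HADOOP_INPUT_FORMAT` / `HADOOP_OUTPUT_FORMAT` attributes instead of raw arguments.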

We probably want a `MahoutStep` class that's basically a `JarStep` where mrjob finds the jar for you automatically.
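A minimal sketch of that idea, assuming the jar can be found on a filesystem the code can see (e.g. when running on the master node or against a local Hadoop install). `MahoutStep` and `find_mahout_job_jar` are hypothetical, and the dict returned by `description()` only loosely mirrors a jar step's fields; it is not mrjob's API.

```python
import glob
import os


def find_mahout_job_jar(mahout_dir="/home/hadoop/mahout"):
    """Return the path of the single mahout-core-*-job.jar in mahout_dir."""
    matches = sorted(glob.glob(os.path.join(mahout_dir, "mahout-core-*-job.jar")))
    if len(matches) != 1:
        raise ValueError("expected one Mahout job jar in %s, found %d"
                         % (mahout_dir, len(matches)))
    return matches[0]


class MahoutStep:
    """Hypothetical JarStep-like step that locates the Mahout jar itself."""

    def __init__(self, main_class, args=(), mahout_dir="/home/hadoop/mahout"):
        self.main_class = main_class
        self.args = list(args)
        self.mahout_dir = mahout_dir

    def description(self):
        # The jar is looked up only when the step is described, not when
        # it's constructed, so the step can be defined before Mahout exists.
        return {
            "type": "jar",
            "jar": find_mahout_job_jar(self.mahout_dir),
            "main_class": self.main_class,
            "args": self.args,
        }
```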

Yeah, it really should be waiting until 10 minutes after the step finished (which may be less than 10 minutes, or no time at all, for steps that have already...
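The arithmetic is simple: wait only for whatever remains of the 10 minutes after the step's end time, and not at all if that time has already passed. A sketch (the function name is hypothetical; the 10-minute figure is from the comment above):

```python
from datetime import datetime, timedelta, timezone

# Grace period after a step finishes, per the 10 minutes mentioned above.
LOG_WAIT = timedelta(minutes=10)


def remaining_wait(step_end_time, now=None, wait=LOG_WAIT):
    """How much longer to wait so that `wait` has passed since the step ended.

    step_end_time should be timezone-aware when `now` is omitted. Returns zero
    for steps that ended long enough ago, instead of always waiting in full.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return max(timedelta(0), step_end_time + wait - now)
```

For a step that ended 3 minutes ago this gives 7 minutes; for one that ended half an hour ago, zero.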

Ah, I see: you want to get the counters as soon as they're available. The problem is that I don't think there's a way for mrjob to distinguish between...

Oh, I see what you mean: [ListSteps](http://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_ListSteps.html) returns the most recently queued step first (presumably because if you had to paginate, the most recent steps are the most relevant). Sorry,...
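If you need steps in the order they were queued, one approach (a sketch, not what mrjob does) is to page through every result using `Marker` and reverse the combined list. `emr_client` is assumed to behave like a boto3 EMR client (`boto3.client("emr")`), whose `list_steps` returns `Steps` and, when there are more pages, `Marker`.

```python
def list_steps_oldest_first(emr_client, cluster_id):
    """Collect every step of a cluster, oldest first.

    ListSteps returns the most recently queued step first (across pages too),
    so concatenating all pages and reversing gives chronological order.
    """
    steps = []
    kwargs = {"ClusterId": cluster_id}
    while True:
        resp = emr_client.list_steps(**kwargs)
        steps.extend(resp.get("Steps", []))
        marker = resp.get("Marker")
        if not marker:
            break
        kwargs["Marker"] = marker
    steps.reverse()
    return steps
```

The tradeoff is that this always fetches every page; if you only care about the latest step, the newest-first order lets you stop after the first page.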

Can you tell me how mrjob 0.5.2 works for you? I'm guessing the fix for #1316 mitigates this somewhat.

I'm guessing it has something to do with symlinks? Check out `_symlink_or_copy` in `mrjob/sim.py`, and see what happens if you comment out the portion that calls `os.symlink` (`if hasattr(os, 'symlink'):...
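For reference, the general symlink-or-copy pattern looks something like the sketch below. This is an illustrative stand-in, not mrjob's `_symlink_or_copy`; the `allow_symlink` flag is hypothetical and has the same effect as commenting out the `os.symlink` branch, which makes it easy to check whether symlinks are the culprit.

```python
import os
import shutil


def symlink_or_copy(src, dest, allow_symlink=True):
    """Symlink dest to src if possible, otherwise copy src to dest.

    Returns "symlink" or "copy" so the caller can see which path was taken.
    """
    if allow_symlink and hasattr(os, "symlink"):
        try:
            os.symlink(os.path.abspath(src), dest)
            return "symlink"
        except OSError:
            # Don't silently overwrite something that's already there.
            if os.path.lexists(dest):
                raise
            # e.g. Windows without symlink privileges: fall back to copying.
    if os.path.isdir(src):
        shutil.copytree(src, dest)
    else:
        shutil.copy(src, dest)
    return "copy"
```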

This especially makes sense if some of your steps are `JarStep`s.