Coyote Codornices Marin
Code for this is in https://github.com/davidmarin/mrjob/tree/wrong-region-warning. As it is, it's not a very useful "feature" because the warning may or may not be relevant. Doing this right would take more...
We'd also need to fix the tests: they now fail with a `NoSuchBucket` error because the code checks a bucket that it previously didn't, or checks it too early in the process.
The current workaround is to use a glob: `python mr_your_job.py input_dir/*`
The relevant issue about "hidden" input files is #1200. Originally thought this behavior was part of Hadoop Streaming, but it's actually part of `FileInputFormat` (see http://stackoverflow.com/questions/19830264/which-files-ignored-as-input-by-mapper), which may or may...
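For reference, the default filter in `FileInputFormat` skips any path whose name starts with `_` or `.`. Here's a minimal Python sketch of that rule. The function names are made up for illustration and aren't part of mrjob or Hadoop, and it only looks at the last path component:

```python
import posixpath


def is_hidden_input(path):
    """Return True if the path's final component starts with '_' or '.'.

    Mirrors the naming rule of FileInputFormat's default hidden-file
    filter. (Hypothetical helper, not an mrjob or Hadoop API.)
    """
    # Strip a trailing slash so directories like "out/_logs/" are handled.
    name = posixpath.basename(path.rstrip('/'))
    return name.startswith('_') or name.startswith('.')


def visible_inputs(paths):
    """Keep only the paths that would not be treated as hidden."""
    return [p for p in paths if not is_hidden_input(p)]
```

So given `input_dir/part-00000`, `input_dir/_SUCCESS`, and `input_dir/.part-00000.crc`, only the first would be read as input.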
Shoot, this behavior is actually intentional and goes hand-in-hand with the new, improved local/inline mode. The idea is that streaming tasks are really only supposed to read from stdin. It...
Related (but probably a separate ticket): if you pass a local directory as input to mrjob and then try to run the job on EMR, it doesn't get shipped to...
Here's the relevant API call: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateImage.html
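A rough sketch of what calling it might look like, assuming a boto3-style EC2 client (mrjob itself may use a different client library). `create_ami` and `create_image_params` are hypothetical helper names. `InstanceId`, `Name`, and `NoReboot` are parameters of `CreateImage`, and the response carries the new `ImageId`:

```python
def create_image_params(instance_id, name, no_reboot=False):
    """Build keyword arguments for the EC2 CreateImage call."""
    return {
        'InstanceId': instance_id,  # instance to image
        'Name': name,               # name for the new AMI
        'NoReboot': no_reboot,      # False lets EC2 reboot the instance first
    }


def create_ami(ec2_client, instance_id, name, no_reboot=False):
    """Call CreateImage and return the ID of the new AMI.

    ec2_client is assumed to expose a boto3-style create_image() method,
    e.g. boto3.client('ec2').
    """
    params = create_image_params(instance_id, name, no_reboot=no_reboot)
    response = ec2_client.create_image(**params)
    return response['ImageId']
```

Passing the client in keeps the helper easy to exercise with a stub instead of a real AWS account.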
Oh, good point. Puzzling through this now.
Okay, this doesn't sound that hard in concept, but there are a lot of moving parts, and I have to learn a lot more about the EC2 API.
(I can verify that attempting to use a snapshot of a live EMR cluster fails pretty hard.)