mrjob
Warn about S3 output dir in wrong region
If you give EMR an S3 output path that's in another region, your job fails with this error:
Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Moved Permanently (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: 56451A04FCF24CCE), S3 Extended Request ID: uXkfEaU0FnL/FNZ+bvTulXyGXd0Sez8v6UA8vhtPkQmuC3XG/ivTE09cPMwJ6NgTU0VRU15+0Qs=
at least on AMI 4.8.2.
We should warn the user if they specify an output path in the wrong region.
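A minimal sketch of what such a check might look like, assuming the bucket's location can be looked up through some callable. The names `bucket_name_from_uri`, `normalize_location`, `wrong_region_warning`, and `get_location` are hypothetical and are not mrjob's API or the code in any branch. With boto3, the lookup could be built on `get_bucket_location`, whose `LocationConstraint` is empty for buckets in us-east-1.

```python
from urllib.parse import urlparse


def bucket_name_from_uri(uri):
    """Return the bucket of an s3://, s3n://, or s3a:// URI, else None."""
    parsed = urlparse(uri)
    if parsed.scheme in ('s3', 's3n', 's3a') and parsed.netloc:
        return parsed.netloc
    return None


def normalize_location(location_constraint):
    """Map an S3 location constraint to a region name."""
    # S3 reports us-east-1 buckets with an empty location constraint,
    # and some older eu-west-1 buckets as 'EU'.
    if not location_constraint:
        return 'us-east-1'
    if location_constraint == 'EU':
        return 'eu-west-1'
    return location_constraint


def wrong_region_warning(output_uri, emr_region, get_location):
    """Return a warning string if the output bucket is in another region.

    get_location is a hypothetical callable (bucket name -> location
    constraint); it is injected so the check can run without AWS access.
    """
    bucket = bucket_name_from_uri(output_uri)
    if bucket is None:
        return None  # not an S3 path, so there is nothing to check

    bucket_region = normalize_location(get_location(bucket))
    if bucket_region != emr_region:
        return ('output bucket %s is in region %s, but the cluster'
                ' runs in %s' % (bucket, bucket_region, emr_region))
    return None


# Example with a fake lookup table standing in for S3
locations = {'walrus': None, 'otter': 'us-west-2'}
print(wrong_region_warning('s3://otter/out/', 'us-east-1', locations.get))
```

Passing the lookup in as a callable keeps the comparison logic testable without touching a real bucket, which also sidesteps the kind of mock-S3 ordering problems that show up when a new bucket check is added.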
Having poked at this a bit, it seems to be an issue that exists in the 4.x AMIs but not the 5.x ones (so it won't happen by default). I think this is not worth the extra code and tests.
Code for this is in https://github.com/davidmarin/mrjob/tree/wrong-region-warning. As it is, it's not a very useful "feature" because the warning may or may not be relevant. Doing this right would take more research into the behavior of various EMR AMIs, and it should probably quit with an error, like we do if the output dir already exists.
Also would need to fix tests: we're getting a NoSuchBucket error because the code now checks a bucket that it previously didn't, or checks it too soon in the process.