
Warn about S3 output dir in wrong region

Open coyotemarin opened this issue 8 years ago • 3 comments

If you give EMR an S3 output path that's in another region, your job fails with this error:

Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Moved Permanently (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: 56451A04FCF24CCE), S3 Extended Request ID: uXkfEaU0FnL/FNZ+bvTulXyGXd0Sez8v6UA8vhtPkQmuC3XG/ivTE09cPMwJ6NgTU0VRU15+0Qs=

at least on AMI 4.8.2.

We should warn the user if they specify an output path in the wrong region.
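A minimal sketch of what such a check might look like, assuming boto3 is available and that the cluster's region is already known. The helper names (`bucket_from_s3_uri`, `normalize_location`, `lookup_bucket_region`, `output_region_mismatch`) are hypothetical and are not part of mrjob; the only real API call used is boto3's `get_bucket_location`.

```python
import logging
from urllib.parse import urlparse

log = logging.getLogger(__name__)


def bucket_from_s3_uri(uri):
    """Return the bucket name from an s3://, s3n://, or s3a:// URI, else None."""
    parsed = urlparse(uri)
    if parsed.scheme in ('s3', 's3n', 's3a'):
        return parsed.netloc or None
    return None


def normalize_location(location_constraint):
    """Map a GetBucketLocation LocationConstraint to a region name."""
    # S3 reports an empty constraint for us-east-1 and the legacy
    # value 'EU' for eu-west-1
    if not location_constraint:
        return 'us-east-1'
    if location_constraint == 'EU':
        return 'eu-west-1'
    return location_constraint


def lookup_bucket_region(bucket):
    """Ask S3 which region a bucket lives in (requires boto3 and credentials)."""
    import boto3  # imported lazily so the pure helpers work without it
    response = boto3.client('s3').get_bucket_location(Bucket=bucket)
    return normalize_location(response.get('LocationConstraint'))


def output_region_mismatch(output_uri, cluster_region,
                           lookup=lookup_bucket_region):
    """Warn and return the bucket's region if it differs from the cluster's.

    Returns None if the URI is not on S3 or the regions match.
    """
    bucket = bucket_from_s3_uri(output_uri)
    if bucket is None:
        return None
    bucket_region = lookup(bucket)
    if bucket_region != cluster_region:
        log.warning('output bucket %s is in %s but the cluster is in %s',
                    bucket, bucket_region, cluster_region)
        return bucket_region
    return None
```

Passing `lookup` as a parameter keeps the comparison testable without touching S3, e.g. `output_region_mismatch('s3://walrus/out/', 'us-west-2', lookup=lambda b: 'us-east-1')`.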

coyotemarin, Jan 04 '17 01:01

Having poked at this a bit, it seems to be an issue that exists in the 4.x AMIs but not the 5.x ones (so it won't happen by default). I think this is not worth the extra code and tests.

coyotemarin, Feb 23 '18 23:02

Code for this is in https://github.com/davidmarin/mrjob/tree/wrong-region-warning. As it is, it's not a very useful "feature" because the warning may or may not be relevant. Doing this right would take more research into the behavior of various EMR AMIs, and probably having the job quit with an error, like we do if the output dir already exists.

coyotemarin, Feb 24 '18 00:02

We would also need to fix the tests: we're getting a NoSuchBucket error because the code now checks a bucket that it previously didn't, or checks it too soon in the process.
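One possible way around this, sketched under the assumption that the region lookup raises a botocore-style `ClientError` (an exception carrying a `response` dict with an `Error` / `Code` entry) when the bucket doesn't exist yet. The helpers `is_no_such_bucket` and `safe_bucket_region` are hypothetical, and whether this matches the actual test failures on the branch is a guess.

```python
def is_no_such_bucket(ex):
    """True if ex looks like a botocore ClientError with code NoSuchBucket."""
    response = getattr(ex, 'response', None) or {}
    return response.get('Error', {}).get('Code') == 'NoSuchBucket'


def safe_bucket_region(bucket, lookup):
    """Return lookup(bucket), or None if the bucket doesn't exist (yet).

    lookup is any callable that maps a bucket name to a region name.
    Errors other than NoSuchBucket are re-raised unchanged.
    """
    try:
        return lookup(bucket)
    except Exception as ex:
        if is_no_such_bucket(ex):
            # bucket may be created later in the run, so skip the region check
            return None
        raise
```

Treating a missing bucket as "region unknown" would let the region check be skipped instead of failing early, at the cost of not catching a mismatch for buckets created later.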

coyotemarin, Feb 24 '18 01:02