aws-sdk-perl

S3 PutObject URI encodes key?

Open dnmfarrell opened this issue 8 years ago • 9 comments

Does PutObject() URI encode the key string? These files are being uploaded to S3 with the key containing percent encoded characters:

my $output = $s3->PutObject(
  Bucket  => $bucket,
  Key     => 'example picture.jpg',
  ACL     => 'public-read',
  Body    => $data,
); 

example picture.jpg is actually uploaded to S3 as example%20picture.jpg, which means normal paths to it don't work because the URI becomes example%2520picture.jpg.
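To make the double-encoding arithmetic concrete, here is a minimal sketch using `uri_escape_utf8` from URI::Escape with its default character set. It only illustrates how `%20` turns into `%2520` when an already-encoded key is encoded a second time; it does not claim to show where in Paws or S3 each step happens, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape_utf8);

my $key = 'example picture.jpg';

# First pass: the space becomes %20.
my $stored = uri_escape_utf8($key);

# Second pass over the already-encoded string: the "%" itself is
# encoded as %25, so "%20" turns into "%2520".
my $in_url = uri_escape_utf8($stored);

print "$stored\n";   # example%20picture.jpg
print "$in_url\n";   # example%2520picture.jpg
```

If the key is stored with a literal `%20` in it, any client that encodes the key once more when building a URL ends up asking for the `%2520` form.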

Not sure if this is a Paws issue or an AWS one :smile:

dnmfarrell avatar Oct 01 '16 21:10 dnmfarrell

Hi. This is definitely a Paws issue that needs looking into. The encoding should be happening inside of Paws::Net::RestXmlCaller, line 67. If you add a space to the safe characters, q[^A-Za-z0-9\-\._~ /], we won't encode the spaces in the URLs.

I'll try to look into S3 documentation to find which characters should and shouldn't be uri encoded. Any pointers welcome :smile:

pplu avatar Oct 02 '16 20:10 pplu

Got it. I found the AWS recommendations.

I think you could either: encode anything that AWS say isn't a "Safe Character", so line 67 becomes something like this:

$vars->{ $att_name } = uri_escape_utf8($call->$att_name, q(^0-9a-zA-Z!_.*'()));

Or maybe blacklist the "Characters That Might Require Special Handling" and "Characters to Avoid":

$vars->{ $att_name } = uri_escape_utf8($call->$att_name, q([&$@=:+,?\\{\^}%>\[\]`~<#|]);

(The code is untested.) Given the AWS recommendations, it doesn't appear to be "wrong" to encode whitespace. The first example will encode it and the second won't, but both seem like valid approaches to me. What do you think?
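A rough sketch of what the first (whitelist) approach would do to a couple of keys. URI::Escape documents the second argument of `uri_escape_utf8` as the contents of a regex character class, so the set is passed here without surrounding brackets. The sample keys are invented for illustration, and the set is the one from the first suggestion, which does not list "/" or "-".

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape_utf8);

# Negated class: every character NOT listed here gets percent-encoded.
# Note that "/" and "-" are not in the list.
my $unsafe = q(^0-9a-zA-Z!_.*'());

my @keys    = ('example picture.jpg', 'photos/2016-10/cat.jpg');
my @encoded = map { uri_escape_utf8($_, $unsafe) } @keys;

print "$_\n" for @encoded;
# example%20picture.jpg
# photos%2F2016%2D10%2Fcat.jpg
```

With this exact set, path separators and hyphens in keys would be encoded too, so "/" and "-" would probably need adding before anything like this could be used.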

dnmfarrell avatar Oct 02 '16 21:10 dnmfarrell

From the doc, it looks like nothing has to be escaped in the key (everything is a "may", "should", "might"...). I'm curious what others are doing with the keys. I suspect it's up to the code that consumes the key to handle the "special cases": if your code is going to interpret the key as a file name, then some symbols will cause problems when converting the key to a filesystem path. For Paws, I'm thinking that key names should go to S3 unfiltered, but I'd like to see what boto and the Ruby SDK do with keys first.

pplu avatar Oct 03 '16 08:10 pplu

Hi, we just bumped into this issue. We upload S3 objects from our Perl app using Paws, and from a Java app using a Java library (I don't know which, but I will try to find out). The Java lib does not encode spaces, but Paws does, leading to mismatches.

I agree with you that the AWS docs are very vague about this, so my suggestion would be to minimize changes and just add the space to the list of safe characters. This should be safe and would solve our issue and also potential issues of double encoding by browsers etc. So the line in Paws::Net::RestXmlCaller would become:

$vars->{ $att_name } = uri_escape_utf8($call->$att_name, q[^A-Za-z0-9\-\._~/ ]);
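For comparison, a small sketch of the difference the extra space makes. The `$current` set is an assumption (the proposed class minus the space), and the keys are made-up examples.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape_utf8);

my $current  = q[^A-Za-z0-9\-\._~/];    # assumed existing class, no space
my $proposed = q[^A-Za-z0-9\-\._~/ ];   # same class with the space marked safe

for my $key ('example picture.jpg', 'inbox/user@example.com') {
    printf "%s\n  current:  %s\n  proposed: %s\n",
        $key,
        uri_escape_utf8($key, $current),
        uri_escape_utf8($key, $proposed);
}
```

The space survives under the proposed class, while a character such as "@" is still encoded as %40 under both, so this change only addresses the whitespace case.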

What are your thoughts?

lhengstmengel avatar May 09 '17 11:05 lhengstmengel

@sven-schubert: You may want to look at this issue, as you are currently working on S3

pplu avatar Oct 06 '17 22:10 pplu

We just ran into this issue. We have two systems that put objects into S3: one written in Go using the official AWS SDK for Go, and one written in Perl using Paws. Our keys contain email addresses, with the @ intact. The Go SDK uploads them as is, while Paws encodes the @ as %40, which breaks our downstream processes that try to find the messages.

@pplu Is there a workaround or any way to address this?

veqryn avatar Feb 16 '18 17:02 veqryn

@veqryn: a couple of workarounds are suggested in this issue (https://github.com/pplu/aws-sdk-perl/issues/111#issuecomment-300135620) and (https://github.com/pplu/aws-sdk-perl/issues/111#issuecomment-250996388). I'd love a pull request with a fix, since I never reached a conclusion about what should and shouldn't be encoded.

pplu avatar Feb 16 '18 17:02 pplu

Those workarounds look like modifications of the Paws source, right?

Is there anything a client using this library can do as a workaround without modifying the library's source?

veqryn avatar Feb 16 '18 17:02 veqryn

This should be fixed together with the fix for #221 (PR in #265), which ensures we only encode once in the URI. Tests have been added to check this: t/s3/uri_encoding.t

castaway avatar Nov 06 '19 12:11 castaway