aws-sdk-java

[S3] Default MIME type is "application/octet-stream" vs. "binary/octet-stream" with AWS CLI's s3api

Open · dalbani opened this issue 7 years ago · 10 comments

My understanding is that the Java SDK sets a Content-Type: application/octet-stream header by default if the user doesn't provide one when adding an object: https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L4147

On the other hand, I've noticed that files uploaded via AWS CLI's s3api put-object ... get assigned a Content-Type of binary/octet-stream. I suppose this value is assigned by the S3 backend, because I couldn't find any Content-Type header in the PUT request in the --debug logs.

Uploading a file via the S3 Web console produces a binary/octet-stream MIME type as well. (For the record, Google Cloud Storage also uses binary/octet-stream.)

I was wondering if you had any comment on this difference between application/octet-stream and binary/octet-stream?

dalbani, May 02 '17 14:05

Hi, I don't believe the CLI actually sets binary/octet-stream. You can verify this by running a command like aws s3 cp MY_BINARY_FILE s3://MY_BUCKET --debug and checking that it does not send a Content-Type header.

Per the S3 docs, the default is binary/octet-stream if the header is not set: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
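To make the two defaults concrete, here is a plain-Java model (not SDK code) of what ends up stored: the client decides whether to send a Content-Type header, and S3 falls back to binary/octet-stream only when none arrives. The class and method names are hypothetical; the two constant values are the ones discussed in this thread.

```java
import java.util.Optional;

/** Hypothetical model of Content-Type defaulting; not part of any AWS SDK. */
final class ContentTypeDefaults {
    // Value the v1 Java SDK fills in when the caller sets no content type (per this thread).
    static final String JAVA_SDK_DEFAULT = "application/octet-stream";
    // Value S3 assigns when the PUT request carries no Content-Type header (per the S3 docs).
    static final String S3_SERVER_DEFAULT = "binary/octet-stream";

    private ContentTypeDefaults() {}

    /** Header the Java SDK would send: the caller's value, else the SDK default. */
    static Optional<String> javaSdkHeader(String callerContentType) {
        return Optional.of(callerContentType != null ? callerContentType : JAVA_SDK_DEFAULT);
    }

    /** A client that omits Content-Type entirely, as the CLI was observed to do here. */
    static Optional<String> noHeader() {
        return Optional.empty();
    }

    /** What S3 stores: the header if one was sent, else its own default. */
    static String storedContentType(Optional<String> sentHeader) {
        return sentHeader.orElse(S3_SERVER_DEFAULT);
    }

    public static void main(String[] args) {
        System.out.println(storedContentType(javaSdkHeader(null)));        // application/octet-stream
        System.out.println(storedContentType(noHeader()));                 // binary/octet-stream
        System.out.println(storedContentType(javaSdkHeader("image/png"))); // image/png
    }
}
```

The point of the model is that the S3 default never applies to uploads from the Java SDK, because the SDK always fills in a header before the request leaves the client.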

dagnir, May 02 '17 19:05

Oops, sorry, looks like I didn't read your question closely enough! I'm not sure what the exact reason for the discrepancy is, but I suspect it's because application/octet-stream is registered with IANA, while binary/octet-stream does not seem to be. In any case, I don't think we can change this now, because it would be a subtle breaking change for customers.

dagnir, May 02 '17 19:05

Hmm, if the default behavior can't be changed (what kind of "subtle change" do you expect, by the way?), what about an option to specify the default MIME type? Or is the only solution for API users to explicitly set binary/octet-stream on each PUT operation?

dalbani, May 03 '17 07:05

As for the breaking change: customers may be relying on the default MIME type for objects uploaded with the Java SDK being application/octet-stream rather than binary/octet-stream.

Do you expect all of your objects to have binary/octet-stream? If so, you can use ClientConfiguration#withHeader. Note that this will override the value set with ObjectMetadata#setContentType, so it won't work if you need to be able to override the default value for individual objects.
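The caveat about ClientConfiguration#withHeader is a question of precedence. The sketch below is a plain-Java model of that ordering, assuming, as stated in the comment, that client-wide headers are applied after per-object ones. It does not call the SDK, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of header precedence; not the SDK's actual request pipeline. */
final class HeaderPrecedence {
    private HeaderPrecedence() {}

    /**
     * Builds the final header set: per-object headers first, then client-wide
     * headers, so a client-wide Content-Type replaces any per-object one.
     */
    static Map<String, String> finalHeaders(Map<String, String> perObject,
                                            Map<String, String> clientWide) {
        // Case-insensitive keys, since HTTP header names are case-insensitive.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.putAll(perObject);
        headers.putAll(clientWide); // applied last, so these values win
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> perObject = Map.of("Content-Type", "image/png");
        Map<String, String> clientWide = Map.of("content-type", "binary/octet-stream");
        // prints binary/octet-stream: the per-object image/png is lost
        System.out.println(finalHeaders(perObject, clientWide).get("Content-Type"));
    }
}
```

Under this ordering, a client-wide header works as a blanket default only if no object ever needs a different type.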

Another option might be a custom RequestHandler.

We can also look into adding a default object MIME if that would be easiest for you.

dagnir, May 03 '17 18:05

Thanks for the tips on the various methods available. I have full control of the code that uses the API, so I can explicitly call ObjectMetadata#setContentType for each upload; that's not a big deal. Still, I'm curious what you meant by your last sentence.

dalbani, May 03 '17 18:05

Are you referring to

We can also look into adding a default object MIME if that would be easiest for you.

?

I was imagining a new option, something like setDefaultObjectMimeType on S3ClientOptions, that works somewhat like ClientConfiguration#withHeader but at the S3 level, so that you could still change the MIME type on individual PUT calls.
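What distinguishes such an option from a client-wide header is the order of precedence: a configured default would apply only when the caller leaves the content type unset. Here is a minimal sketch of that resolution rule in plain Java. The class is hypothetical, setDefaultObjectMimeType is only the name floated in this comment (not an existing SDK method), and SDK_FALLBACK is the application/octet-stream default discussed in this thread.

```java
/**
 * Hypothetical sketch of the proposed option (e.g. setDefaultObjectMimeType on
 * S3ClientOptions). No such setter is assumed to exist in the SDK.
 */
final class DefaultMimeTypeOption {
    static final String SDK_FALLBACK = "application/octet-stream";

    private final String configuredDefault; // null means "not configured"

    DefaultMimeTypeOption(String configuredDefault) {
        this.configuredDefault = configuredDefault;
    }

    /** Per-object value wins; otherwise the configured default; otherwise the SDK fallback. */
    String resolve(String perObjectContentType) {
        if (perObjectContentType != null) {
            return perObjectContentType;
        }
        return configuredDefault != null ? configuredDefault : SDK_FALLBACK;
    }

    public static void main(String[] args) {
        DefaultMimeTypeOption option = new DefaultMimeTypeOption("binary/octet-stream");
        System.out.println(option.resolve(null));                          // binary/octet-stream
        System.out.println(option.resolve("text/csv"));                    // text/csv
        System.out.println(new DefaultMimeTypeOption(null).resolve(null)); // application/octet-stream
    }
}
```

Leaving the option unset keeps today's behavior, which would avoid the breaking change mentioned earlier in the thread.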

dagnir, May 03 '17 19:05

Yes, I think such an option would be useful. That would even give some API visibility to the fact that the Java SDK uses a "non-standard" default MIME type.

dalbani, May 03 '17 20:05

By the way, speaking of the "application/octet-stream" header, I noticed in debug traces that (all?) HTTP requests sent to S3 carry that header, even GET and HEAD requests. I suppose the spec doesn't forbid that per se, but I find it strange.

dalbani, May 04 '17 10:05

This is just a quick update letting you know that the SDK team has reviewed the feature request list for V1 and this one looks like a great candidate for a community PR, which we’ll help merge in and support.

debora-ito, Oct 04 '19 20:10

For what it is worth (possibly nothing) ...

Internet lore and my own failing memory have it that some old browsers (like IE) would save anything of type 'application/octet-stream' as a .zip file, and a common way to avoid this was to use the unknown type 'binary/octet-stream'.

I also recall seeing 'application-data/octet-stream', which likely served the same purpose.

AWS S3 does default to 'binary/octet-stream'. It appears as 'ContentType' and can be seen with a 'head-object' operation. The only way to change it on an existing object is to copy the object over itself, specifying the proper type for the new copy (other metadata is preserved on copy by default):

aws s3 cp s3://<bucket>/<key> s3://<bucket>/<key> --content-type 'application/octet-stream'

The default type may also be the same on Azure and other services.

Along with the storage class, it is something that I've just learned to always explicitly set on objects.

aemileski, Feb 12 '22 07:02