aws-sdk-java
[S3] Default MIME type is "application/octet-stream" vs. "binary/octet-stream" with AWS CLI's s3api
My understanding is that the Java SDK sets a Content-Type: application/octet-stream header by default if none is provided by the user when adding an object.
https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L4147
On the other hand, I've noticed that files uploaded via AWS CLI's s3api put-object ... get assigned a Content-Type of binary/octet-stream.
I suppose that this value is assigned by the S3 backend, because I couldn't find any Content-Type header set in the PUT request in the --debug logs.
Uploading a file via the S3 Web console produces a binary/octet-stream MIME type as well.
(For the record, Google Cloud Storage also uses binary/octet-stream.)
I was wondering if you had any comment on this difference between application/octet-stream and binary/octet-stream?
I don't believe the CLI actually sets binary/octet-stream. You can verify this by issuing a command like aws s3 cp MY_BINARY_FILE s3://MY_BUCKET --debug and seeing that it does not send a Content-Type header.
Per the S3 docs, the default is binary/octet-stream if the header is not set: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
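A minimal Java sketch of the discrepancy discussed so far, assuming only what this thread states: S3 stores binary/octet-stream when a PUT carries no Content-Type header, and the Java SDK (v1) sends application/octet-stream when the caller sets none. The class and method names are hypothetical; this models the observed behavior and is not SDK or S3 code.

```java
import java.util.Optional;

/** Hypothetical model of why the stored Content-Type differs between clients. */
public class StoredContentTypeDemo {

    // Applied by S3 when the PUT request has no Content-Type header (per the S3 PUT Object docs).
    static final String S3_SERVER_DEFAULT = "binary/octet-stream";

    // Sent by the Java SDK v1 when the caller sets no type (per the AmazonS3Client link above).
    static final String JAVA_SDK_DEFAULT = "application/octet-stream";

    /** Header a client sends: the user's value, else the client's own default, else nothing. */
    static Optional<String> headerSent(Optional<String> userValue, Optional<String> clientDefault) {
        return userValue.or(() -> clientDefault);
    }

    /** Type S3 records for the object: the header if present, else the server default. */
    static String storedType(Optional<String> header) {
        return header.orElse(S3_SERVER_DEFAULT);
    }

    public static void main(String[] args) {
        // s3api put-object without --content-type: no header is sent, so S3 fills in its default.
        String viaCli = storedType(headerSent(Optional.empty(), Optional.empty()));
        // Java SDK without ObjectMetadata#setContentType: the SDK's own default is sent.
        String viaJavaSdk = storedType(headerSent(Optional.empty(), Optional.of(JAVA_SDK_DEFAULT)));

        System.out.println("CLI:      " + viaCli);     // CLI:      binary/octet-stream
        System.out.println("Java SDK: " + viaJavaSdk); // Java SDK: application/octet-stream
    }
}
```

The point of the model is that the two defaults live in different places: one in the client library, one on the server.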
Oops, sorry, it looks like I didn't read your question closely enough! I'm not sure what the exact reason is for the discrepancy, but I suspect it's because application/octet-stream is registered with the IANA, and binary/octet-stream does not seem to be. In any case, I don't think it's possible for us to change this now because it would be a subtle breaking change for customers.
Hum, if the default behavior can't be changed (what kind of "subtle change" do you expect by the way?), what about an option to specify the default MIME type?
Or is the only solution for the API user to explicitly set binary/octet-stream for each PUT operation?
As far as the breaking change goes, customers may be relying on the default MIME type for objects uploaded using the Java SDK being application/octet-stream rather than binary/octet-stream.
Do you expect all of your objects to have binary/octet-stream? If so, you can use ClientConfiguration#withHeader. Note that this will override the value set on ObjectMetadata#setContentType, so it won't work if you need to be able to override the default value.
Another option might be a custom RequestHandler.
We can also look into adding a default object MIME if that would be easiest for you.
Thanks for the tips on the various methods available.
I have full control of the code that uses the API, so I can explicitly use ObjectMetadata#setContentType for each call; that's not a big deal.
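When the type is set explicitly on every call, the value still has to come from somewhere. The sketch below is a hypothetical helper (not part of the SDK) whose result would be passed to ObjectMetadata#setContentType. It uses the JDK's URLConnection.guessContentTypeFromName and falls back to binary/octet-stream to match what S3 itself stores; that fallback is an assumption, and application/octet-stream works equally well if you prefer the IANA-registered type.

```java
import java.net.URLConnection;

/** Hypothetical helper: choose the Content-Type to set explicitly on each PUT. */
public class ContentTypeChooser {

    // Assumed fallback, chosen to match the S3 server-side default.
    static final String FALLBACK = "binary/octet-stream";

    /**
     * Returns the caller's explicit type if given, otherwise a guess based on the file
     * name (via the JDK's built-in file-name map), otherwise FALLBACK.
     */
    static String chooseContentType(String explicitType, String fileName) {
        if (explicitType != null && !explicitType.isEmpty()) {
            return explicitType;
        }
        if (fileName != null) {
            String guessed = URLConnection.guessContentTypeFromName(fileName);
            if (guessed != null) {
                return guessed;
            }
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(chooseContentType("application/pdf", "report.bin")); // application/pdf
        System.out.println(chooseContentType(null, null));                      // binary/octet-stream
        // The result here depends on the JDK's file-name map.
        System.out.println(chooseContentType(null, "notes.txt"));
    }
}
```

Keeping the decision in one helper means the default lives in application code rather than depending on which client or SDK version performed the upload.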
Yet, I am curious what you meant in your last sentence.
Are you referring to "We can also look into adding a default object MIME if that would be easiest for you"?
I was imagining just a new option like setDefaultObjectMimeType on S3ClientOptions that works sort of like ClientConfiguration#withHeader, but at the S3 level, so you can still change the MIME type per PUT call.
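The difference between the two approaches is one of precedence. The sketch below is a hypothetical model (none of these methods exist in the SDK, and setDefaultObjectMimeType is only the proposal above). It contrasts a client-wide forced header, which wins over the per-object value as described for ClientConfiguration#withHeader, with a configurable default that applies only when the per-object value is missing.

```java
/**
 * Hypothetical precedence model; not SDK code.
 *  - Forced header: the client-wide value replaces the per-object value.
 *  - Default option: the configured value is used only if no per-object value exists.
 */
public class DefaultMimeOptionDemo {

    static final String SDK_BUILT_IN_DEFAULT = "application/octet-stream";

    /** Client-wide forced header, as withHeader is described in this thread. */
    static String withForcedHeader(String perObjectType, String forcedHeader) {
        if (forcedHeader != null) {
            return forcedHeader; // wins even when the object metadata sets a type
        }
        return perObjectType != null ? perObjectType : SDK_BUILT_IN_DEFAULT;
    }

    /** Proposed default option: a per-object type, if set, still takes precedence. */
    static String withDefaultOption(String perObjectType, String configuredDefault) {
        if (perObjectType != null) {
            return perObjectType;
        }
        return configuredDefault != null ? configuredDefault : SDK_BUILT_IN_DEFAULT;
    }

    public static void main(String[] args) {
        String custom = "binary/octet-stream";
        // The per-object override is lost with a forced header...
        System.out.println(withForcedHeader("image/png", custom));  // binary/octet-stream
        // ...but survives with a default-only option.
        System.out.println(withDefaultOption("image/png", custom)); // image/png
        // With no per-object type, the configured default is used.
        System.out.println(withDefaultOption(null, custom));        // binary/octet-stream
    }
}
```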
Yes, I think such an option would be useful. That would even give some API visibility to the fact that the Java SDK uses a "non-standard" default MIME type.
By the way, speaking of the "application/octet-stream" header, I saw in debug traces that (all?) HTTP requests sent to S3 contain such a header, even GET or HEAD requests.
I suppose that's not forbidden per se according to the spec, but I find it strange.
This is just a quick update letting you know that the SDK team has reviewed the feature request list for V1 and this one looks like a great candidate for a community PR, which we’ll help merge in and support.
For what it is worth (possibly nothing) ...
Internet lore and my personal failing memory have it that some old browsers (like IE) wanted to save anything with type 'application/octet-stream' as a .zip file, and a common way to avoid this was to use the unknown type 'binary/octet-stream'.
I also recall seeing 'application-data/octet-stream', which likely served the same purpose.
AWS S3 does default to 'binary/octet-stream'. It appears as 'ContentType' and can be seen with a 'head-object' operation. The only way to change it on an existing object is by copying over the object and specifying the proper type for the new object (by default, other metadata is preserved on copy).
aws s3 cp s3://<bucket>/<key> s3://<bucket>/<key> --content-type 'application/octet-stream'
The default type may also be the same on Azure and other services.
Along with the storage class, it is something that I've just learned to always explicitly set on objects.