aws-sdk-java GetObjectRequest in S3 should support final bytes as a Range header value

According to https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35, a Range header may specify a suffix byte range, written as a single negative-looking value, to retrieve the last N bytes of a file. For example, bytes=-500 is a valid Range value for the final 500 bytes of a file.

This Range header option is currently supported by S3, as verified through the AWS S3 CLI.

Currently, GetObjectRequest includes setRange(long start) and setRange(long start, long end), which support Range values like bytes=100- and bytes=100-200. However, there is no way to make GetObjectRequest produce a Range value like "bytes=-100", even though this is a valid value that S3 already supports.

Apr 13 '18 18:04 bbranan

Makes sense; we'd have to see if we can do this in a backwards-compatible way. In the meantime, I think you should be able to work around this by doing something like the following.

        GetObjectRequest req = new GetObjectRequest("bucket", "key");
        req.putCustomRequestHeader("Range", "-500");
        amazonS3.getObject(req);

Apr 13 '18 19:04 shorea

The simplest way to be backwards compatible here would likely be to add a new method, perhaps something like setRangeEnd(long end), which results in the expected header value.
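A minimal sketch of the header values involved, using a hypothetical `RangeHeaders` helper that is not part of aws-sdk-java. `suffix(n)` builds the value that a method like the proposed setRangeEnd(long end) would presumably need to send. The other two methods show the RFC 2616 forms mentioned in the report for comparison; they are not claimed to be the exact strings the SDK emits.

```java
// Hypothetical helper (NOT part of aws-sdk-java): builds Range header values
// using the RFC 2616 "bytes=" syntax, including the suffix form "bytes=-N".
final class RangeHeaders {
    private RangeHeaders() {}

    // Closed range, both offsets inclusive: "bytes=start-end".
    static String closed(long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range " + start + "-" + end);
        }
        return "bytes=" + start + "-" + end;
    }

    // Open-ended range, from start to the end of the object: "bytes=start-".
    static String open(long start) {
        if (start < 0) {
            throw new IllegalArgumentException("start must not be negative");
        }
        return "bytes=" + start + "-";
    }

    // The missing case: the last `length` bytes, "bytes=-length".
    // A hypothetical setRangeEnd(long) would need to produce this value.
    static String suffix(long length) {
        if (length <= 0) {
            throw new IllegalArgumentException("suffix length must be positive");
        }
        return "bytes=-" + length;
    }

    public static void main(String[] args) {
        System.out.println(suffix(500)); // prints bytes=-500
    }
}
```

With this, the workaround below amounts to passing `RangeHeaders.suffix(500)` as the custom Range header value.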

Thanks for the workaround. I will use that strategy for now, though I believe the call would need to be

    req.putCustomRequestHeader("Range", "bytes=-500");

Apr 13 '18 19:04 bbranan

Yes good catch.

Apr 14 '18 00:04 shorea

Using the suggested workaround results in the following error:

com.amazonaws.SdkClientException: Unable to verify integrity of data download.  Client calculated content hash didn't match hash calculated by Amazon S3.  The data may be corrupt.
	com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:79)
	com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:61)
	com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)

By default, when getObject() is called, the client verifies the MD5 checksum of the downloaded data against the checksum of the complete object. Naturally, the subset of bytes retrieved by a Range request will not match that checksum. When GetObjectRequest.setRange() is used, the checksum validation step is skipped (based on an internal getRange() check). Setting Range as a custom header does not disable the checksum validation, so the request fails consistently.
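To illustrate why the validation fails, here is a small self-contained sketch that uses only `java.security.MessageDigest`. It hashes a made-up object body and then only its last few bytes, which is all a suffix Range download would receive. `RangeDigestDemo` and its methods are illustrative names, not SDK classes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class RangeDigestDemo {
    // Lowercase hex MD5 of the given bytes.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Simulates the body of a suffix range response: only the last n bytes
    // (or the whole object if it is shorter than n).
    static byte[] lastBytes(byte[] object, int n) {
        int start = Math.max(0, object.length - n);
        return Arrays.copyOfRange(object, start, object.length);
    }

    public static void main(String[] args) {
        byte[] object = "example object body standing in for a large file"
                .getBytes(StandardCharsets.UTF_8);
        String expected = md5Hex(object);                   // hash of the complete object
        String downloaded = md5Hex(lastBytes(object, 10));  // hash of what a range download sees
        // Prints false: a full-object digest check can never pass for a partial body.
        System.out.println(expected.equals(downloaded));
    }
}
```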

This update to the workaround makes it work by first setting a range (which disables the checksum check) and then overwriting the Range header value with the custom header:

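    GetObjectRequest req = new GetObjectRequest("bucket", "key");
    // A non-null range makes the SDK skip client-side MD5 validation
    req.setRange(0);
    // The custom header then replaces the Range value with a suffix range
    req.putCustomRequestHeader("Range", "bytes=-500");
    amazonS3.getObject(req);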

Unfortunately, this relies on the assumption that the SDK will keep letting the custom header override the Range value set by setRange(). That does not seem like a safe assumption to make.

Apr 18 '18 21:04 bbranan

You can disable MD5 checks for GET requests using a system property: https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/SkipMd5CheckStrategy.java#L34

Note: this will disable MD5 checks for ALL GET requests.
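A sketch of setting that property from code. The property name below is an assumption taken from the `DISABLE_GET_OBJECT_MD5_VALIDATION_PROPERTY` constant in `SkipMd5CheckStrategy`; check it against the linked source for your SDK version. Because it is a JVM-wide system property, it affects every GET made in the process, so it is safest to set it at startup (or pass it as a `-D` flag) rather than toggle it around individual requests. `DisableGetMd5` is a hypothetical class name.

```java
final class DisableGetMd5 {
    // ASSUMPTION: matches SkipMd5CheckStrategy.DISABLE_GET_OBJECT_MD5_VALIDATION_PROPERTY
    // in aws-java-sdk-s3 (v1). Verify against the linked source.
    static final String DISABLE_GET_OBJECT_MD5_VALIDATION =
            "com.amazonaws.services.s3.disableGetObjectMD5Validation";

    // JVM-wide: applies to every AmazonS3 client and every GetObject in this process.
    // Command-line equivalent:
    //   java -Dcom.amazonaws.services.s3.disableGetObjectMD5Validation=true ...
    static void disableGetObjectMd5Validation() {
        System.setProperty(DISABLE_GET_OBJECT_MD5_VALIDATION, "true");
    }

    public static void main(String[] args) {
        disableGetObjectMd5Validation();
        System.out.println(System.getProperty(DISABLE_GET_OBJECT_MD5_VALIDATION));
    }
}
```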

Apr 18 '18 22:04 zoewangg

Thanks for the pointer @zoewangg. Unfortunately, the majority of requests I will be making are full-object requests, and I really do want md5 checks to occur for those transfers. I'm just looking for a way to disable the md5 checks specifically for Range-limited requests.

Apr 19 '18 12:04 bbranan

This will also be useful for file formats like ORC and Parquet that want to read the file footer first.
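As an example of the footer-first pattern: per the Apache Parquet file format, a Parquet file ends with a 4-byte little-endian footer (file metadata) length followed by the 4-byte magic `PAR1`. A suffix request such as `bytes=-8` would therefore be enough to learn how much of the tail to fetch next. The sketch below parses those 8 bytes. `ParquetTail` is a hypothetical name, and some readers may instead fetch a larger speculative tail in one request to save a round trip.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: interprets the last 8 bytes of a Parquet file,
// i.e. the body a "bytes=-8" request would return.
final class ParquetTail {
    static final int TAIL_LENGTH = 8; // 4-byte footer length + 4-byte "PAR1" magic

    // Returns the footer (file metadata) length encoded in the tail.
    static int footerLength(byte[] tail) {
        if (tail.length != TAIL_LENGTH) {
            throw new IllegalArgumentException("expected " + TAIL_LENGTH + " bytes, got " + tail.length);
        }
        String magic = new String(tail, 4, 4, StandardCharsets.US_ASCII);
        if (!magic.equals("PAR1")) {
            throw new IllegalArgumentException("not a Parquet tail, magic was: " + magic);
        }
        // First 4 bytes: footer length, little-endian.
        return ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Suffix length that covers the footer plus the 8-byte tail,
    // e.g. for a follow-up "bytes=-<n>" request.
    static long metadataSuffixLength(byte[] tail) {
        return footerLength(tail) + (long) TAIL_LENGTH;
    }
}
```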

Dec 06 '19 16:12 omalley

@omalley I have exactly this use case. Did you find an acceptable workaround?

Mar 18 '20 03:03 kyprifog

I was able to get the content length from the object's headers and then make a second call using that length to fetch the footer. I'm guessing, though, that this issue is about doing it in a single call rather than two.
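The two-call approach can be sketched as follows, assuming the first call is a v1 `amazonS3.getObjectMetadata(bucket, key)` whose `getContentLength()` gives the object size. `TailRange.forLastBytes` is a hypothetical helper that converts "last n bytes" into the inclusive start/end pair that `setRange(long start, long end)` takes, clamping to the whole object when n exceeds its size, which matches how RFC 2616 defines a suffix range on a shorter entity.

```java
// Hypothetical helper for the two-call workaround (SDK calls shown only in comments):
//   long size = amazonS3.getObjectMetadata("bucket", "key").getContentLength();
//   long[] r = TailRange.forLastBytes(size, 500);
//   req.setRange(r[0], r[1]);
final class TailRange {
    // Returns {start, end}, both inclusive, covering the last n bytes of the object.
    static long[] forLastBytes(long contentLength, long n) {
        if (contentLength <= 0) {
            throw new IllegalArgumentException("an empty object has no byte range");
        }
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        long start = Math.max(0, contentLength - n); // whole object if n >= size
        long end = contentLength - 1;                 // Range offsets are inclusive
        return new long[] {start, end};
    }
}
```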

Mar 20 '20 15:03 kyprifog

We don't have plans to support this in v1.

We are closing stale v1 issues before going into Maintenance Mode, so if this issue is still relevant in v2 please open a new issue in the v2 repo.

Reference:

Announcing end-of-support for AWS SDK for Java v1.x effective December 31, 2025 - blog post

Jul 17 '24 01:07 debora-ito

This issue is now closed.

Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

Jul 17 '24 01:07 github-actions[bot]