
S3 upload stream with http throws stream mark and reset error

Open thisarattr opened this issue 7 years ago • 7 comments

I am trying to stream a file straight into S3 rather than upload/buffer it on our own server and re-upload it to S3. When I use HTTP, the AWS client tries to calculate the message digest and fails to reset the stream. I haven't set an explicit read limit, so it defaults to 128 KB, and I'm uploading a stream larger than that. As the AWS client code below shows, it sets mark() at the request read limit, then reads the whole stream, well beyond the mark, and tries to reset() it, which is obviously going to fail and throw the reset error.

Note: when I'm using HTTPS this won't happen, as payload signing is disabled by default, but you will face the same issue over HTTPS if you enable signing.
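
For reference, a minimal sketch of the kind of call that triggers this (hypothetical bucket, key, and region; assumes a client configured for plain HTTP and a source stream larger than the default 128 KB read limit):

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.Protocol;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ObjectMetadata;

    import java.io.InputStream;

    public class StreamingUploadRepro {
        public static void putStream(InputStream userUpload, long contentLength) {
            // Plain HTTP: the S3 client signs the payload, so the signer must hash the stream.
            AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                    .withClientConfiguration(new ClientConfiguration().withProtocol(Protocol.HTTP))
                    .withRegion("us-east-1") // hypothetical region
                    .build();

            ObjectMetadata metadata = new ObjectMetadata();
            metadata.setContentLength(contentLength); // length is known, but the stream cannot be reset past 128 KB

            // Fails with "Unable to reset stream after calculating AWS4 signature"
            // once the signer has read past the default mark limit.
            s3.putObject("example-bucket", "example-key", userUpload, metadata);
        }
    }

The relevant SDK code and the resulting exception are below.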

AWS4Signer.java

    protected String calculateContentHash(SignableRequest<?> request) {
        InputStream payloadStream = getBinaryRequestPayloadStream(request);
        ReadLimitInfo info = request.getReadLimitInfo();
        payloadStream.mark(info == null ? -1 : info.getReadLimit());
        String contentSha256 = BinaryUtils.toHex(hash(payloadStream));
        try {
            payloadStream.reset();
        } catch (IOException e) {
            throw new SdkClientException(
                    "Unable to reset stream after calculating AWS4 signature",
                    e);
        }
        return contentSha256;
    }

AbstractAWSSigner.java

    protected byte[] hash(InputStream input) throws SdkClientException {
        try {
            MessageDigest md = getMessageDigestInstance();
            @SuppressWarnings("resource")
            DigestInputStream digestInputStream = new SdkDigestInputStream(input, md);
            byte[] buffer = new byte[1024];
            while (digestInputStream.read(buffer) > -1)
                ;
            return digestInputStream.getMessageDigest().digest();
        } catch (Exception e) {
            throw new SdkClientException(
                    "Unable to compute hash while signing request: "
                            + e.getMessage(), e);
        }
    }

Exception thrown:

Caused by: com.amazonaws.SdkClientException: Unable to reset stream after calculating AWS4 signature
at com.amazonaws.auth.AWS4Signer.calculateContentHash(AWS4Signer.java:562)
at com.amazonaws.services.s3.internal.AWSS3V4Signer.calculateContentHash(AWSS3V4Signer.java:118)
at com.amazonaws.auth.AWS4Signer.sign(AWS4Signer.java:233)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1210)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1749)
at com.platform.common.services.S3BinaryUploadService.uploadBinaryToUploadBucket(S3BinaryUploadService.java:61)
... 84 common frames omitted
Caused by: java.io.IOException: Resetting to invalid mark
at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.util.LengthCheckInputStream.reset(LengthCheckInputStream.java:126)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.reset(MD5DigestCalculatingInputStream.java:105)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
at com.amazonaws.auth.AWS4Signer.calculateContentHash(AWS4Signer.java:560)
... 98 common frames omitted

thisarattr avatar Sep 04 '18 01:09 thisarattr

This is a known issue and a current limitation of the SDK. There are similar posts with workarounds. Please refer to them and see if they work for you. https://github.com/aws/aws-sdk-java/issues/427#issuecomment-273550783 https://github.com/aws/aws-sdk-java/issues/474

varunnvs92 avatar Sep 04 '18 20:09 varunnvs92

Thanks a lot for the response. I saw your answer before, but what I am trying to do here is stream a file straight from the user into S3 rather than download/buffer it on our server. Since I don't have the file, option 1 is out for me. Yes, I can set the read limit beyond the maximum expected file size, but in that case the aws-sdk will read the whole file into memory to do the signing (and fail with the exception), which is what I want to avoid, because this API expects large binaries that could get close to a GB.
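
For context, the read-limit option being discussed is set per request via RequestClientOptions; a minimal sketch with hypothetical bucket, key, and limit values:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.ObjectMetadata;
    import com.amazonaws.services.s3.model.PutObjectRequest;

    import java.io.InputStream;

    public class ReadLimitWorkaround {
        public static void putStream(AmazonS3 s3, InputStream userUpload, ObjectMetadata metadata) {
            PutObjectRequest request =
                    new PutObjectRequest("example-bucket", "example-key", userUpload, metadata);
            // The mark limit must cover the entire payload for reset() to succeed,
            // which is exactly what forces the whole object into memory during signing.
            request.getRequestClientOptions().setReadLimit(1_100_000_000); // > the ~1 GB maximum expected here
            s3.putObject(request);
        }
    }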

By the way, I know this can be solved by using HTTPS, but I wanted to raise it so it gets addressed in the future (at least stop failing on the mark and reset issue).

thisarattr avatar Sep 05 '18 01:09 thisarattr

@thisarattr Unfortunately there's no way around this as the SDK needs to consume the full contents of the stream (which in this case requires buffering the stream to memory) to be able to set the checksum as part of the request signature. The easiest way around this would be to switch to using an HTTPS endpoint if possible.
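
For completeness, switching to HTTPS is just a client-configuration change; a sketch with a hypothetical region (HTTPS is also the builder default, so the explicit protocol setting is only there to make the choice visible):

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.Protocol;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class HttpsClientFactory {
        public static AmazonS3 build() {
            // Over HTTPS the S3 client skips payload signing by default,
            // so the signer never needs to mark/reset the upload stream.
            return AmazonS3ClientBuilder.standard()
                    .withClientConfiguration(new ClientConfiguration().withProtocol(Protocol.HTTPS))
                    .withRegion("us-east-1") // hypothetical region
                    .build();
        }
    }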

dagnir avatar Sep 14 '18 17:09 dagnir

By the way, I know this can be solved by using HTTPS, but I wanted to raise it so it gets addressed in the future (at least stop failing on the mark and reset issue).

It sounds like this is a feature request so I'll mark it as such for now, but I'm not sure how we'll be able to avoid this.

dagnir avatar Sep 14 '18 18:09 dagnir

@dagnir I agree that when using HTTP there is no way to calculate the hash/checksum without buffering in memory. But it still should not fail by throwing a mark and reset exception, right?

Hashing is the client library's responsibility; the API consumer does not need to know about it. The SDK should throw a meaningful error message instead of a mark and reset exception, which does not mean much to the consumer without looking at the client library code.

thisarattr avatar Sep 17 '18 04:09 thisarattr

Okay I see; we can certainly throw/log a more descriptive error message.

dagnir avatar Sep 17 '18 17:09 dagnir

Could we actually have a specific subclass of SdkClientException for these retryable signing/hashing problems? The Hadoop S3A client already splits failures into those which may be recoverable (no response, throttle errors, socket timeouts, etc.) and those which are not, and then decides which to retry.
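
A rough sketch of what such a subclass could look like (purely hypothetical; nothing like this exists in the v1 SDK, and the class name is invented for illustration):

    import com.amazonaws.SdkClientException;

    // Hypothetical exception type for signing-time mark/reset failures, so callers
    // like the Hadoop S3A client could classify them without string-matching messages.
    public class SigningStreamResetException extends SdkClientException {

        public SigningStreamResetException(String message, Throwable cause) {
            super(message, cause);
        }

        // AmazonClientException.isRetryable() defaults to true; a signing reset failure
        // is only worth retrying if the caller can recreate the source stream.
        @Override
        public boolean isRetryable() {
            return false;
        }
    }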

steveloughran avatar Jun 07 '19 10:06 steveloughran

We are closing stale v1 issues before going into Maintenance Mode.

If this issue is still relevant in v2 please open a new issue in the v2 repo.

Reference:

  • Announcing end-of-support for AWS SDK for Java v1.x effective December 31, 2025 - blog post

debora-ito avatar Jul 29 '24 22:07 debora-ito

This issue is now closed.

Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot] avatar Jul 29 '24 22:07 github-actions[bot]

FYI, as HADOOP-19221 shows, the v2 SDK actually makes things worse in terms of S3 upload recoverability.

steveloughran avatar Aug 01 '24 19:08 steveloughran