aws-sdk-java-v2

Caused by: org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 1,226,000,000; received: 7,264,616)

Open hhtqaq opened this issue 1 year ago • 8 comments

Describe the bug

When I download a large file as a stream, parse the data, and write it to the DB, the following exception occurs within 5 minutes:

Caused by: org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 1,226,000,000; received: 7,264,616)

Expected Behavior

All data is written to the DB without errors.

Current Behavior

When I download a large file as a stream, parse the data, and write it to the DB (which takes about 5 seconds), this exception occurs within 5 minutes.

Reproduction Steps

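    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
    import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
    import software.amazon.awssdk.core.ResponseInputStream;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.model.GetObjectResponse;

    // The class name is a placeholder; the original snippet omitted the enclosing class and method.
    public class Repro {

        public static void main(String[] args) {
            AwsBasicCredentials awsCreds = AwsBasicCredentials.create(
                "your-access-key-id",
                "your-secret-access-key"
            );

            S3Client s3Client = S3Client.builder()
                    .region(Region.US_EAST_1)
                    .credentialsProvider(StaticCredentialsProvider.create(awsCreds))
                    .build();

            GetObjectRequest getObjectRequest = GetObjectRequest.builder()
                    .bucket("your-bucket-name")
                    .key("your-object-key")
                    .build();

            // The S3 stream stays open for as long as the per-line processing takes.
            try (ResponseInputStream<GetObjectResponse> s3ObjectInputStream = s3Client.getObject(getObjectRequest);
                 BufferedReader reader = new BufferedReader(new InputStreamReader(s3ObjectInputStream))) {

                String line;
                while ((line = reader.readLine()) != null) {
                    // Parse the data and write it to the DB
                    parseDataThenWrite2DB(line);
                }

            } catch (Exception e) {
                e.printStackTrace();
            }

            // Close the client
            s3Client.close();
        }

        // Placeholder: the real parsing and DB write were not part of the report.
        private static void parseDataThenWrite2DB(String line) {
        }
    }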

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.24.13

JDK version used

    openjdk version "1.8.0_332"
    OpenJDK Runtime Environment Corretto-8.332.08.1 (build 1.8.0_332-b08)
    OpenJDK 64-Bit Server VM Corretto-8.332.08.1 (build 25.332-b08, mixed mode)

Operating System and version

macOS 12.4

hhtqaq avatar Aug 28 '24 03:08 hhtqaq

@hhtqaq can you share the full stacktrace?

debora-ito avatar Aug 28 '24 17:08 debora-ito

@debora-ito Okay, here it is:

    org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 1,226,000,000; received: 1,758,959)
        at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
        at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66)
        at software.amazon.awssdk.core.internal.metrics.BytesReadTrackingInputStream.read(BytesReadTrackingInputStream.java:49)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)

hhtqaq avatar Aug 29 '24 02:08 hhtqaq

Hi, I have encountered the same problem. Have you solved it?

yudoutingle avatar Aug 29 '24 15:08 yudoutingle

"Premature end of Content-Length delimited message body" indicates that the connection was closed before the SDK was able to receive all the data. The SDK itself does not close connections that are in progress, so the connection could have been closed by the service.

If your data parsing happens while you read the stream content and contributes to the 5-minute running time, we suggest reading all the data from the stream first and then processing it. We have this and other best practices listed in our Developer Guide: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/best-practices.html#bestpractice2
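As an illustration of the "read first, process later" suggestion, here is a minimal sketch that uses only the JDK. The class and method names (`SpoolThenProcess`, `spoolToTempFile`, `processLines`) are hypothetical and not part of the SDK. The assumption is that the `InputStream` passed in is the `ResponseInputStream` returned by `s3Client.getObject(...)`, so the network connection is only held open for the duration of the copy, not for the slow per-line DB writes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Consumer;

/** Hypothetical helper: drain the network stream to disk first, then parse at leisure. */
final class SpoolThenProcess {

    private SpoolThenProcess() {
    }

    /** Copies the whole stream to a temp file as fast as the network allows, then closes it. */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("s3-download-", ".tmp");
        try (InputStream source = in) {
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    /** Reads the local copy line by line; slow handlers no longer hold a connection open. */
    static long processLines(Path file, Consumer<String> lineHandler) throws IOException {
        long count = 0;
        // UTF-8 is an assumption here; use the charset the object was written with.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineHandler.accept(line);
                count++;
            }
        }
        return count;
    }
}
```

In the reproduction above, this would look roughly like `Path tmp = SpoolThenProcess.spoolToTempFile(s3Client.getObject(getObjectRequest));` followed by `SpoolThenProcess.processLines(tmp, line -> parseDataThenWrite2DB(line));`. The SDK can also write the object straight to a file, for example with `ResponseTransformer.toFile(path)`, which avoids the manual copy.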

Also, make sure nothing in your application or your environment is unexpectedly closing the connection or the client.

debora-ito avatar Aug 30 '24 22:08 debora-ito

@debora-ito Hi, good morning! Could you please try it in your environment? When my data processing time is only 1-2 s, this problem does not occur. Also, no relevant timeout settings are configured on the server.

hhtqaq avatar Sep 02 '24 01:09 hhtqaq

Afraid I have to agree with @debora-ito: this is generally a sign of the network connection being broken.

The hadoop S3A client does a lot of recovery here; afraid you'll have to do something similar with a wrapping InputStream which can do the recovery, such as by

  1. reading in fixed size blocks, maybe with prefetching and parallel reads.
  2. recognising the premature EOF exception and triggering a GET from the current location to the end of the file.

In either case, you also need to make sure the connection which failed is not returned to the pool of available HTTPS connections. Call abort() on the stream to do this.
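The second option can be sketched as a wrapping `InputStream` that tracks how many bytes it has delivered and re-opens the source from that offset when a read fails. This is only a sketch under stated assumptions, not how the Hadoop S3A client implements it: `RangeOpener` and `ResumingInputStream` are hypothetical names, and with S3 the opener would be expected to issue a new GetObject request with a range such as `bytes=<offset>-`. The sketch is JDK-only, so it discards a failed stream with `close()`; with the SDK's `ResponseInputStream` you would call `abort()` at that point instead, as described above.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical callback: opens a stream that starts at the given byte offset. */
interface RangeOpener {
    InputStream openFrom(long offset) throws IOException;
}

/** Sketch of a wrapping stream that resumes from the current position after a failure. */
final class ResumingInputStream extends InputStream {
    private final RangeOpener opener;
    private final long length;     // expected total length (the Content-Length)
    private final int maxRetries;  // re-open attempts allowed per read call
    private InputStream current;
    private long position;         // bytes delivered to the caller so far

    ResumingInputStream(RangeOpener opener, long length, int maxRetries) {
        this.opener = opener;
        this.length = length;
        this.maxRetries = maxRetries;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n < 0 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (position >= length) {
            return -1; // everything expected has been delivered
        }
        int attempts = 0;
        while (true) {
            try {
                if (current == null) {
                    current = opener.openFrom(position);
                }
                int n = current.read(buf, off, (int) Math.min(len, length - position));
                if (n < 0) {
                    // EOF before the expected length: treat it like a broken connection.
                    throw new IOException("Premature end of stream at offset " + position);
                }
                position += n;
                return n;
            } catch (IOException e) {
                discardCurrent();
                if (++attempts > maxRetries) {
                    throw e;
                }
            }
        }
    }

    /** Drops the failed stream so the next read re-opens from the current position. */
    private void discardCurrent() {
        if (current != null) {
            try {
                current.close(); // with the SDK: call abort() on the ResponseInputStream instead
            } catch (IOException ignored) {
                // the stream is already broken; nothing more to do
            }
            current = null;
        }
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

The retry budget is per `read` call, so a long download can survive many separate disconnects while a source that fails repeatedly at the same offset still surfaces the exception. A production version would likely add backoff between attempts and verify that the object has not changed between requests (for example by comparing ETags); both are omitted here.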

steveloughran avatar Sep 04 '24 10:09 steveloughran

@hhtqaq let us know if you have any other question.

debora-ito avatar Sep 06 '24 21:09 debora-ito

@debora-ito Could you try to reproduce it? Download a large file as a stream, then parse each line of data and sleep for 5 s.

hhtqaq avatar Sep 09 '24 02:09 hhtqaq

@hhtqaq see the guidelines in my previous comment and let us know if you have more questions. Marking this to auto close soon.

debora-ito avatar Nov 19 '24 22:11 debora-ito