aws-sdk-java-v2
Caused by: org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 1,226,000,000; received: 7,264,616)
Describe the bug
When I download a large file through streaming, parsing the data and writing it to the DB, this exception occurs within 5 minutes: Caused by: org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 1,226,000,000; received: 7,264,616)
Expected Behavior
All data is written to the DB normally.
Current Behavior
When I download a large file through streaming and parse the data and write it to the DB (processing takes about 5 seconds per line), this exception occurs within 5 minutes.
Reproduction Steps
import java.io.BufferedReader;
import java.io.InputStreamReader;

import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

public class S3StreamRepro {
    public static void main(String[] args) {
        AwsBasicCredentials awsCreds = AwsBasicCredentials.create(
                "your-access-key-id",
                "your-secret-access-key");
        S3Client s3Client = S3Client.builder()
                .region(Region.US_EAST_1)
                .credentialsProvider(StaticCredentialsProvider.create(awsCreds))
                .build();
        GetObjectRequest getObjectRequest = GetObjectRequest.builder()
                .bucket("your-bucket-name")
                .key("your-object-key")
                .build();
        try (ResponseInputStream<GetObjectResponse> s3ObjectInputStream = s3Client.getObject(getObjectRequest);
             BufferedReader reader = new BufferedReader(new InputStreamReader(s3ObjectInputStream))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // parse the data and write it to the DB
                parseDataThenWrite2DB(line);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // close the client
        s3Client.close();
    }
}
Possible Solution
No response
Additional Information/Context
No response
AWS Java SDK version used
2.24.13
JDK version used
openjdk version "1.8.0_332"
OpenJDK Runtime Environment Corretto-8.332.08.1 (build 1.8.0_332-b08)
OpenJDK 64-Bit Server VM Corretto-8.332.08.1 (build 25.332-b08, mixed mode)
Operating System and version
macOS 12.4
@hhtqaq can you share the full stacktrace?
@debora-ito okay
org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 1,226,000,000; received: 1,758,959)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66)
    at software.amazon.awssdk.core.internal.metrics.BytesReadTrackingInputStream.read(BytesReadTrackingInputStream.java:49)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
Hi, I have encountered the same problem. Have you solved it?
Premature end of Content-Length delimited message body indicates that the connection got closed before the SDK was able to receive all data. The SDK itself does not close connections that are in progress, so the connection could be closed by the service.
If your data parsing is done while reading the stream content and is contributing to the 5-minute running time, we suggest reading all the data from the stream first and then processing it. We have this and other best practices listed in our Developer Guide: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/best-practices.html#bestpractice2
Also, make sure nothing in your application or your environment is unexpectedly closing the connection or the client.
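The read-first-then-process advice above can be sketched as a small helper that drains the response stream to a temporary file before any per-line work begins, so the HTTP connection is held open only for the raw download. The helper itself is plain java.io and the class and method names are illustrative, not SDK API; with the SDK you would pass the ResponseInputStream returned by s3Client.getObject(getObjectRequest) as the source:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Consumer;

class DownloadThenProcess {

    // Drain the whole stream to a temp file first. The connection is busy only
    // for the duration of the raw download, not for slow per-line parsing
    // and DB writes.
    static Path drainToTempFile(InputStream source) throws IOException {
        Path tmp = Files.createTempFile("s3-object-", ".tmp");
        Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Process the downloaded file line by line; the connection is already
    // closed by the time this runs, so slow handlers cannot trigger a
    // mid-stream disconnect. Returns the number of lines processed.
    static long processLines(Path file, Consumer<String> lineHandler) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineHandler.accept(line); // e.g. parseDataThenWrite2DB(line)
                count++;
            }
        }
        return count;
    }
}
```

Usage with the report's code would be roughly `Path tmp = drainToTempFile(s3Client.getObject(getObjectRequest));` followed by `processLines(tmp, this::parseDataThenWrite2DB);` (the SDK also offers a getObject overload that writes directly to a destination Path, which covers the first step on its own).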
@debora-ito Hi, good morning! Can you please try it in your environment? When my data processing time is 1-2 s per line, this problem does not occur. And the server does not set any relevant timeout settings.
Afraid I have to agree with @debora-ito: this is generally a sign of the network connection being broken.
The Hadoop S3A client does a lot of recovery here; afraid you'll have to do something similar with a wrapping InputStream which can do the recovery, such as by
- reading in fixed size blocks, maybe with prefetching and parallel reads.
- recognising the premature EOF exception and triggering a GET from the current location to the end of the file.
In either case, you also need to make sure the connection which failed is not returned to the pool of available HTTPS connections. Call abort() on the stream to do this.
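The recovery pattern described above (recognise the premature EOF and re-open from the current offset) can be sketched as a wrapping InputStream. This is an illustrative, SDK-free sketch: the re-openable source is abstracted as a LongFunction<InputStream> that opens the object starting at a given byte offset. With S3 that opener would issue a ranged GET (GetObjectRequest with range("bytes=" + offset + "-")) and should call abort() on the failed ResponseInputStream so the broken connection is not returned to the pool; the class and names here are not SDK API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongFunction;

// Wraps a re-openable source and transparently retries reads that fail
// mid-stream, resuming from the current byte offset (with S3, the opener
// would perform a ranged GET from that offset).
class ResumingInputStream extends InputStream {
    private final LongFunction<InputStream> opener; // opens the source at a byte offset
    private final int maxRetries;
    private InputStream in;
    private long pos = 0; // bytes successfully delivered so far

    ResumingInputStream(LongFunction<InputStream> opener, int maxRetries) {
        this.opener = opener;
        this.maxRetries = maxRetries;
        this.in = opener.apply(0L);
    }

    @Override
    public int read() throws IOException {
        for (int attempt = 0; ; attempt++) {
            try {
                int b = in.read();
                if (b >= 0) pos++;
                return b;
            } catch (IOException e) {
                if (attempt >= maxRetries) throw e;
                // With the SDK, call abort() on the failed ResponseInputStream
                // here so the half-closed connection is not reused, then
                // re-open with a ranged GET starting at the current offset.
                try { in.close(); } catch (IOException ignored) { }
                in = opener.apply(pos);
            }
        }
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

A production version would also override read(byte[], int, int) for throughput and distinguish retryable disconnects from other IOExceptions, but the resume-from-offset mechanics are the same.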
@hhtqaq let us know if you have any other question.
@debora-ito Can you try it out? Download a large file as a stream, then parse each line of data and sleep 5 s per line.
@hhtqaq see the guidelines in my previous comment and let us know if you have more questions. Marking this to auto close soon.