aws-sdk-java-v2
aws-sdk-java-v2 copied to clipboard
S3 timing out on CI (tests using LocalStack)
Describe the bug
AWS SDK 2.21.12+
causes our Gitlab CI runner (an AWS t3.xlarge
instance) to time out on S3 operations. While the tests usually take 6-10 min, here they take up to 43 minutes before failing the build.
Importantly, everything works perfectly well on a local machine (the entire build takes about 3.5min). The issues only occur on CI. Furthermore, the issue does not occur on version 2.20.63
. On 2.20.63
, all S3 operations take milliseconds, as expected (even on CI).
Expected Behavior
S3 operations should not time out during CI builds.
Current Behavior
S3 operations time out during CI builds.
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Read timed out
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:83)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:224)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
at software.amazon.awssdk.services.s3.DefaultS3Client.createBucket(DefaultS3Client.java:1156)
at software.amazon.awssdk.services.s3.S3Client.createBucket(S3Client.java:1708)
at [...]
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Unable to execute HTTP request: Read timed out
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Unable to execute HTTP request: Read timed out
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Unable to execute HTTP request: Read timed out
Suppressed: java.lang.NullPointerException: Cannot invoke "java.util.Set.forEach(java.util.function.Consumer)" because the return value of "org.junit.jupiter.api.extension.ExtensionContext$Store.remove(Object, java.lang.Class)" is null
at org.mockito.junit.jupiter.MockitoExtension.afterEach(MockitoExtension.java:190)
Reproduction Steps
We are running the tests using Maven and creating a LocalStack instance using TestContainers (1.19.7
). The relevant code is:
s3Client = S3Client.builder()
.endpointOverride(LOCAL_STACK_CONTAINER.getEndpointOverride(LocalStackContainer.Service.S3))
.credentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials.create(
LOCAL_STACK_CONTAINER.getAccessKey(), LOCAL_STACK_CONTAINER.getSecretKey())))
.region(Region.of(LOCAL_STACK_CONTAINER.getRegion()))
.build();
// ...
// It times out here (@BeforeEach):
s3Client.createBucket(request -> request.bucket(bucket));
// ...And here (@AfterEach):
s3Client.deleteBucket(request -> request.bucket(bucket));
Below is a simplified version of our .gitlab-ci.yml
:
variables:
MAVEN_CLI_OPTS: >-
-gs ./ci_settings.xml
-Dmaven.repo.local=${CI_PROJECT_DIR}/.m2/repository
-Djavax.net.ssl.trustStore=./cacerts
--batch-mode
--errors
image: [...]/maven:3.9.4-amazoncorretto-21
code_quality_analysis:
before_script:
- >
export DOCKER_AUTH_CONFIG=[...]
script:
- > # Run tests
mvn clean verify $MAVEN_CLI_OPTS
-T 2C
# ...
I don't think our ci_settings.xml
proxies are to blame, but just in case here's a simplified version of those:
<settings xmlns="http://maven.apache.org/SETTINGS/1.2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.2.0 https://maven.apache.org/xsd/settings-1.2.0.xsd">
<proxies>
<proxy>
<id>[...]-http</id>
<active>true</active>
<protocol>http</protocol>
<port>8080</port>
<host>[...]</host>
<nonProxyHosts>localhost|127.0.0.1|[...]</nonProxyHosts>
</proxy>
<proxy>
<id>[...]-https</id>
<active>true</active>
<protocol>https</protocol>
<port>8080</port>
<host>[...]</host>
<nonProxyHosts>localhost|127.0.0.1|[...]</nonProxyHosts>
</proxy>
</proxies>
</settings>
Possible Solution
it seems the issue was introduced in AWS SDK somewhere between 2.20.63
and 2.21.12
. I would assume some code introduced between those version is the reason for this issue.
Additional Information/Context
I noticed the bug when upgrading spring-cloud-aws-dependencies
from 3.0.1
to 3.0.3+
.
spring-cloud-aws-dependencies:3.0.1
depends on AWS SDK 2.20.63
, while spring-cloud-aws-dependencies:3.0.3
depends on 2.21.12
.
I noticed the bug appears on spring-cloud-aws-dependencies:3.0.3
, but disappears when I manually downgrade AWS SDK to 2.21.12
. I tried upgrading AWS SDK to 2.25.17
, but the problem persists. Therefore, it seems the issue was introduced in AWS SDK somewhere between 2.20.63
and 2.21.12
.
The runner instance shows only mild memory and CPU consumption, so it doesn't seem like lack of resources is the problem here.
AWS Java SDK version used
2.21.12
JDK version used
maven:3.9.4-amazoncorretto-21 (docker image)
Operating System and version
Alpine Linux 3.16