spring-cloud-aws
spring-cloud-aws copied to clipboard
S3 timing out on CI since 3.0.3 (tests using LocalStack)
Type: Bug
Component: S3
Describe the bug
Running any version of spring-cloud-aws-dependencies
higher than 3.0.2 (tested with 3.0.3 - 3.1.1) causes our Gitlab CI runner (an AWS t3.xlarge
instance) to time out on S3 operations:
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Read timed out
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:83)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:224)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
at software.amazon.awssdk.services.s3.DefaultS3Client.createBucket(DefaultS3Client.java:1156)
at software.amazon.awssdk.services.s3.S3Client.createBucket(S3Client.java:1708)
at [...]
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Unable to execute HTTP request: Read timed out
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Unable to execute HTTP request: Read timed out
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Unable to execute HTTP request: Read timed out
Suppressed: java.lang.NullPointerException: Cannot invoke "java.util.Set.forEach(java.util.function.Consumer)" because the return value of "org.junit.jupiter.api.extension.ExtensionContext$Store.remove(Object, java.lang.Class)" is null
at org.mockito.junit.jupiter.MockitoExtension.afterEach(MockitoExtension.java:190)
While the tests usually take 6-10 min, here they take up to 43 minutes before failing the build.
We are running the tests using Maven and creating a LocalStack instance using TestContainers (1.19.7
). The relevant code is:
s3Client = S3Client.builder()
.endpointOverride(LOCAL_STACK_CONTAINER.getEndpointOverride(LocalStackContainer.Service.S3))
.credentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials.create(
LOCAL_STACK_CONTAINER.getAccessKey(), LOCAL_STACK_CONTAINER.getSecretKey())))
.region(Region.of(LOCAL_STACK_CONTAINER.getRegion()))
.build();
// ...
// It times out here (@BeforeEach):
s3Client.createBucket(request -> request.bucket(bucket));
// ...And here (@AfterEach):
s3Client.deleteBucket(request -> request.bucket(bucket));
Importantly, everything works perfectly well on a local machine (the entire build takes about 3.5min). The issues only occur on CI. Furthermore, the issue does not occur on version 3.0.2. On 3.0.2, all S3 operations take milliseconds, as expected (even on CI).
I thought that maybe it's the fault of a new version of AWS SDK, so I tried downgrading AWS SDK to the version from 3.0.2:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>bom</artifactId>
<version>2.20.63</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>s3-transfer-manager</artifactId>
<version>2.20.63</version>
<optional>true</optional>
</dependency>
</dependencies>
</dependencyManagement>
I found a semi-similar problem on LocalStack's Github, so I tried running LocalStack with the suggested environment variables:
DockerImageName imageName = DockerImageName.parse("localstack/localstack:latest")
.asCompatibleSubstituteFor("localstack/localstack");
return (LocalStackContainerWrapper) new LocalStackContainer(imageName)
// https://github.com/localstack/localstack/issues/10340#issuecomment-2008323237
.withEnv("GATEWAY_SERVER", "twisted")
// https://github.com/localstack/localstack/issues/10340#issuecomment-1979282202
.withEnv("PROVIDER_OVERRIDE_S3", "legacy_v2");
~None of the above fixed the problem. I tried looking through changes between 3.0.2 and 3.0.3 but I'm unable to find anything that would stand out as a likely culprit.~ (EDIT: It turns out AWS SDK is to blame after all. See comment below)
The bucket the tests are struggling to create and then remove is called test-bucket
, so I don't think any name-related problems are to blame.
At the same time, I really need to update this dependency (I need the FIFO queue name bugfix introduced in 3.1.1) and I'm at a loss what to do.
The runner instance shows only mild memory and CPU consumption, so it doesn't seem like lack of resources is the problem here. Below is a simplified version of our .gitlab-ci.yml
:
variables:
MAVEN_CLI_OPTS: >-
-gs ./ci_settings.xml
-Dmaven.repo.local=${CI_PROJECT_DIR}/.m2/repository
-Djavax.net.ssl.trustStore=./cacerts
--batch-mode
--errors
image: [...]/maven:3.9.4-amazoncorretto-21
code_quality_analysis:
before_script:
- >
export DOCKER_AUTH_CONFIG=[...]
script:
- > # Run tests
mvn clean verify $MAVEN_CLI_OPTS
-T 2C
# ...
I don't think our ci_settings.xml
proxies are to blame, but just in case here's a simplified version of those:
<settings xmlns="http://maven.apache.org/SETTINGS/1.2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.2.0 https://maven.apache.org/xsd/settings-1.2.0.xsd">
<proxies>
<proxy>
<id>[...]-http</id>
<active>true</active>
<protocol>http</protocol>
<port>8080</port>
<host>[...]</host>
<nonProxyHosts>localhost|127.0.0.1|[...]</nonProxyHosts>
</proxy>
<proxy>
<id>[...]-https</id>
<active>true</active>
<protocol>https</protocol>
<port>8080</port>
<host>[...]</host>
<nonProxyHosts>localhost|127.0.0.1|[...]</nonProxyHosts>
</proxy>
</proxies>
</settings>
In summary, ~some change~ (EDIT: the AWS SDK update) in 3.0.3
is causing our CI to time out during S3 operations ~and we can't figure out what it is~.
Sample Unfortunately, so far I was unable to reproduce this problem outside of Gitlab CI.