spring-cloud-aws
spring-cloud-aws copied to clipboard
Spring container not shutting down after test
Type: Bug
Component: SQS
Describe the bug Running the SQS listener, with Localstack to simulate the queue, the Spring container is not shutting down cleanly when running tests through Maven. This manifests itself in various ways, but my main evidence is that I see these log messages at the end of my test run:
01:44:42.922 [SpringContextShutdownHook] INFO o.s.b.a.ApplicationAvailabilityBean - Application availability state ReadinessState changed from ACCEPTING_TRAFFIC to REFUSING_TRAFFIC 01:44:42.943 [SpringContextShutdownHook] INFO o.s.b.a.ApplicationAvailabilityBean - Application availability state ReadinessState changed from ACCEPTING_TRAFFIC to REFUSING_TRAFFIC 01:44:42.945 [SpringContextShutdownHook] INFO o.s.b.a.ApplicationAvailabilityBean - Application availability state ReadinessState changed from ACCEPTING_TRAFFIC to REFUSING_TRAFFIC
It appears that the containers build up between test classes, and interfere with each other, intercepting messages from each other's queues. I have tried making sure that immediate shutdown is set on Spring, and tweaking the various timeout values, but cannot find a sweet spot to encourage the SQS classes to die aggressively.
To verify if still valid in 3.0
Hi everyone, I was running into the same issue with spring-cloud-aws-starter-sqs 3.0.0-M3. While the tests following the SQS integration test were executing I could see a lot of:
13:26:03.196 96620 ERROR sdk-async-response-3-7 i.a.c.s.l.s.AbstractPollingMessageSource - Error polling for messages in queue braze-currents-notifications
java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connection refused: no further information: /127.0.0.1:51431
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:62) ~[utils-2.18.28.jar:na]
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51) ~[sdk-core-2.18.28.jar:na]
at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76) ~[utils-2.18.28.jar:na]
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeAttemptExecute(AsyncRetryableStage.java:103) ~[sdk-core-2.18.28.jar:na]
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:184) ~[sdk-core-2.18.28.jar:na]
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159) ~[sdk-core-2.18.28.jar:na]
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76) ~[utils-2.18.28.jar:na]
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:103) ~[sdk-core-2.18.28.jar:na]
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:165) ~[sdk-core-2.18.28.jar:na]
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connection refused: no further information: /127.0.0.1:51431
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111) ~[sdk-core-2.18.28.jar:na]
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47) ~[sdk-core-2.18.28.jar:na]
at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223) ~[sdk-core-2.18.28.jar:na]
at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:218) ~[sdk-core-2.18.28.jar:na]
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:182) ~[sdk-core-2.18.28.jar:na]
... 22 common frames omitted
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: /127.0.0.1:51431
Caused by: java.net.ConnectException: Connection refused: no further information
at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[na:na]
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[na:na]
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[na:na]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.85.Final.jar:4.1.85.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.85.Final.jar:4.1.85.Final]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Additionally, my pipeline was getting OOM.
My current workaround is to annotate the integration test with @DirtiesContext(classMode=DirtiesContext.ClassMode.AFTER_CLASS)
. With this annotation I could see at the end of the test the container being shut down:
13:47:05.507 60165 INFO lifecycle-thread-3 i.a.c.s.l.AbstractMessageListenerContainer - Container io.awspring.cloud.sqs.sqsListenerEndpointContainer#0 stopped
Setting spring.cloud.aws.sqs.enabled
to false for all the other tests didn't help either.
Hi @rbraeunlich,
Well, the connection error itself is not that hard to figure - probably the Spring Context remains open after the LocalStack Container
is stopped, and you get the errors.
For tests, it seems the context is not stopped with every test unless you add a @DirtiesContext
annotation. Another way would be making sure the LocalStack Container
is only stopped after all tests are over.
I'm not sure how to improve on this - the containers start and stop with the context which is the right way to go in apps. I think maybe it should be an improvement in the coordination between Spring Contexts and the dependency Docker Containers, but that's not in the scope of this project.
The OOM
error seems a bit more concerning - can you share any more details on that?
Thanks.
Unfortunately not, the pipeline is not very verbose about this. What I can say is that our service has a lot of integration tests. If(f) we assume that every new test context starts a new container and doesn't shut it down the OOM seems logical somehow.
@rbraeunlich a sample that reproduces this issue would be helpful
Hey @maciejwalkowiak
I hope this project helps: https://github.com/rbraeunlich/sqs-bug
When you run the tests the ZLongRunningOtherTest
should run as second test and you should see the exceptions of the still running container.
I think this is hardly an issue with this project, and much more an issue with Spring Boot tests and Testcontainers in general. Not sure if any improvements were made with latest SB / Testcontainers integration.
Fact is, if containers are stopped between test suites (as they normally should), but Spring Context remains open for reuse, it will throw errors.
Using @DirtiesContext
helps context be as disposable as the containers.
I'll close this issue and we can reopen if necessary. Thanks.