spring-cloud-aws icon indicating copy to clipboard operation
spring-cloud-aws copied to clipboard

Spring container not shutting down after test

Open jasonab opened this issue 3 years ago • 1 comments

Type: Bug

Component: SQS

Describe the bug Running the SQS listener, with Localstack to simulate the queue, the Spring container is not shutting down cleanly when running tests through Maven. This manifests itself in various ways, but my main evidence is that I see these log messages at the end of my test run:

01:44:42.922 [SpringContextShutdownHook] INFO o.s.b.a.ApplicationAvailabilityBean - Application availability state ReadinessState changed from ACCEPTING_TRAFFIC to REFUSING_TRAFFIC 01:44:42.943 [SpringContextShutdownHook] INFO o.s.b.a.ApplicationAvailabilityBean - Application availability state ReadinessState changed from ACCEPTING_TRAFFIC to REFUSING_TRAFFIC 01:44:42.945 [SpringContextShutdownHook] INFO o.s.b.a.ApplicationAvailabilityBean - Application availability state ReadinessState changed from ACCEPTING_TRAFFIC to REFUSING_TRAFFIC

It appears that the containers build up between test classes, and interfere with each other, intercepting messages from each other's queues. I have tried making sure that immediate shutdown is set on Spring, and tweaking the various timeout values, but cannot find a sweet spot to encourage the SQS classes to die aggressively.

jasonab avatar May 27 '21 01:05 jasonab

To verify if still valid in 3.0

maciejwalkowiak avatar Sep 02 '22 13:09 maciejwalkowiak

Hi everyone, I was running into the same issue with spring-cloud-aws-starter-sqs 3.0.0-M3. While the tests following the SQS integration test were executing I could see a lot of:

13:26:03.196 96620 ERROR sdk-async-response-3-7 i.a.c.s.l.s.AbstractPollingMessageSource - Error polling for messages in queue braze-currents-notifications
java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connection refused: no further information: /127.0.0.1:51431
	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:62) ~[utils-2.18.28.jar:na]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51) ~[sdk-core-2.18.28.jar:na]
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
	at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76) ~[utils-2.18.28.jar:na]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeAttemptExecute(AsyncRetryableStage.java:103) ~[sdk-core-2.18.28.jar:na]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:184) ~[sdk-core-2.18.28.jar:na]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159) ~[sdk-core-2.18.28.jar:na]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
	at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:76) ~[utils-2.18.28.jar:na]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:103) ~[sdk-core-2.18.28.jar:na]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:165) ~[sdk-core-2.18.28.jar:na]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connection refused: no further information: /127.0.0.1:51431
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111) ~[sdk-core-2.18.28.jar:na]
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47) ~[sdk-core-2.18.28.jar:na]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223) ~[sdk-core-2.18.28.jar:na]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:218) ~[sdk-core-2.18.28.jar:na]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:182) ~[sdk-core-2.18.28.jar:na]
	... 22 common frames omitted
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: /127.0.0.1:51431
Caused by: java.net.ConnectException: Connection refused: no further information
	at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[na:na]
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[na:na]
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[na:na]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.85.Final.jar:4.1.85.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.85.Final.jar:4.1.85.Final]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

Additionally, my pipeline was getting OOM. My current workaround is to annotate the integration test with @DirtiesContext(classMode=DirtiesContext.ClassMode.AFTER_CLASS). With this annotation I could see at the end of the test the container being shut down:

13:47:05.507 60165 INFO  lifecycle-thread-3 i.a.c.s.l.AbstractMessageListenerContainer - Container io.awspring.cloud.sqs.sqsListenerEndpointContainer#0 stopped

Setting spring.cloud.aws.sqs.enabled to false for all the other tests didn't help either.

rbraeunlich avatar Dec 09 '22 12:12 rbraeunlich

Hi @rbraeunlich,

Well, the connection error itself is not that hard to figure - probably the Spring Context remains open after the LocalStack Container is stopped, and you get the errors.

For tests, it seems the context is not stopped with every test unless you add a @DirtiesContext annotation. Another way would be making sure the LocalStack Container is only stopped after all tests are over.

I'm not sure how to improve on this - the containers start and stop with the context which is the right way to go in apps. I think maybe it should be an improvement in the coordination between Spring Contexts and the dependency Docker Containers, but that's not in the scope of this project.

The OOM error seems a bit more concerning - can you share any more details on that?

Thanks.

tomazfernandes avatar Dec 15 '22 02:12 tomazfernandes

Unfortunately not, the pipeline is not very verbose about this. What I can say is that our service has a lot of integration tests. If(f) we assume that every new test context starts a new container and doesn't shut it down the OOM seems logical somehow.

rbraeunlich avatar Dec 15 '22 12:12 rbraeunlich

@rbraeunlich a sample that reproduces this issue would be helpful

maciejwalkowiak avatar Apr 18 '23 04:04 maciejwalkowiak

Hey @maciejwalkowiak I hope this project helps: https://github.com/rbraeunlich/sqs-bug When you run the tests the ZLongRunningOtherTest should run as second test and you should see the exceptions of the still running container.

rbraeunlich avatar Apr 19 '23 05:04 rbraeunlich

I think this is hardly an issue with this project, and much more an issue with Spring Boot tests and Testcontainers in general. Not sure if any improvements were made with latest SB / Testcontainers integration.

Fact is, if containers are stopped between test suites (as they normally should), but Spring Context remains open for reuse, it will throw errors.

Using @DirtiesContext helps context be as disposable as the containers.

I'll close this issue and we can reopen if necessary. Thanks.

tomazfernandes avatar Jan 05 '24 01:01 tomazfernandes