camunda icon indicating copy to clipboard operation
camunda copied to clipboard

BackupUploadIT.shouldSaveBackupWithManyFiles is flaky

Open rodrigo-lourenco-lopes opened this issue 10 months ago • 9 comments

Summary

  • How often does the test fail? - Once so far.
  • Does it block your work? - No, passed on retry.
  • Do we suspect that it is a real failure? - Not sure

Failures

This failure happened in a backport to 8.3. https://github.com/camunda/zeebe/actions/runs/8922653492/job/24505245262?pr=18123

Example assertion failure
Expecting
  
to be completed within 1M.

exception caught while trying to get the future result: java.util.concurrent.ExecutionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Duplicate handler name: HttpStreamsClientHandler#0-body-subscriber at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096) at org.assertj.core.internal.Futures.assertSucceededWithin(Futures.java:109) at org.assertj.core.api.AbstractCompletableFutureAssert.internalSucceedsWithin(AbstractCompletableFutureAssert.java:400) at org.assertj.core.api.AbstractCompletableFutureAssert.succeedsWithin(AbstractCompletableFutureAssert.java:396) at io.camunda.zeebe.backup.s3.BackupUploadIT.shouldSaveBackupWithManyFiles(BackupUploadIT.java:95) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95) at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:202) at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194) at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Duplicate handler name: HttpStreamsClientHandler#0-body-subscriber at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111) at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47) at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223) at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:218) at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:182) at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159) at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79) at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:103) at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.completeResponseFuture(MakeAsyncHttpRequestStage.java:240) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:163) at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:840) Caused by: java.lang.IllegalArgumentException: Duplicate handler name: HttpStreamsClientHandler#0-body-subscriber at io.netty.channel.DefaultChannelPipeline.checkDuplicateName(DefaultChannelPipeline.java:1055) at io.netty.channel.DefaultChannelPipeline.filterName(DefaultChannelPipeline.java:284) at io.netty.channel.DefaultChannelPipeline.addAfter(DefaultChannelPipeline.java:301) at io.netty.channel.DefaultChannelPipeline.addAfter(DefaultChannelPipeline.java:290) at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.unbufferedWrite(HttpStreamsHandler.java:329) at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.flushNext(HttpStreamsHandler.java:376) at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.write(HttpStreamsHandler.java:270) at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsClientHandler.write(HttpStreamsClientHandler.java:59) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:891) at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:956) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:982) at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:950) at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:1000) at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025) at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:306) at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.writeRequest(NettyRequestExecutor.java:247) at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequest(NettyRequestExecutor.java:239) at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$makeRequestListener$10(NettyRequestExecutor.java:182) at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106) at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ... 1 more

Hypotheses

Logs

Logs
LOGS

rodrigo-lourenco-lopes avatar May 02 '24 11:05 rodrigo-lourenco-lopes

Happened again here

Exception
<CompletableFuture[Failed with the following stack trace:
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Duplicate handler name: HttpStreamsClientHandler#0-body-subscriber
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:218)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:182)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$execute$0(MakeAsyncHttpRequestStage.java:108)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.completeResponseFuture(MakeAsyncHttpRequestStage.java:255)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:167)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.IllegalArgumentException: Duplicate handler name: HttpStreamsClientHandler#0-body-subscriber
	at io.netty.channel.DefaultChannelPipeline.checkDuplicateName(DefaultChannelPipeline.java:1055)
	at io.netty.channel.DefaultChannelPipeline.filterName(DefaultChannelPipeline.java:284)
	at io.netty.channel.DefaultChannelPipeline.addAfter(DefaultChannelPipeline.java:301)
	at io.netty.channel.DefaultChannelPipeline.addAfter(DefaultChannelPipeline.java:290)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.unbufferedWrite(HttpStreamsHandler.java:329)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.flushNext(HttpStreamsHandler.java:376)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.write(HttpStreamsHandler.java:270)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsClientHandler.write(HttpStreamsClientHandler.java:59)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:891)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:956)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:982)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:950)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:1000)
	at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
	at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:306)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.writeRequest(NettyRequestExecutor.java:245)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequest(NettyRequestExecutor.java:237)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$makeRequestListener$10(NettyRequestExecutor.java:181)
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
	at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	... 1 more
]>
to be completed within 1M.

exception caught while trying to get the future result: java.util.concurrent.ExecutionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Duplicate handler name: HttpStreamsClientHandler#0-body-subscriber
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
	at org.assertj.core.internal.Futures.assertSucceededWithin(Futures.java:109)
	at org.assertj.core.api.AbstractCompletableFutureAssert.internalSucceedsWithin(AbstractCompletableFutureAssert.java:400)
	at org.assertj.core.api.AbstractCompletableFutureAssert.succeedsWithin(AbstractCompletableFutureAssert.java:396)
	at io.camunda.zeebe.backup.s3.BackupUploadIT.shouldSaveBackupWithManyFiles(BackupUploadIT.java:95)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:202)
	at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Duplicate handler name: HttpStreamsClientHandler#0-body-subscriber
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:218)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:182)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$execute$0(MakeAsyncHttpRequestStage.java:108)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.completeResponseFuture(MakeAsyncHttpRequestStage.java:255)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:167)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.IllegalArgumentException: Duplicate handler name: HttpStreamsClientHandler#0-body-subscriber
	at io.netty.channel.DefaultChannelPipeline.checkDuplicateName(DefaultChannelPipeline.java:1055)
	at io.netty.channel.DefaultChannelPipeline.filterName(DefaultChannelPipeline.java:284)
	at io.netty.channel.DefaultChannelPipeline.addAfter(DefaultChannelPipeline.java:301)
	at io.netty.channel.DefaultChannelPipeline.addAfter(DefaultChannelPipeline.java:290)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.unbufferedWrite(HttpStreamsHandler.java:329)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.flushNext(HttpStreamsHandler.java:376)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.write(HttpStreamsHandler.java:270)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsClientHandler.write(HttpStreamsClientHandler.java:59)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:891)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:956)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:982)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:950)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:1000)
	at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
	at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:306)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.writeRequest(NettyRequestExecutor.java:245)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequest(NettyRequestExecutor.java:237)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$makeRequestListener$10(NettyRequestExecutor.java:181)
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
	at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	... 1 more

	at io.camunda.zeebe.backup.s3.BackupUploadIT.shouldSaveBackupWithManyFiles(BackupUploadIT.java:95)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)

ana-vinogradova-camunda avatar May 02 '24 14:05 ana-vinogradova-camunda

Happened again here https://github.com/camunda/zeebe/actions/runs/8925112776/job/24513039306?pr=18184

rodrigo-lourenco-lopes avatar May 02 '24 14:05 rodrigo-lourenco-lopes

also reported here https://github.com/camunda/zeebe/issues/18185

megglos avatar May 02 '24 14:05 megglos

seems relevant https://github.com/aws/aws-sdk-java-v2/issues/1903

megglos avatar May 02 '24 14:05 megglos

Also seen this multiple times: https://github.com/camunda/zeebe/actions/runs/8926375388/job/24517264052

matt-whiteman avatar May 02 '24 15:05 matt-whiteman

could easily reproduce locally, second run failed 😆 As we didn't change anything in regards to backups in a longer time I suspect this could be related to a recent aws sdk update.

Given the frequency of it's occurrence I will validate this shortly.

megglos avatar May 02 '24 16:05 megglos

rollback doesn't help, will disable the test for now to let someone else continue deeper investigation next

megglos avatar May 03 '24 09:05 megglos

Happened again twice today: https://camunda.slack.com/archives/C013MEVQ4M9/p1714671606674149 and https://camunda.slack.com/archives/C013MEVQ4M9/p1714714195395769

korthout avatar May 03 '24 10:05 korthout

disabled the test for now as it can be easily reproduced locally to continue the investigation https://github.com/camunda/zeebe/pull/18205 and to eliminate the disruption it causes

megglos avatar May 03 '24 10:05 megglos

Further findings: It is relatively easy to reproduce the test locally, by just rerunning until failure the test. (It takes on average less than 10 times on my machine).

Similar issues:

  • https://github.com/linagora/james-project/issues/4819
  • https://github.com/http4s/http4s-async-http-client/issues/1
  • https://github.com/AsyncHttpClient/async-http-client/issues/1627

Some of the discussion in these issues suggests that the problem is that the AHC offers the channel back to the pool manager as soon as the last response chunk is read, which causes a race condition against the NettySubscriber's completion. Or basically we are cycling through the available channels and uploading objects faster than these can be cleaned and reused.

Hence the error message “Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Duplicate handler name: HttpStreamsClientHandler#0-body-subscriber”.

This seems to be supported by the fact that relates to the speed which we are sending new objects. When locally testing the following things reduced the probability of of this error:

  • Reducing the number of parallel connections that we use to upload the objects.
  • Increasing the size of the objects uploaded (currently empty ones).
  • Making the test slower (with thread.sleep() for example).

None of the previous changes remove completely the occurrence of the incident, just make it rarer.

I don't know a solution for this issue since this seems to be a problem on the AsyncHttpClient side which was not solved yet. I will put this issue back on triage.

rodrigo-lourenco-lopes avatar Jun 11 '24 13:06 rodrigo-lourenco-lopes

I'm a little bit confused. Where do we use AsyncHttpClient? From what I can see, the AWS SDK uses its own client which is based on Apache or Netty, but both are still owned and maintained by AWS, so I don't know if the issues above are directly related to our own.

I did a mvn dependency:tree | grep async-http-client just to be sure we don't pull it in anywhere, and came up with nothing. Can you show me how the issues above are related?

npepinpe avatar Jun 20 '24 09:06 npepinpe

@rodrigo-lourenco-lopes - did you see my question?

npepinpe avatar Jul 04 '24 16:07 npepinpe

Sorry I got the libraries confused while I was writing, I meant to say the one from AWS.

While looking again at this issue it seems that in the linked issues they already found the origin of the problem https://github.com/aws/aws-sdk-java-v2/issues/1903#issuecomment-2249693722.

Apparently if a client sends Expect: 100-continue with Content-Length: 0, a server can send the response right away. This is specified in https://datatracker.ietf.org/doc/html/rfc7231#section-5.1.1

In this specific test we were sending 4 000 empty files, we are getting the response immediately and releasing the channels to the pool before these are complete, which makes the probability of using the same channel twice quite high.

I tried to run the same test without using empty files (16 Bytes for example) and the issue no longer happens (tested locally for more than 300 runs, while without the fix, it fails every 5 or 10 runs). I will create a pull request for this.

rodrigo-lourenco-lopes avatar Jul 31 '24 09:07 rodrigo-lourenco-lopes