[helm] Pod creation fails with timeout
Helm Chart Version
0.445.3
What step the error happened?
During the Sync
Relevant information
When running a sync job, creating a new destination, or doing anything else that spawns a new pod, the frontend reports an unknown error (HTTP 504) and the log output below appears.
I have a similar test cluster with exactly the same configuration that works fine, and I have also tried a completely fresh Airbyte installation in a new namespace.
We are running on AWS EKS, if that matters.
Any suggestions on how to fix this or how I should continue debugging would be greatly appreciated!
Relevant log output
2024-08-20 09:24:38 ERROR i.a.w.l.p.h.FailureHandler(apply):39 - Pipeline Error
io.airbyte.workload.launcher.pipeline.stages.model.StageError: io.airbyte.workload.launcher.pods.KubeClientException: Failed to create pod source-file-check-b43bf659-7773-4cf5-b204-8c37bd657c20-0-izuis.
at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:46) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:38) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.$$access$$apply(Unknown Source) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456) ~[micronaut-inject-4.5.4.jar:4.5.4]
at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:129) ~[micronaut-aop-4.5.4.jar:4.5.4]
at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.doIntercept(InstrumentInterceptorBase.kt:61) ~[io.airbyte.airbyte-metrics-metrics-lib-0.63.18.jar:?]
at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.intercept(InstrumentInterceptorBase.kt:44) ~[io.airbyte.airbyte-metrics-metrics-lib-0.63.18.jar:?]
at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:138) ~[micronaut-aop-4.5.4.jar:4.5.4]
at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.apply(Unknown Source) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:24) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2367) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:193) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.Mono.subscribe(Mono.java:4552) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.scheduler.ImmediateScheduler$ImmediateSchedulerWorker.schedule(ImmediateScheduler.java:84) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.MonoSubscribeOn.subscribeOrReturn(MonoSubscribeOn.java:55) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.Mono.subscribe(Mono.java:4552) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.Mono.subscribeWith(Mono.java:4634) ~[reactor-core-3.6.8.jar:3.6.8]
at reactor.core.publisher.Mono.subscribe(Mono.java:4395) ~[reactor-core-3.6.8.jar:3.6.8]
at io.airbyte.workload.launcher.pipeline.LaunchPipeline.accept(LaunchPipeline.kt:50) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:28) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:12) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.commons.temporal.queue.QueueActivityImpl.consume(Internal.kt:87) ~[io.airbyte-airbyte-commons-temporal-core-0.63.18.jar:?]
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.common.interceptors.ActivityInboundCallsInterceptorBase.execute(ActivityInboundCallsInterceptorBase.java:39) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.opentracing.internal.OpenTracingActivityInboundCallsInterceptor.execute(OpenTracingActivityInboundCallsInterceptor.java:78) ~[temporal-opentracing-1.22.3.jar:?]
at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: io.airbyte.workload.launcher.pods.KubeClientException: Failed to create pod source-file-check-b43bf659-7773-4cf5-b204-8c37bd657c20-0-izuis.
at io.airbyte.workload.launcher.pods.KubePodClient.launchConnectorWithSidecar(KubePodClient.kt:287) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodClient.launchCheck(KubePodClient.kt:214) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:44) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
... 53 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [patch] for kind: [Pod] with name: [source-file-check-b43bf659-7773-4cf5-b204-8c37bd657c20-0-izuis] in namespace: [airbyte] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159) ~[kubernetes-client-api-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$patch$2(HasMetadataOperation.java:233) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:236) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:251) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:1179) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:98) ~[kubernetes-client-6.12.1.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher$create$1.invoke(KubePodLauncher.kt:57) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher$create$1.invoke(KubePodLauncher.kt:52) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand$lambda$0(KubePodLauncher.kt:307) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.Functions.lambda$get$0(Functions.java:46) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112) ~[failsafe-3.3.2.jar:3.3.2]
at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand(KubePodLauncher.kt:307) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher.create(KubePodLauncher.kt:52) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodClient.launchConnectorWithSidecar(KubePodClient.kt:284) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodClient.launchCheck(KubePodClient.kt:214) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:44) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
... 53 more
Caused by: java.io.IOException: timeout
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:504) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:524) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:419) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:397) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handlePatch(BaseOperation.java:764) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$patch$2(HasMetadataOperation.java:231) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:236) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:251) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:1179) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:98) ~[kubernetes-client-6.12.1.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher$create$1.invoke(KubePodLauncher.kt:57) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher$create$1.invoke(KubePodLauncher.kt:52) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand$lambda$0(KubePodLauncher.kt:307) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.Functions.lambda$get$0(Functions.java:46) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112) ~[failsafe-3.3.2.jar:3.3.2]
at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand(KubePodLauncher.kt:307) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher.create(KubePodLauncher.kt:52) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodClient.launchConnectorWithSidecar(KubePodClient.kt:284) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pods.KubePodClient.launchCheck(KubePodClient.kt:214) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:44) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42) ~[io.airbyte-airbyte-workload-launcher-0.63.18.jar:?]
... 53 more
Caused by: java.io.InterruptedIOException: timeout
at okhttp3.internal.connection.RealCall.timeoutExit(RealCall.kt:398) ~[okhttp-4.12.0.jar:?]
at okhttp3.internal.connection.RealCall.callDone(RealCall.kt:360) ~[okhttp-4.12.0.jar:?]
at okhttp3.internal.connection.RealCall.noMoreExchanges$okhttp(RealCall.kt:325) ~[okhttp-4.12.0.jar:?]
at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:209) ~[okhttp-4.12.0.jar:?]
at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517) ~[okhttp-4.12.0.jar:?]
... 3 more
Caused by: java.io.IOException: Canceled
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:72) ~[okhttp-4.12.0.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) ~[okhttp-4.12.0.jar:?]
at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201) ~[okhttp-4.12.0.jar:?]
at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517) ~[okhttp-4.12.0.jar:?]
... 3 more
2024-08-20 09:24:38 INFO i.a.w.l.c.WorkloadApiClient(updateStatusToFailed):54 - Attempting to update workload: 778daa7c-feaf-4db6-96f3-70fd645acc77_b43bf659-7773-4cf5-b204-8c37bd657c20_0_check to FAILED.
2024-08-20 09:24:38 INFO i.a.w.l.p.h.FailureHandler(apply):62 - Pipeline aborted after error for workload: 778daa7c-feaf-4db6-96f3-70fd645acc77_b43bf659-7773-4cf5-b204-8c37bd657c20_0_check.
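When reading a trace like this, the useful part is the chain of "Caused by:" lines. Here the failure happens while the launcher sends the pod create request (a server-side apply, i.e. an HTTP PATCH through the fabric8 client) to the Kubernetes API server, and the innermost causes are OkHttp client-side timeouts, not a scheduling or image problem. The helper below is a hypothetical debugging aid (not part of Airbyte) that extracts that chain from a saved log; the function names are made up for illustration.

```python
import re

# Matches "Caused by: <exception class>[: <message>]" lines in a Java stack trace.
CAUSED_BY = re.compile(r"^\s*Caused by: ([\w.$]+)(?::\s*(.*))?$")


def cause_chain(log_text: str) -> list[tuple[str, str]]:
    """Return (exception class, message) pairs for each 'Caused by:' line, outermost first."""
    chain = []
    for line in log_text.splitlines():
        match = CAUSED_BY.match(line)
        if match:
            chain.append((match.group(1), match.group(2) or ""))
    return chain


def root_cause(log_text: str):
    """Return the innermost (class, message) pair, or None if there is no 'Caused by:' line."""
    chain = cause_chain(log_text)
    return chain[-1] if chain else None
```

Applied to the log above, the chain is KubeClientException, then KubernetesClientException (the failed patch), then java.io.IOException: timeout, java.io.InterruptedIOException: timeout, and finally java.io.IOException: Canceled.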
After some investigation, I found that the problem goes away if I add a rule to our security group that allows all TCP traffic from the control plane to the worker nodes.
I'm not sure why it is needed or why it worked without it before, but for now it seems to solve the issue consistently for us. Is there a specific port that is needed?
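On the question of a specific port, one plausible explanation (an assumption, not confirmed anywhere in this thread) is an admission webhook: creating a pod makes the control plane call each matching mutating or validating webhook, and those webhooks usually run as pods on the worker nodes. If a webhook listens on a port the security group does not open from the control plane, the pod create request could hang until the launcher's client times out. The sketch below lists the in-cluster Services referenced by webhook configurations, parsed from kubectl JSON output; the function name is hypothetical. The Service port it reports may differ from the pod's target port, which is likely the one the security group has to allow, so checking the matching endpoints is worthwhile too.

```python
import json


def webhook_service_refs(kubectl_json: str) -> list[dict]:
    """List the in-cluster Services that admission webhooks call.

    Expects the JSON printed by:
      kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o json
    """
    refs = []
    for config in json.loads(kubectl_json).get("items", []):
        for hook in config.get("webhooks") or []:
            service = hook.get("clientConfig", {}).get("service")
            if service is None:
                continue  # webhook addressed by an external URL, not an in-cluster Service
            refs.append({
                "kind": config.get("kind"),
                "webhook": hook.get("name"),
                "service": f'{service["namespace"]}/{service["name"]}',
                # The Kubernetes API defaults the Service port to 443 when omitted.
                "port": service.get("port", 443),
            })
    return refs
```

Any Service whose backing pods listen on a port outside the cluster-to-node rules would be a candidate for the missing rule.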
@davinchia can you check this issue?
@jnatten strange. Does your cluster have special security rules set up? We run Airbyte Cloud on EKS and have never seen this issue.
I'm not sure if they are special, but our previous security group setup was roughly this:
Worker node -> Cluster: 443
Worker node -> Worker node: 53, 1025-65535
Cluster -> Worker node: 443, 4443, 6443, 8443, 9443, 10250
Worker node -> outside world: all open
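Writing these rules down makes it easy to check whether a candidate port (for example, one some in-cluster component listens on) is reachable from the control plane. This is a minimal sketch assuming the rules are exactly as listed: TCP only, inclusive port ranges, "all open" modeled as 0-65535, and the role names "worker", "cluster" and "internet" are hypothetical labels.

```python
# Hypothetical model of the security group rules listed above.
# Keys are (source, destination) roles; values are inclusive TCP port ranges.
RULES = {
    ("worker", "cluster"): [(443, 443)],
    ("worker", "worker"): [(53, 53), (1025, 65535)],
    ("cluster", "worker"): [(443, 443), (4443, 4443), (6443, 6443),
                            (8443, 8443), (9443, 9443), (10250, 10250)],
    ("worker", "internet"): [(0, 65535)],  # "all open"
}


def is_allowed(src: str, dst: str, port: int, rules=RULES) -> bool:
    """True if some rule for (src, dst) covers the given TCP port."""
    return any(lo <= port <= hi for lo, hi in rules.get((src, dst), []))


def blocked_ports(src: str, dst: str, ports, rules=RULES) -> list:
    """Return the ports from `ports` that no rule for (src, dst) allows."""
    return [p for p in ports if not is_allowed(src, dst, p, rules)]


# 8080 is an arbitrary example port, not one known to be involved in this issue.
print(blocked_ports("cluster", "worker", [443, 8080, 10250]))  # [8080]
```

Opening all ports from the control plane to the nodes, as described below, makes blocked_ports return an empty list for any port; a narrower rule would only need to cover the ports that actually show up as blocked.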
I think all of it comes from the Terraform EKS module, but I could be wrong about that.
After allowing all ports from the cluster to the worker nodes, it started working. I'm not sure whether we need all of them or just some, but I don't think it's an issue for us to keep them open.
This happened to me after trying to upgrade a cluster. I had to run helm uninstall and then reinstall, after which it worked fine.
At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 180 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.
This issue was closed because it has been inactive for 20 days since being marked as stale.