airbyte
Infra error for CheckConnectionWorkflow
What method are you using to run Airbyte?
Kubernetes
Platform Version or Helm Chart Version
0.45.22
What step the error happened?
Other
Relevant information
Hi,
After trying to add a new Source (BigQuery) in the UI, an error shows up: "Configuration Check Failed, Failed to run connection tests." with no associated logs.
- New Source
- Set up a new source
- Fill in BigQuery credentials
- Set up source
- Error : "Configuration Check Failed, Failed to run connection tests."
- no logs
I suspect the following infra log from airbyte-worker (CheckConnectionWorkflow) is responsible for the issue, and it seems to be related to the Kubernetes infrastructure. (Caused by: io.temporal.failure.ApplicationFailure: message='Cannot invoke "io.fabric8.kubernetes.api.model.Pod.getMetadata()" because "this.podDefinition" is null', type='java.lang.NullPointerException', nonRetryable=false)
Any ideas?
thx
Relevant log output
WARN i.t.i.s.WorkflowExecuteRunnable(throwAndFailWorkflowExecution):134 - Workflow execution failure WorkflowId='dc860043-f767-4d59-ba5d-e13dfdcc8fa1', RunId=a7be635b-4416-469d-9df4-cb9dd3395ee3, WorkflowType='CheckConnectionWorkflow'
io.temporal.failure.ActivityFailure: scheduledEventId=6, startedEventId=7, activityType='RunWithJobOutput', activityId='bb21450a-e97a-3685-95b2-6fe104bfc690', identity='1@airbyte-worker-798fb8f58-kvlnq', retryState=RETRY_STATE_MAXIMUM_ATTEMPTS_REACHED
java.lang.Thread.getStackTrace(Thread.java:2550) ~[?:?]
io.temporal.internal.sync.ActivityStubBase.execute(ActivityStubBase.java:49) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.sync.ActivityInvocationHandler.lambda$getActivityFunc$0(ActivityInvocationHandler.java:78) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.sync.ActivityInvocationHandlerBase.invoke(ActivityInvocationHandlerBase.java:60) ~[temporal-sdk-1.17.0.jar:?]
jdk.proxy2.$Proxy91.runWithJobOutput(Unknown Source) ~[?:?]
io.airbyte.workers.temporal.check.connection.CheckConnectionWorkflowImpl.run(CheckConnectionWorkflowImpl.java:54) ~[io.airbyte-airbyte-workers-0.44.5.jar:?]
CheckConnectionWorkflowImplProxy.run$accessor$DjcL9yIf(Unknown Source) ~[?:?]
CheckConnectionWorkflowImplProxy$auxiliary$KVTBijyY.call(Unknown Source) ~[?:?]
io.airbyte.workers.temporal.support.TemporalActivityStubInterceptor.execute(TemporalActivityStubInterceptor.java:79) ~[io.airbyte-airbyte-workers-0.44.5.jar:?]
CheckConnectionWorkflowImplProxy.run(Unknown Source) ~[?:?]
jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
java.lang.reflect.Method.invoke(Method.java:578) ~[?:?]
io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation$RootWorkflowInboundCallsInterceptor.execute(POJOWorkflowImplementationFactory.java:302) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:277) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.sync.WorkflowExecuteRunnable.run(WorkflowExecuteRunnable.java:71) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.sync.SyncWorkflow.lambda$start$0(SyncWorkflow.java:116) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:102) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:106) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.worker.ActiveThreadReportingExecutor.lambda$submit$0(ActiveThreadReportingExecutor.java:53) ~[temporal-sdk-1.17.0.jar:?]
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) ~[?:?]
java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: io.temporal.failure.ApplicationFailure: message='io.airbyte.workers.exception.WorkerException: Unexpected error while getting checking connection.', type='java.util.concurrent.ExecutionException', nonRetryable=false
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:161) ~[io.airbyte-airbyte-workers-0.44.5.jar:?]
io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:125) ~[io.airbyte-airbyte-workers-0.44.5.jar:?]
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
java.base/java.lang.reflect.Method.invoke(Method.java:578) ~[?:?]
io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:95) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:92) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:241) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:206) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:179) ~[temporal-sdk-1.17.0.jar:?]
io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93) ~[temporal-sdk-1.17.0.jar:?]
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
java.base/java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: io.temporal.failure.ApplicationFailure: message='Unexpected error while getting checking connection.', type='io.airbyte.workers.exception.WorkerException', nonRetryable=false
io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:127) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:43) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$5(TemporalAttemptExecution.java:195) ~[io.airbyte-airbyte-workers-0.44.5.jar:?]
java.base/java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: io.temporal.failure.ApplicationFailure: message='Cannot invoke "io.fabric8.kubernetes.api.model.Pod.getMetadata()" because "this.podDefinition" is null', type='io.airbyte.workers.exception.WorkerException', nonRetryable=false
io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:148) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.process.AirbyteIntegrationLauncher.check(AirbyteIntegrationLauncher.java:113) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:69) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:43) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$5(TemporalAttemptExecution.java:195) ~[io.airbyte-airbyte-workers-0.44.5.jar:?]
java.base/java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: io.temporal.failure.ApplicationFailure: message='Cannot invoke "io.fabric8.kubernetes.api.model.Pod.getMetadata()" because "this.podDefinition" is null', type='java.lang.NullPointerException', nonRetryable=false
io.airbyte.workers.process.KubePodProcess.close(KubePodProcess.java:782) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.process.KubePodProcess.cleanup(KubePodProcess.java:700) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.process.KubePodProcess.<init>(KubePodProcess.java:632) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:144) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.process.AirbyteIntegrationLauncher.check(AirbyteIntegrationLauncher.java:113) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:69) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:43) ~[io.airbyte-airbyte-commons-worker-0.44.5.jar:?]
io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$5(TemporalAttemptExecution.java:195) ~[io.airbyte-airbyte-workers-0.44.5.jar:?]
java.base/java.lang.Thread.run(Thread.java:1589) ~[?:?]
Reproduced and I also have the same error. Same setup. Any ideas?
Hi, has this been fixed? Like the OP I had the same setup (Airbyte 0.45.22 deployed to kube) and was getting the same error yesterday. I tried installing a few past versions today (which all failed for other reasons) then went back to version 0.45.22 and this time the connector check was successful. Wondering if this has been resolved in a minor patch since yesterday or whether it's a flaky success
I think it's a flaky success @cmardiros.
I just upgraded to .22 and the issue is happening. I'll now try to downgrade > upgrade like you did.
oh well :worried:
downgrading to .20 made it work again and I won't try to upgrade again.
Same issue here.
Hi - airbyte eng here. I wasn't able to repro this issue under 0.45.22 or the latest version. If you are still experiencing this issue, can you kindly check if the check job pod has actually started? You can go to your local k8s cluster and run kubectl get pods and it should show something like rce-bigquery-check-41ccd5a2-b96c-4178-b00b-2ef0a7bde2d8-0-rcbvq.
I'd like to know
- if the pod actually has been created;
- if so, what's the k8s status of that pod?
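For anyone who prefers to script the two checks above, here is a minimal sketch that shells out to `kubectl get pods --no-headers` and keeps only the pods whose name contains `-check-`, printing each name with its status. The class name `CheckPodFinder`, the `-check-` substring filter (taken from the example pod name above), and the default `airbyte` namespace are assumptions for illustration, not part of Airbyte.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Airbyte.
final class CheckPodFinder {

    // Keeps lines whose pod name contains "-check-" and returns "NAME STATUS" for each.
    // Expected columns of `kubectl get pods --no-headers`: NAME READY STATUS RESTARTS AGE
    static List<String> filterCheckPods(List<String> kubectlLines) {
        List<String> matches = new ArrayList<>();
        for (String line : kubectlLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 3 && cols[0].contains("-check-")) {
                matches.add(cols[0] + " " + cols[2]);
            }
        }
        return matches;
    }

    // Runs kubectl in the given namespace and returns its raw output lines.
    static List<String> listPods(String namespace) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("kubectl", "get", "pods", "-n", namespace, "--no-headers")
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // "airbyte" is an assumed namespace; pass your own as the first argument.
        String namespace = args.length > 0 ? args[0] : "airbyte";
        List<String> checkPods = filterCheckPods(listPods(namespace));
        if (checkPods.isEmpty()) {
            System.out.println("No check pods found: the pod was never created or was already deleted.");
        } else {
            checkPods.forEach(System.out::println);
        }
    }
}
```

Check pods are short-lived, so this is most useful when run while the connection test is in progress in the UI.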
For me, no pods were created, which makes sense given that the error says "this.podDefinition" is null.
Got it - thanks!
I think the NPE was thrown at this line because the function accepts null as input, so the line should check for null before accessing the metadata. While this is an easy fix, I'm more worried about the disappearing pod: either the pod was never created, or it was created and deleted shortly afterwards. If you have time, can you check the worker logs? There may be a hint just before the error logs in the original post, for example a problem encountered while creating the pod.
(line throwing problem) https://github.com/airbytehq/airbyte-platform/blob/main/airbyte-commons-worker/src/main/java/io/airbyte/workers/process/KubePodProcess.java#L783
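To illustrate the kind of guard being suggested: in the original trace, `KubePodProcess.<init>` calls `cleanup`, which calls `close`, which dereferences a `podDefinition` that was never assigned, so the NPE hides whatever made pod creation fail. The sketch below is not the actual Airbyte code; `PodStub`, `PodMeta`, and `NullSafeClose` are hypothetical stand-ins so the example compiles without the fabric8 client.

```java
import java.util.Optional;

// Hypothetical stand-in for the fabric8 ObjectMeta type.
final class PodMeta {
    final String name;

    PodMeta(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for the fabric8 Pod type.
final class PodStub {
    private final PodMeta metadata;

    PodStub(String name) {
        this.metadata = new PodMeta(name);
    }

    PodMeta getMetadata() {
        return metadata;
    }
}

final class NullSafeClose {

    // Returns the name of the pod that cleanup should delete,
    // or empty when the pod definition was never assigned.
    static Optional<String> podToDelete(PodStub podDefinition) {
        if (podDefinition == null) {
            // Pod creation failed earlier: there is nothing to clean up,
            // and skipping here lets the original exception surface.
            return Optional.empty();
        }
        return Optional.of(podDefinition.getMetadata().name);
    }

    public static void main(String[] args) {
        System.out.println(podToDelete(null));
        System.out.println(podToDelete(new PodStub("check-pod")));
    }
}
```

With a guard like this, a failed pod creation would be reported with its real cause instead of the secondary NullPointerException from cleanup.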
Hi, I forgot to mention we used an external postgres database !
@cmardiros
We fixed this issue by pinning our external database to Postgres 13.11. We had been using a more recent version of Postgres (15.2), which was causing many bugs.
It might be a good idea to state the supported Postgres versions in the database configuration documentation!
Hello, I'm working with @cmardiros and we are still seeing this error even with postgres 13.11 db pinned with helm. We are using the following setup:
- Helm chart version: 0.45.40
- Deployment type: OSS on kubernetes (GKE with autopilot version: 1.26.5-gke.1200)
We are seeing this happen whenever we run a helm install/upgrade. I've narrowed it down to the fact that service-account is recreated and the serviceAccount token issued in the worker pod becomes invalid. It seems related to https://github.com/airbytehq/airbyte/issues/18731. We've managed to fix this by not creating the service account as part of helm.
We are also facing the same issue and this gets sorted out by restarting airbyte-temporal.
But is there a permanent fix for it? Please help us out here. Sharing further info below:
Airbyte worker logs:
2024-05-22 08:02:46 INFO i.a.c.t.TemporalUtils(withBackgroundHeartbeat):330 - Temporal heartbeating stopped.
2024-05-22 08:02:46 WARN i.t.i.a.ActivityTaskExecutors$BaseActivityTaskExecutor(execute):114 - Activity failure. ActivityId=27ec2dc3-92af-3d2f-82b3-340b1ee0956d, activityType=RunWithJobOutput, attempt=1
java.lang.RuntimeException: io.temporal.serviceclient.CheckedExceptionWrapper: io.airbyte.workers.exception.WorkerException: Unexpected error while getting checking connection.
at io.airbyte.commons.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:319) ~[io.airbyte-airbyte-commons-temporal-0.50.33.jar:?]
at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:121) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:578) ~[?:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:95) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:92) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:241) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:206) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:179) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93) ~[temporal-sdk-1.17.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: io.temporal.serviceclient.CheckedExceptionWrapper: io.airbyte.workers.exception.WorkerException: Unexpected error while getting checking connection.
at io.temporal.serviceclient.CheckedExceptionWrapper.wrap(CheckedExceptionWrapper.java:57) ~[temporal-serviceclient-1.17.0.jar:?]
at io.temporal.internal.sync.WorkflowInternal.wrap(WorkflowInternal.java:461) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.activity.Activity.wrap(Activity.java:52) ~[temporal-sdk-1.17.0.jar:?]
at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:139) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.lambda$runWithJobOutput$1(CheckConnectionActivityImpl.java:136) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
at io.airbyte.commons.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:314) ~[io.airbyte-airbyte-commons-temporal-0.50.33.jar:?]
... 14 more
Caused by: io.airbyte.workers.exception.WorkerException: Unexpected error while getting checking connection.
at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:135) ~[io.airbyte-airbyte-commons-worker-0.50.33.jar:?]
at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:44) ~[io.airbyte-airbyte-commons-worker-0.50.33.jar:?]
at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:135) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.lambda$runWithJobOutput$1(CheckConnectionActivityImpl.java:136) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
at io.airbyte.commons.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:314) ~[io.airbyte-airbyte-commons-temporal-0.50.33.jar:?]
... 14 more
Airbyte Version: 0.50.xx
External Database Postgres Version: 14.10
In my case the cause was the memory limits on sync jobs. The sync pods were failing with exit code 137, which indicates memory pressure (although the pods were not reported as OOMKilled). Initially we had applied this resource quota to Airbyte jobs:
resources:
  requests:
    cpu: 100m
    memory: 25Mi
  limits:
    cpu: 200m
    memory: 50Mi
We updated the resource quota to the following and our sync pods started working fine.
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: '200m'
    memory: 1Gi
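As background on the exit code mentioned above: by shell and container convention, an exit code above 128 means the process was terminated by signal number (code - 128), so 137 = 128 + 9, i.e. SIGKILL, which is what the kernel sends when a container exceeds its memory limit. A small sketch of that decoding follows; the `ExitCodes` class and its method names are hypothetical helpers, not part of Airbyte or Kubernetes.

```java
// Hypothetical helper for reading container exit codes.
final class ExitCodes {

    // Exit codes above 128 conventionally encode "terminated by signal (code - 128)".
    // Returns 0 when the code does not correspond to a signal.
    static int signalFromExitCode(int exitCode) {
        return exitCode > 128 ? exitCode - 128 : 0;
    }

    static String describe(int exitCode) {
        int signal = signalFromExitCode(exitCode);
        if (signal == 9) {
            return "killed by SIGKILL (commonly the OOM killer when a memory limit is exceeded)";
        }
        if (signal == 15) {
            return "terminated by SIGTERM";
        }
        if (signal > 0) {
            return "terminated by signal " + signal;
        }
        return exitCode == 0 ? "exited normally" : "exited with application error " + exitCode;
    }

    public static void main(String[] args) {
        System.out.println("137 -> " + describe(137));
    }
}
```

SIGKILL can also come from other sources (for example a forced pod deletion), so exit code 137 is a strong hint of memory pressure rather than proof of it.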
Insufficient resources sounds right. I'm going to close this for now.