[blocker] Airbyte sync job failed after upgrade

Open sivankumar86 opened this issue 1 year ago • 39 comments

Helm Chart Version

0.94.x

What step the error happened?

During the Sync

Relevant information

Sync jobs started failing after upgrading to 0.61.x.

P.S.: we are using a custom connector.

Relevant log output

ontainer-orchestrator:0.61.0, pullPolicy=IfNotPresent]]...
2024-06-03 04:42:22 replication-orchestrator > sourceLauncherConfig is: io.airbyte.persistence.job.models.IntegrationLauncherConfig@2480acc3[jobId=639594,attemptId=0,connectionId=8597f9d9-f203-4e2d-be23-3a5fa2eb54e0,workspaceId=c810ba10-3e93-4c4c-976f-8605746e4520,dockerImage=zipau-docker.jfrog.io/source-mssql:3.1.2,normalizationDockerImage=<null>,supportsDbt=false,normalizationIntegrationType=<null>,protocolVersion=Version{version='0.2.0', major='0', minor='2', patch='0'},isCustomConnector=true,allowedHosts=<null>,additionalEnvironmentVariables=<null>,additionalLabels=<null>,priority=<null>,additionalProperties={}]
2024-06-03 04:42:22 ERROR i.a.c.Application(run):80 - Killing orchestrator because of an Exception
java.lang.IllegalStateException: baseUrl is invalid.
	at io.airbyte.api.client.generated.SourceApi.getSourceWithHttpInfo(SourceApi.kt:3690) ~[io.airbyte-airbyte-api-0.61.0.jar:?]
	at io.airbyte.api.client.generated.SourceApi.getSource(SourceApi.kt:653) ~[io.airbyte-airbyte-api-0.61.0.jar:?]
	at io.airbyte.workers.general.ReplicationWorkerFactory.create(ReplicationWorkerFactory.java:149) ~[io.airbyte-airbyte-commons-worker-0.61.0.jar:?]
	at io.airbyte.container_orchestrator.orchestrator.ReplicationJobOrchestrator.runJob(ReplicationJobOrchestrator.java:118) ~[io.airbyte-airbyte-container-orchestrator-0.61.0.jar:?]
	at io.airbyte.container_orchestrator.Application.run(Application.java:78) [io.airbyte-airbyte-container-orchestrator-0.61.0.jar:?]
	at io.airbyte.container_orchestrator.Application.main(Application.java:38) [io.airbyte-airbyte-container-orchestrator-0.61.0.jar:?]
2024-06-03 04:42:22 INFO i.a.c.AsyncStateManager(write):51 - Writing async status FAILED for KubePodInfo[namespace=product-analytics, name=orchestrator-repl-job-639594-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:0.61.0, pullPolicy=IfNotPresent]]...
2024-06-03 04:42:22 INFO i.a.a.SegmentAnalyticsClient(close):221 - Closing Segment analytics client...
2024-06-03 04:42:22 INFO i.a.a.BlockingShutdownAnalyticsPlugin(waitForFlush):276 - Waiting for Segment analytic client to flush enqueued messages...
2024-06-03 04:42:22 INFO i.a.a.BlockingShutdownAnalyticsPlugin(waitForFlush):288 - Segment analytic client flush complete.
2024-06-03 04:42:22 INFO i.a.a.SegmentAnalyticsClient(close):225 - Segment analytics client closed.  No new events will be accepted.
2024-06-03 04:42:22 INFO i.m.r.Micronaut(lambda$start$0):117 - Embedded Application shutting down
2024-06-03 04:42:22 WARN c.v.l.l.Log4j2Appender(close):108 - Already shutting down. Cannot remove shutdown hook.
2024-06-03 04:42:22 WARN c.v.l.l.Log4j2Appender(close):108 - Already shutting down. Cannot remove shutdown hook.

sivankumar86 avatar Jun 03 '24 06:06 sivankumar86

same issue here. I'm trying to add "global.airbyteUrl" in the helm chart (but I'm hitting another new weird issue now)

jgournet avatar Jun 03 '24 06:06 jgournet

just in case: after sooo many different issues, I uninstalled the helm chart and re-installed again (with the addition of "global.airbyteUrl") and it seems to be working now

not a great experience, but after some long hours, I managed to launch a sync

jgournet avatar Jun 03 '24 08:06 jgournet

What is global.airbyteUrl? Is it an environment variable?

anmol1vw13 avatar Jun 03 '24 08:06 anmol1vw13

> What is global.airbyteUrl? Is it an environment variable?

If you are using the helm chart, it's a new value: https://github.com/airbytehq/airbyte-platform/blob/main/charts/airbyte/values.yaml#L22

jgournet avatar Jun 03 '24 08:06 jgournet

I'm deploying airbyte on an ec2 instance. What do you suggest I should be doing to resolve this error?

anmol1vw13 avatar Jun 03 '24 08:06 anmol1vw13

> I'm deploying airbyte on an ec2 instance. What do you suggest I should be doing to resolve this error?

Sorry, I have no idea. I'm just someone who had the same issue as you and "resolved" it by uninstalling the helm chart and re-installing it with that modification.

jgournet avatar Jun 03 '24 09:06 jgournet

Okay, thanks @jgournet

anmol1vw13 avatar Jun 03 '24 09:06 anmol1vw13

I have tried all the options: I used the ingress URL and the service URL, and I uninstalled and reinstalled the helm chart, but no luck.

sivankumar86 avatar Jun 03 '24 22:06 sivankumar86

@marcosmarxm Could you shed some light on this ?

sivankumar86 avatar Jun 03 '24 22:06 sivankumar86

Same here. We are using Airbyte inside a private network. Setting global.airbyteUrl to either the external private domain or the internal Kubernetes domain does not work (504 error when a sync is attempted), and unsetting global.airbyteUrl makes the sync fail with the error above.

alexremn avatar Jun 03 '24 23:06 alexremn

Same here.

Tried multiple chart versions with app version 0.61.x and issue persists in all of them - this is really frustrating because it seems we cannot rely on the released helm charts to plan our upgrades, since we never know when they are going to break.

pmspeixoto avatar Jun 03 '24 23:06 pmspeixoto

Ok, was finally able to fix it in my setup by:

  1. Upgrading to most recent chart
  2. Removing completely the global.airbyteUrl from my values.yaml configuration
  3. Manually changing the value of the INTERNAL_API_HOST environment variable in the airbyte-airbyte-env config map to remove the http:// prefix - please refer to yet another issue where this env variable issue was raised

After doing the above I did a manual rollout restart for the temporal, worker and server components and I was finally able to trigger the syncs.
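
As a rough illustration of step 3, here is a minimal Python sketch of the "remove the http:// prefix" edit. It assumes the config map value has the form `http://<host>:<port>`; the actual value is not shown in this thread. The `strip_scheme` helper and the sample host name are hypothetical and not part of Airbyte.

```python
def strip_scheme(value: str) -> str:
    """Return value without a leading http:// or https:// prefix."""
    for prefix in ("http://", "https://"):
        # Compare case-insensitively, but slice the original string.
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    # No recognised prefix: leave the value untouched.
    return value


# Hypothetical sample value; the real INTERNAL_API_HOST depends on your release.
print(strip_scheme("http://airbyte-server.example.internal:8001"))
```

The port is kept on purpose, since the step above only mentions removing the prefix.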

Would love to have better description on what is being released in each helm chart version, plus a better description on what env variables are being added and why (still do not get what the global.airbyteUrl variable is used for)

pmspeixoto avatar Jun 04 '24 00:06 pmspeixoto

Hey, the helm chart configuration really is hard to get into, for sure 😅

@pmspeixoto global.airbyteUrl seems to be used only in the airbyte-airbyte-env ConfigMap, where it populates the following env variables:

EDIT: I fixed almost everything by rolling back to chart version 0.87.4, which embeds Airbyte version 0.60.1.

firehist avatar Jun 04 '24 06:06 firehist

So, after some time running with the above setup, and triggering successful syncs for most of our connections, we started again to stumble into the below error.

2024-06-04 09:40:50 ERROR i.a.c.Application(run):80 - Killing orchestrator because of an Exception
java.lang.IllegalStateException: baseUrl is invalid.
        at io.airbyte.api.client.generated.SourceApi.getSourceWithHttpInfo(SourceApi.kt:3690) ~[io.airbyte-airbyte-api-0.61.0.jar:?]
        at io.airbyte.api.client.generated.SourceApi.getSource(SourceApi.kt:653) ~[io.airbyte-airbyte-api-0.61.0.jar:?]
        at io.airbyte.workers.general.ReplicationWorkerFactory.create(ReplicationWorkerFactory.java:149) ~[io.airbyte-airbyte-commons-worker-0.61.0.jar:?]
        at io.airbyte.container_orchestrator.orchestrator.ReplicationJobOrchestrator.runJob(ReplicationJobOrchestrator.java:118) ~[io.airbyte-airbyte-container-orchestrator-0.61.0.jar:?]
        at io.airbyte.container_orchestrator.Application.run(Application.java:78) [io.airbyte-airbyte-container-orchestrator-0.61.0.jar:?]
        at io.airbyte.container_orchestrator.Application.main(Application.java:38) [io.airbyte-airbyte-container-orchestrator-0.61.0.jar:?]

This is really strange since we made no configuration changes after the above, and I'm not even sure which baseUrl the error refers to.

@firehist I think I'll probably just roll back to that version. Did you have to make any manual changes to roll back the database state? Or is the rollback smooth? Thank you in advance!

pmspeixoto avatar Jun 04 '24 09:06 pmspeixoto

I am having this error on Schema discovery. I do not use helm charts or kubernetes. Just running on ec2 instance.

yzislin avatar Jun 04 '24 10:06 yzislin

downgrading to 0.60.1 fixes this error for me. I also had a problem with named volumes versus local host path. Had to revert docker-compose to not using named volume for local root. That was a pain for me as well

yzislin avatar Jun 04 '24 12:06 yzislin

Yep, I've reverted to 0.60.1 and it seems more stable fortunately.

pmspeixoto avatar Jun 04 '24 13:06 pmspeixoto

Folks, sorry for the delay, I only saw the issue now. I'm escalating this ASAP to the deployment team.

marcosmarxm avatar Jun 04 '24 14:06 marcosmarxm

@sivankumar86 and all, please provide any information about the steps that lead to this error: which files you are changing, which version you are upgrading from, and which commands you are running.

marcosmarxm avatar Jun 04 '24 20:06 marcosmarxm

@marcosmarxm applying the update from 0.92.9 (the last working version for me) to any newer version (0.61.1) without changing the config makes all jobs fail, complaining about:

  1. baseUrl
  2. api host (another bug, discussed in separate issue)

alexremn avatar Jun 04 '24 22:06 alexremn

So latest release (0.62.0) still has this problem? I did not see this issue in the release notes

yzislin avatar Jun 05 '24 13:06 yzislin

After upgrading to 0.62 in a k8s env, I have trouble with syncs too. For example:

  1. Errors related to a MongoDB cluster on a PostgreSQL to BigQuery connector
  2. Syncs that start and report success but transfer no data

Example errors from the server:

StatsAggregationHelper(lambda$hydrateWithAggregatedStats$6):197 - No stats have been persisted for job

you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword

2024-06-06 08:00:14 ERROR i.a.c.c.ConfigReplacer(getAllowedHosts):93 - All allowedHosts values are un-replaced.  Check this connector's configuration or actor definition -
 [${connection_string}] 

worker logs:

2024-06-06 09:18:12 platform > State Store reports orchestrator pod orchestrator-repl-job-82-attempt-0 succeeded                                                           
2024-06-06 09:18:12 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicateV2$3):207 - sync summary: io.airbyte.config.StandardSyncOutput@7e74007c[standardSyncSummary=io.a
    at io.airbyte.workers.general.ReplicationWorkerHelper.startDestination(ReplicationWorkerHelper.kt:211)                                                                 
    at io.airbyte.workers.general.BufferedReplicationWorker.lambda$run$0(BufferedReplicationWorker.java:170)                                                               
    at io.airbyte.workers.general.BufferedReplicationWorker.lambda$runAsync$2(BufferedReplicationWorker.java:235)                                                          
    at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)                                                                          
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)                                                                           
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)                                                                           
    at java.base/java.lang.Thread.run(Thread.java:1583)                                                                                                                    
Caused by: io.airbyte.workers.exception.WorkerException: Failed to create pod for write step                                                                               
    at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:197)                                                                                   
    at io.airbyte.workers.process.AirbyteIntegrationLauncher.write(AirbyteIntegrationLauncher.java:265)                                                                    
    at io.airbyte.workers.internal.DefaultAirbyteDestination.start(DefaultAirbyteDestination.java:110)                                                                     
    at io.airbyte.workers.general.ReplicationWorkerHelper.startDestination(ReplicationWorkerHelper.kt:209)                                                                 
    ... 6 more                                                                                                                                                             
Caused by: java.lang.RuntimeException: java.io.IOException: kubectl cp failed with exit code 1                                                                             
    at io.airbyte.workers.process.KubePodProcess.copyFilesToKubeConfigVolume(KubePodProcess.java:368)                                                                      
    at io.airbyte.workers.process.KubePodProcess.<init>(KubePodProcess.java:672)                                                                                           
    at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:193)                                                                                   
    ... 9 more                                                                                                                                                             
Caused by: java.io.IOException: kubectl cp failed with exit code 1                                                                                                         
    at io.airbyte.workers.process.KubePodProcess.copyFilesToKubeConfigVolume(KubePodProcess.java:362)                                                                      
    ... 11 more 

ivan-sukhomlyn avatar Jun 06 '24 11:06 ivan-sukhomlyn

@ivan-sukhomlyn what version did you upgrade from?

marcosmarxm avatar Jun 06 '24 13:06 marcosmarxm

@marcosmarxm thanks for the reply! I upgraded from 0.60.1 to 0.61.0 using the 0.94.1 Helm chart version and later. I've also tried the latest 0.62.0 versions, and the result is the same: unstable behavior.

ivan-sukhomlyn avatar Jun 06 '24 14:06 ivan-sukhomlyn

I'm using docker-compose and got the same error:

Caused by: io.temporal.failure.ApplicationFailure: message='baseUrl is invalid.', type='java.lang.IllegalStateException', nonRetryable=false
    at io.airbyte.api.client.generated.WorkspaceApi.getWorkspaceByConnectionIdWithTombstoneWithHttpInfo(WorkspaceApi.kt:3140) ~[io.airbyte-airbyte-api-0.62.1.jar:?]
    at io.airbyte.api.client.generated.WorkspaceApi.getWorkspaceByConnectionIdWithTombstone(WorkspaceApi.kt:509) ~[io.airbyte-airbyte-api-0.62.1.jar:?]
    at io.airbyte.workers.temporal.scheduling.activities.ConfigFetchActivityImpl.isWorkspaceTombstone(ConfigFetchActivityImpl.java:248) ~[io.airbyte-airbyte-workers-0.62.1.jar:?]
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
    at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
    at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.22.3.jar:?]
    at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.22.3.jar:?]
    at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) ~[temporal-sdk-1.22.3.jar:?]
    at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) ~[temporal-sdk-1.22.3.jar:?]
    at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) ~[temporal-sdk-1.22.3.jar:?]
    at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) ~[temporal-sdk-1.22.3.jar:?]
    at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) ~[temporal-sdk-1.22.3.jar:?]
    at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.base/java.lang.Thread.run(Thread.java:1583) ~[?:?]

I upgraded from 0.60.1 to 0.61.0 and got this error, and I get it with the latest 0.62.x versions as well. I cannot sync any connection.

JuanPabloToniolo-Udesa avatar Jun 06 '24 16:06 JuanPabloToniolo-Udesa

I am using docker compose to deploy and am getting the same error

capture-namang avatar Jun 10 '24 06:06 capture-namang

Downgrading to 0.60.1 worked for now.

M-Dahab avatar Jun 10 '24 07:06 M-Dahab

Downgrading gives me a different error:

2024-06-10 15:29:09 Caused by: org.postgresql.util.PSQLException: ERROR: column actor_definition.support_refreshes does not exist
2024-06-10 15:29:09   Position: 618
2024-06-10 15:29:09     at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2725) ~[postgresql-42.7.3.jar:42.7.3]
2024-06-10 15:29:09     at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2412) ~[postgresql-42.7.3.jar:42.7.3]
2024-06-10 15:29:09     at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:371) ~[postgresql-42.7.3.jar:42.7.3]
2024-06-10 15:29:09     at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:502) ~[postgresql-42.7.3.jar:42.7.3]
2024-06-10 15:29:09     at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:419) ~[postgresql-42.7.3.jar:42.7.3]
2024-06-10 15:29:09     at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:194) ~[postgresql-42.7.3.jar:42.7.3]
2024-06-10 15:29:09     at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:180) ~[postgresql-42.7.3.jar:42.7.3]
2024-06-10 15:29:09     at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44) ~[HikariCP-5.1.0.jar:?]
2024-06-10 15:29:09     at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java) ~[HikariCP-5.1.0.jar:?]
2024-06-10 15:29:09     at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:219) ~[jooq-3.19.7.jar:?]
2024-06-10 15:29:09     at org.jooq.impl.Tools.executeStatementAndGetFirstResultSet(Tools.java:4940) ~[jooq-3.19.7.jar:?]
2024-06-10 15:29:09     at org.jooq.impl.AbstractResultQuery.execute(AbstractResultQuery.java:236) ~[jooq-3.19.7.jar:?]
2024-06-10 15:29:09     at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:348) ~[jooq-3.19.7.jar:?]

capture-namang avatar Jun 10 '24 10:06 capture-namang

Hey folks, for me the solution was quite straight-forward:

  1. Upgrading chart version "0.94.1" -> "0.143.0"
  2. Adding the global.airbyteUrl with a value of ingress host, no protocol/port specified.
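
For step 2, "ingress host, no protocol/port" can be read as the bare host name of the ingress URL. A minimal Python sketch of that reading follows; `bare_host` and `airbyte.example.com` are hypothetical names used only for illustration, and other comments in this thread report different results with other chart versions.

```python
from urllib.parse import urlsplit


def bare_host(url_or_host: str) -> str:
    """Return only the host name, dropping scheme, port and path."""
    # urlsplit fills in hostname only when an authority marker ("//") is
    # present, so add one for inputs that carry no scheme.
    candidate = url_or_host if "://" in url_or_host else "//" + url_or_host
    host = urlsplit(candidate).hostname
    if not host:
        raise ValueError(f"no host found in {url_or_host!r}")
    return host


# Hypothetical ingress URL; substitute your own.
print(bare_host("https://airbyte.example.com:443/"))
```

The printed value is what would go into global.airbyteUrl under this interpretation.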

AsoTora avatar Jun 10 '24 14:06 AsoTora

> Hey folks, for me the solution was quite straight-forward:
>
>   1. Upgrading chart version "0.94.1" -> "0.143.0"
>   2. Adding the global.airbyteUrl with a value of ingress host, no protocol/port specified.

Hi, how can we set up the 2nd setting? Thanks in advance

andy1xx8 avatar Jun 10 '24 20:06 andy1xx8