
[destination-clickhouse] is not syncing data to the main tables but only creating the internal tables

Open bhaskar-pv opened this issue 1 year ago • 19 comments

Connector Name

destination-clickhouse

Connector Version

v1.0.0

What step the error happened?

During the sync

Relevant information

I am trying to sync data from Jira to ClickHouse. Airbyte created the airbyte_internal database in ClickHouse, but after that it didn't create any tables in the database I provided in the configuration. There are also no errors in the logs.
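
A quick way to confirm this from the ClickHouse client (a sketch of the checks; the database names come from my configuration and the logs below):

```sql
-- Only the raw database has tables; the configured target database stays empty.
SHOW TABLES FROM airbyte_internal; -- lists airbyte_data_raw__stream_* tables
SHOW TABLES FROM airbyte_data;     -- the expected final tables are missing
```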

Relevant log output

2024-03-15 17:24:11 platform > Cloud storage job log path: /workspace/9373761/0/logs.log
2024-03-15 17:24:14 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: CLAIM — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:39 INFO i.m.r.Micronaut(start):100 - Startup completed in 11409ms. Server Running: http://orchestrator-repl-job-9373761-attempt-0:9000
2024-03-15 17:24:46 replication-orchestrator > Writing async status INITIALIZING for KubePodInfo[namespace=jobs, name=orchestrator-repl-job-9373761-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:dev-7c09730061, pullPolicy=IfNotPresent]]...
2024-03-15 17:24:46 replication-orchestrator > sourceLauncherConfig is: io.airbyte.persistence.job.models.IntegrationLauncherConfig@5246453e[jobId=9373761,attemptId=0,connectionId=e4f2c611-28c5-411a-a8e2-f3007c434837,workspaceId=73b9a1d6-99e6-40c1-bbdd-60a479d677dd,dockerImage=airbyte/source-jira:1.1.0,normalizationDockerImage=<null>,supportsDbt=false,normalizationIntegrationType=<null>,protocolVersion=Version{version='0.2.0', major='0', minor='2', patch='0'},isCustomConnector=false,allowedHosts=io.airbyte.config.AllowedHosts@299f9a81[hosts=[team-odr5bnlmfc62.atlassian.net, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}],additionalEnvironmentVariables=<null>,additionalLabels={connection_id=e4f2c611-28c5-411a-a8e2-f3007c434837, job_id=9373761, attempt_id=0, workspace_id=73b9a1d6-99e6-40c1-bbdd-60a479d677dd, airbyte=job-pod, mutex_key=e4f2c611-28c5-411a-a8e2-f3007c434837, workload_id=e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync, auto_id=98d3b3f7-b650-4668-948c-155989ca7cff},priority=<null>,additionalProperties={}]
2024-03-15 17:24:46 replication-orchestrator > Attempt 0 to get the source definition for feature flag checks
2024-03-15 17:24:47 replication-orchestrator > Attempt 0 to get the source definition
2024-03-15 17:24:47 replication-orchestrator > Concurrent stream read enabled? false
2024-03-15 17:24:47 replication-orchestrator > Setting up source...
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-03-15 17:24:47 replication-orchestrator > Setting up destination...
2024-03-15 17:24:47 replication-orchestrator > Setting up replication worker...
2024-03-15 17:24:48 replication-orchestrator > Running replication worker...
2024-03-15 17:24:48 replication-orchestrator > start sync worker. job id: 9373761 attempt id: 0
2024-03-15 17:24:48 replication-orchestrator > 
2024-03-15 17:24:48 replication-orchestrator > configured sync modes: {null.application_roles=full_refresh - overwrite}
2024-03-15 17:24:48 replication-orchestrator > ----- START REPLICATION -----
2024-03-15 17:24:48 replication-orchestrator > 
2024-03-15 17:24:48 replication-orchestrator > Running destination...
2024-03-15 17:24:48 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-03-15 17:24:48 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-03-15 17:24:48 replication-orchestrator > Attempting to start pod = destination-clickhouse-strict-encrypt-write-9373761-0-ksddk for airbyte/destination-clickhouse-strict-encrypt:1.0.0 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@440309c5[cpuRequest=0.2,cpuLimit=1,memoryRequest=1Gi,memoryLimit=2Gi,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@50c442a5[cpuRequest=0.05,cpuLimit=0.2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@4eb313ed[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=io.airbyte.config.ResourceRequirements@3fc92211[cpuRequest=0.1,cpuLimit=1,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdOut=io.airbyte.config.ResourceRequirements@63d8590c[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts null
2024-03-15 17:24:48 replication-orchestrator > destination-clickhouse-strict-encrypt-write-9373761-0-ksddk stdoutLocalPort = 9877
2024-03-15 17:24:48 replication-orchestrator > destination-clickhouse-strict-encrypt-write-9373761-0-ksddk stderrLocalPort = 9878
2024-03-15 17:24:48 replication-orchestrator > Creating stdout socket server...
2024-03-15 17:24:48 replication-orchestrator > Creating stderr socket server...
2024-03-15 17:24:48 replication-orchestrator > Creating pod destination-clickhouse-strict-encrypt-write-9373761-0-ksddk...
2024-03-15 17:24:49 replication-orchestrator > Waiting for init container to be ready before copying files...
2024-03-15 17:24:50 replication-orchestrator > Init container ready..
2024-03-15 17:24:50 replication-orchestrator > Copying files...
2024-03-15 17:24:50 replication-orchestrator > Uploading file: destination_config.json
2024-03-15 17:24:50 replication-orchestrator > kubectl cp /tmp/662791f6-c92a-400a-99e1-caabc7d8702b/destination_config.json jobs/destination-clickhouse-strict-encrypt-write-9373761-0-ksddk:/config/destination_config.json -c init --retries=3
2024-03-15 17:24:50 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:51 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:51 replication-orchestrator > Uploading file: destination_catalog.json
2024-03-15 17:24:51 replication-orchestrator > kubectl cp /tmp/d6ccc111-c15f-4406-9d62-e948df633cb8/destination_catalog.json jobs/destination-clickhouse-strict-encrypt-write-9373761-0-ksddk:/config/destination_catalog.json -c init --retries=3
2024-03-15 17:24:51 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:51 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:51 replication-orchestrator > Uploading file: FINISHED_UPLOADING
2024-03-15 17:24:51 replication-orchestrator > kubectl cp /tmp/d9998e01-fd71-4e9d-9242-04ff91d01085/FINISHED_UPLOADING jobs/destination-clickhouse-strict-encrypt-write-9373761-0-ksddk:/config/FINISHED_UPLOADING -c init --retries=3
2024-03-15 17:24:51 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:51 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:51 replication-orchestrator > Waiting until pod is ready...
2024-03-15 17:24:52 replication-orchestrator > Setting stdout...
2024-03-15 17:24:52 replication-orchestrator > Setting stderr...
2024-03-15 17:24:53 replication-orchestrator > Reading pod IP...
2024-03-15 17:24:53 replication-orchestrator > Pod IP: 172.25.12.67
2024-03-15 17:24:53 replication-orchestrator > Creating stdin socket...
2024-03-15 17:24:53 replication-orchestrator > Writing messages to protocol version 0.2.0
2024-03-15 17:24:53 replication-orchestrator > Reading messages from protocol version 0.2.0
2024-03-15 17:24:53 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-03-15 17:24:53 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-03-15 17:24:53 replication-orchestrator > Attempting to start pod = source-jira-read-9373761-0-gtjhb for airbyte/source-jira:1.1.0 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@6d9f624[cpuRequest=0.2,cpuLimit=1,memoryRequest=1Gi,memoryLimit=2Gi,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@50c442a5[cpuRequest=0.05,cpuLimit=0.2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@6ce7d6ab[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=null, stdOut=io.airbyte.config.ResourceRequirements@49d658bf[cpuRequest=0.2,cpuLimit=1,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts io.airbyte.config.AllowedHosts@299f9a81[hosts=[team-odr5bnlmfc62.atlassian.net, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}]
2024-03-15 17:24:53 replication-orchestrator > source-jira-read-9373761-0-gtjhb stdoutLocalPort = 9879
2024-03-15 17:24:53 replication-orchestrator > source-jira-read-9373761-0-gtjhb stderrLocalPort = 9880
2024-03-15 17:24:53 replication-orchestrator > Creating stdout socket server...
2024-03-15 17:24:53 replication-orchestrator > Creating stderr socket server...
2024-03-15 17:24:53 replication-orchestrator > Creating pod source-jira-read-9373761-0-gtjhb...
2024-03-15 17:24:53 replication-orchestrator > Waiting for init container to be ready before copying files...
2024-03-15 17:24:54 replication-orchestrator > Init container ready..
2024-03-15 17:24:54 replication-orchestrator > Copying files...
2024-03-15 17:24:54 replication-orchestrator > Uploading file: input_state.json
2024-03-15 17:24:54 replication-orchestrator > kubectl cp /tmp/36967abf-2356-471f-8d36-4c4157ff323d/input_state.json jobs/source-jira-read-9373761-0-gtjhb:/config/input_state.json -c init --retries=3
2024-03-15 17:24:54 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:54 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:54 replication-orchestrator > Uploading file: source_config.json
2024-03-15 17:24:54 replication-orchestrator > kubectl cp /tmp/a2572251-05d6-4d3f-af65-822748ef696b/source_config.json jobs/source-jira-read-9373761-0-gtjhb:/config/source_config.json -c init --retries=3
2024-03-15 17:24:54 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:54 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:54 replication-orchestrator > Uploading file: source_catalog.json
2024-03-15 17:24:54 replication-orchestrator > kubectl cp /tmp/22b6db7a-b858-455d-b85b-086db2092657/source_catalog.json jobs/source-jira-read-9373761-0-gtjhb:/config/source_catalog.json -c init --retries=3
2024-03-15 17:24:54 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:55 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:55 replication-orchestrator > Uploading file: FINISHED_UPLOADING
2024-03-15 17:24:55 replication-orchestrator > kubectl cp /tmp/b007711c-84b4-498d-bccc-ab53065aeb49/FINISHED_UPLOADING jobs/source-jira-read-9373761-0-gtjhb:/config/FINISHED_UPLOADING -c init --retries=3
2024-03-15 17:24:55 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:55 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:55 replication-orchestrator > Waiting until pod is ready...
2024-03-15 17:24:55 replication-orchestrator > Setting stdout...
2024-03-15 17:24:55 replication-orchestrator > Setting stderr...
2024-03-15 17:24:56 replication-orchestrator > Reading pod IP...
2024-03-15 17:24:56 replication-orchestrator > Pod IP: 172.25.6.117
2024-03-15 17:24:56 replication-orchestrator > Using null stdin output stream...
2024-03-15 17:24:56 replication-orchestrator > Reading messages from protocol version 0.2.0
2024-03-15 17:24:56 replication-orchestrator > Writing async status RUNNING for KubePodInfo[namespace=jobs, name=orchestrator-repl-job-9373761-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:dev-7c09730061, pullPolicy=IfNotPresent]]...
2024-03-15 17:24:56 replication-orchestrator > Destination output thread started.
2024-03-15 17:24:56 replication-orchestrator > Replication thread started.
2024-03-15 17:24:56 replication-orchestrator > Starting source heartbeat check. Will check every 1 minutes.
2024-03-15 17:24:56 replication-orchestrator > Waiting for source and destination threads to complete.
2024-03-15 17:24:56 replication-orchestrator > Starting workload heartbeat
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,002`main`1`INFO`i.a.i.d.c.ClickhouseDestinationStrictEncrypt(main):34 - starting destination: class io.airbyte.integrations.destination.clickhouse.ClickhouseDestinationStrictEncrypt
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,377`main`1`INFO`i.a.c.i.b.IntegrationCliParser(parseOptions):126 - integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,378`main`1`INFO`i.a.c.i.b.IntegrationRunner(runInternal):132 - Running integration: io.airbyte.integrations.destination.clickhouse.ClickhouseDestinationStrictEncrypt
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,379`main`1`INFO`i.a.c.i.b.IntegrationRunner(runInternal):133 - Command: WRITE
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,379`main`1`INFO`i.a.c.i.b.IntegrationRunner(runInternal):134 - Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,779`main`1`WARN`c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,783`main`1`WARN`c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,808`main`1`INFO`i.a.c.i.b.s.SshWrappedDestination(getSerializedMessageConsumer):113 - No SSH connection options found, using defaults
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,491`main`1`INFO`i.a.c.i.b.s.SshTunnel(getInstance):252 - Starting connection with method: NO_TUNNEL
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,686`main`1`INFO`c.z.h.HikariDataSource(<init>):79 - HikariPool-1 - Starting...
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,702`main`1`INFO`c.z.h.HikariDataSource(<init>):81 - HikariPool-1 - Start completed.
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,872`main`1`INFO`i.a.c.i.d.j.JdbcBufferedConsumerFactory(lambda$toWriteConfig$0):122 - Write config: WriteConfig{streamName=application_roles, namespace=airbyte_data, outputSchemaName=airbyte_internal, tmpTableName=_airbyte_tmp_qqw_airbyte_data_raw__stream_application_roles, outputTableName=airbyte_data_raw__stream_application_roles, syncMode=overwrite}
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,885`main`1`INFO`i.a.c.i.d.b.BufferManager(<init>):53 - Max 'memory' available for buffer allocation 296 MB
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,895`pool-3-thread-1`17`INFO`i.a.c.i.d.b.BufferManager(printQueueInfo):118 - [ASYNC QUEUE INFO] Global: max: 296.96 MB, allocated: 10 MB (10.0 MB), % used: 0.03367428551701215 | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.000000
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,897`main`1`INFO`i.a.c.i.d.FlushWorkers(start):95 - Start async buffer supervisor
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,899`main`1`INFO`i.a.c.i.d.AsyncStreamConsumer(start):138 - class io.airbyte.cdk.integrations.destination_async.AsyncStreamConsumer started.
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,899`main`1`INFO`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(prepareTables):59 - Ensuring schemas exist for prepareTables with V1V2 migrations
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,899`pool-5-thread-1`19`INFO`i.a.c.i.d.FlushWorkers(printWorkerInfo):143 - [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 0
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,971`main`1`WARN`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(prepareTables):79 - Could not prepare schemas or tables because this is not implemented for this destination, this should not be required for this destination to succeed
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,971`main`1`INFO`i.a.c.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):165 - Preparing raw tables in destination started for 1 streams
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,971`main`1`INFO`i.a.c.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):170 - Preparing raw table in destination started for stream application_roles. schema: airbyte_internal, table name: airbyte_data_raw__stream_application_roles
2024-03-15 17:24:57 source > Starting syncing SourceJira
2024-03-15 17:24:58 source > Marking stream application_roles as STARTED
2024-03-15 17:24:58 replication-orchestrator > Attempt 0 to stream status started null:application_roles
2024-03-15 17:24:58 source > Syncing stream: application_roles 
2024-03-15 17:24:58 source > Marking stream application_roles as RUNNING
2024-03-15 17:24:58 replication-orchestrator > Attempt 0 to update stream status running null:application_roles
2024-03-15 17:24:59 source > Read 2 records from application_roles stream
2024-03-15 17:24:59 source > Marking stream application_roles as STOPPED
2024-03-15 17:24:59 source > Finished syncing application_roles
2024-03-15 17:24:59 source > SourceJira runtimes:
Syncing stream application_roles 0:00:00.946772
2024-03-15 17:24:59 source > Finished syncing SourceJira
2024-03-15 17:24:59 replication-orchestrator > Source has no more messages, closing connection.
2024-03-15 17:24:59 replication-orchestrator > (pod: jobs / source-jira-read-9373761-0-gtjhb) - Closed all resources for pod
2024-03-15 17:24:59 replication-orchestrator > Total records read: 5 (2 KB)
2024-03-15 17:24:59 replication-orchestrator > Schema validation was performed to a max of 10 records with errors per stream.
2024-03-15 17:24:59 replication-orchestrator > One of source or destination thread complete. Waiting on the other.
2024-03-15 17:24:59 replication-orchestrator > thread status... heartbeat thread: false , replication thread: true
2024-03-15 17:24:59 replication-orchestrator > thread status... timeout thread: false , replication thread: true
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,072`main`1`INFO`i.a.c.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):183 - Preparing raw tables in destination completed.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,098`main`1`INFO`i.a.c.i.d.FlushWorkers(close):188 - Closing flush workers -- waiting for all buffers to flush
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,101`main`1`INFO`i.a.c.i.d.FlushWorkers(close):213 - REMAINING_BUFFERS_INFO
2024-03-15 17:25:02 destination >   Namespace: airbyte_data Stream: application_roles -- remaining records: 2
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,101`main`1`INFO`i.a.c.i.d.FlushWorkers(close):214 - Waiting for all streams to flush.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,903`pool-6-thread-1`18`INFO`i.a.c.i.d.DetectStreamToFlush(getNextStreamToFlush):122 - flushing: trigger info: airbyte_data - application_roles, time trigger: false , size trigger: true current threshold b: 0 bytes, queue size b: 2.71 KB, penalty b: 0 bytes, after penalty b: 2.71 KB
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,905`pool-4-thread-1`28`INFO`i.a.c.i.d.FlushWorkers(lambda$flush$1):149 - Flush Worker (a177a) -- Worker picked up work.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,906`pool-4-thread-1`28`INFO`i.a.c.i.d.FlushWorkers(lambda$flush$1):151 - Flush Worker (a177a) -- Attempting to read from queue namespace: airbyte_data, stream: application_roles.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,907`pool-4-thread-1`28`INFO`i.a.c.i.d.GlobalMemoryManager(free):88 - Freeing 10482981 bytes..
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,910`pool-4-thread-1`28`INFO`i.a.c.i.d.FlushWorkers(lambda$flush$1):164 - Flush Worker (a177a) -- Batch contains: 2 records, 2.71 KB bytes.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,911`pool-4-thread-1`28`INFO`i.a.i.d.c.ClickhouseSqlOperations(insertRecordsInternal):71 - actual size of batch: 2
2024-03-15 17:25:03 destination > 2024-03-15T17:25:03,102`main`1`INFO`i.a.c.i.d.FlushWorkers(close):217 - Closing flush workers -- all buffers flushed
2024-03-15 17:25:03 destination > 2024-03-15T17:25:03,102`main`1`INFO`i.a.c.i.d.GlobalMemoryManager(free):88 - Freeing 0 bytes..
2024-03-15 17:25:03 destination > 2024-03-15T17:25:03,103`main`1`INFO`i.a.c.i.d.FlushWorkers(close):225 - Closing flush workers -- supervisor shut down
2024-03-15 17:25:03 destination > 2024-03-15T17:25:03,103`main`1`INFO`i.a.c.i.d.FlushWorkers(close):227 - Closing flush workers -- Starting worker pool shutdown..
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,457`pool-4-thread-1`28`INFO`i.a.c.i.d.GlobalMemoryManager(free):88 - Freeing 0 bytes..
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,457`pool-4-thread-1`28`INFO`i.a.c.i.d.GlobalMemoryManager(free):88 - Freeing 2779 bytes..
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,458`pool-4-thread-1`28`INFO`i.a.c.i.d.FlushWorkers(lambda$flush$1):173 - Flush Worker (a177a) -- Worker finished flushing. Current queue size: 0
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,458`main`1`INFO`i.a.c.i.d.FlushWorkers(close):232 - Closing flush workers  -- workers shut down
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,459`main`1`INFO`i.a.c.i.d.b.BufferManager(close):92 - Buffers cleared..
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,460`main`1`INFO`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(typeAndDedupe):96 - Skipping TypeAndDedupe final
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,461`main`1`INFO`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(commitFinalTables):101 - Skipping commitFinalTables final
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,461`main`1`INFO`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(cleanup):106 - Cleaning Up type-and-dedupe thread pool
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,461`main`1`INFO`i.a.c.i.d.AsyncStreamConsumer(close):219 - class io.airbyte.cdk.integrations.destination_async.AsyncStreamConsumer closed
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,464`main`1`INFO`i.a.c.i.b.IntegrationRunner(runInternal):231 - Completed integration: io.airbyte.integrations.destination.clickhouse.ClickhouseDestinationStrictEncrypt
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,464`main`1`INFO`i.a.i.d.c.ClickhouseDestinationStrictEncrypt(main):36 - completed destination: class io.airbyte.integrations.destination.clickhouse.ClickhouseDestinationStrictEncrypt
2024-03-15 17:25:05 replication-orchestrator > (pod: jobs / destination-clickhouse-strict-encrypt-write-9373761-0-ksddk) - Closed all resources for pod
2024-03-15 17:25:05 replication-orchestrator > Source and destination threads complete.
2024-03-15 17:25:05 replication-orchestrator > Attempt 0 to update stream status complete null:application_roles
2024-03-15 17:25:05 replication-orchestrator > thread status... timeout thread: false , replication thread: true
2024-03-15 17:25:05 replication-orchestrator > sync summary: {
  "status" : "completed",
  "recordsSynced" : 0,
  "bytesSynced" : 0,
  "startTime" : 1710523488095,
  "endTime" : 1710523505694,
  "totalStats" : {
    "bytesCommitted" : 2435,
    "bytesEmitted" : 2435,
    "destinationStateMessagesEmitted" : 0,
    "destinationWriteEndTime" : 1710523505587,
    "destinationWriteStartTime" : 1710523488107,
    "meanSecondsBeforeSourceStateMessageEmitted" : 0,
    "maxSecondsBeforeSourceStateMessageEmitted" : 0,
    "maxSecondsBetweenStateMessageEmittedandCommitted" : 0,
    "meanSecondsBetweenStateMessageEmittedandCommitted" : 0,
    "recordsEmitted" : 2,
    "recordsCommitted" : 2,
    "replicationEndTime" : 1710523505684,
    "replicationStartTime" : 1710523488095,
    "sourceReadEndTime" : 1710523499622,
    "sourceReadStartTime" : 1710523493377,
    "sourceStateMessagesEmitted" : 0
  },
  "streamStats" : [ {
    "streamName" : "application_roles",
    "stats" : {
      "bytesCommitted" : 2435,
      "bytesEmitted" : 2435,
      "recordsEmitted" : 2,
      "recordsCommitted" : 2
    }
  } ]
}
2024-03-15 17:25:05 replication-orchestrator > failures: [ ]
2024-03-15 17:25:05 replication-orchestrator > 
2024-03-15 17:25:05 replication-orchestrator > ----- END REPLICATION -----
2024-03-15 17:25:05 replication-orchestrator > 
2024-03-15 17:25:07 replication-orchestrator > Returning output...
2024-03-15 17:25:07 replication-orchestrator > Writing async status SUCCEEDED for KubePodInfo[namespace=jobs, name=orchestrator-repl-job-9373761-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:dev-7c09730061, pullPolicy=IfNotPresent]]...
2024-03-15 17:24:42 INFO c.l.l.LDSLF4J$ChannelImpl(log):73 - Enabling streaming API
2024-03-15 17:24:42 INFO c.l.l.LDSLF4J$ChannelImpl(log):94 - Waiting up to 5000 milliseconds for LaunchDarkly client to start...
2024-03-15 17:24:45 INFO i.a.m.l.MetricClientFactory(initializeDatadogMetricClient):124 - Initializing DatadogMetricClient
2024-03-15 17:24:45 INFO i.a.m.l.DogStatsDMetricClient(initialize):52 - Starting DogStatsD client..
2024-03-15 17:25:07 INFO i.a.a.SegmentAnalyticsClient(close):223 - Closing Segment analytics client...
2024-03-15 17:25:07 INFO i.a.a.BlockingShutdownAnalyticsPlugin(waitForFlush):278 - Waiting for Segment analytic client to flush enqueued messages...
2024-03-15 17:25:07 INFO i.a.a.BlockingShutdownAnalyticsPlugin(waitForFlush):290 - Segment analytic client flush complete.
2024-03-15 17:25:07 INFO i.a.a.SegmentAnalyticsClient(close):227 - Segment analytics client closed.  No new events will be accepted.
2024-03-15 17:24:11 platform > Executing worker wrapper. Airbyte version: dev-7c09730061-cloud
2024-03-15 17:24:11 platform > Attempt 0 to save workflow id for cancellation
2024-03-15 17:24:11 platform > Creating workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync
2024-03-15 17:24:14 platform > Unknown feature flag "workload.polling.interval"; returning default value
2024-03-15 17:24:14 platform > Workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync is pending
2024-03-15 17:24:14 INFO i.a.w.l.c.WorkloadApiClient(claim):69 - Claimed: true for e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync via API for prod-dataplane-gcp-us-west3-0
2024-03-15 17:24:14 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: CHECK_STATUS — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:14 INFO i.a.w.l.p.s.CheckStatusStage(applyStage):61 - No pod found running for workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync
2024-03-15 17:24:14 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: BUILD — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:14 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):297 - Attempt 0 to retrieve the connection
2024-03-15 17:24:14 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):297 - Attempt 0 to retrieve the state
2024-03-15 17:24:15 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: MUTEX — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:15 INFO i.a.w.l.p.s.EnforceMutexStage(applyStage):55 - Mutex key: e4f2c611-28c5-411a-a8e2-f3007c434837 specified for workload: e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync. Attempting to delete existing pods...
2024-03-15 17:24:15 INFO i.a.w.l.p.s.EnforceMutexStage(applyStage):67 - Mutex key: e4f2c611-28c5-411a-a8e2-f3007c434837 specified for workload: e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync found no existing pods. Continuing...
2024-03-15 17:24:15 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: LAUNCH — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:56 INFO i.a.w.l.c.WorkloadApiClient(updateStatusToLaunched):54 - Attempting to update workload: e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync to LAUNCHED.
2024-03-15 17:24:56 INFO i.a.w.l.p.h.SuccessHandler(accept):61 - Pipeline completed for workload: e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync.
2024-03-15 17:25:14 platform > Workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync has returned a terminal status of success.  Fetching output...
2024-03-15 17:25:14 platform > Replication output for workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync : io.airbyte.config.ReplicationOutput@574b4a81[replicationAttemptSummary=io.airbyte.config.ReplicationAttemptSummary@3607ee90[status=completed,recordsSynced=0,bytesSynced=0,startTime=1710523488095,endTime=1710523505694,totalStats=io.airbyte.config.SyncStats@6438fd37[bytesCommitted=2435,bytesEmitted=2435,destinationStateMessagesEmitted=0,destinationWriteEndTime=1710523505587,destinationWriteStartTime=1710523488107,estimatedBytes=<null>,estimatedRecords=<null>,meanSecondsBeforeSourceStateMessageEmitted=0,maxSecondsBeforeSourceStateMessageEmitted=0,maxSecondsBetweenStateMessageEmittedandCommitted=0,meanSecondsBetweenStateMessageEmittedandCommitted=0,recordsEmitted=2,recordsCommitted=2,replicationEndTime=1710523505684,replicationStartTime=1710523488095,sourceReadEndTime=1710523499622,sourceReadStartTime=1710523493377,sourceStateMessagesEmitted=0,additionalProperties={}],streamStats=[io.airbyte.config.StreamSyncStats@2dfa2ffe[streamName=application_roles,streamNamespace=<null>,stats=io.airbyte.config.SyncStats@20e87782[bytesCommitted=2435,bytesEmitted=2435,destinationStateMessagesEmitted=<null>,destinationWriteEndTime=<null>,destinationWriteStartTime=<null>,estimatedBytes=<null>,estimatedRecords=<null>,meanSecondsBeforeSourceStateMessageEmitted=<null>,maxSecondsBeforeSourceStateMessageEmitted=<null>,maxSecondsBetweenStateMessageEmittedandCommitted=<null>,meanSecondsBetweenStateMessageEmittedandCommitted=<null>,recordsEmitted=2,recordsCommitted=2,replicationEndTime=<null>,replicationStartTime=<null>,sourceReadEndTime=<null>,sourceReadStartTime=<null>,sourceStateMessagesEmitted=<null>,additionalProperties={}],wasBackfilled=<null>,additionalProperties={}]],performanceMetrics=<null>,additionalProperties={}],state=<null>,outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@29926e61[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@ae0ff21[stream=io.airbyte.protocol.models.AirbyteStream@7699b45c[name=application_roles,jsonSchema={"type":"object","$schema":"http://json-schema.org/draft-07/schema#","properties":{"key":{"type":"string","description":"The key of the application role."},"name":{"type":"string","description":"The display name of the application role."},"groups":{"type":"array","items":{"type":"string"},"description":"The groups associated with the application role.","uniqueItems":true},"defined":{"type":"boolean","description":"Deprecated."},"platform":{"type":"boolean","description":"Indicates if the application role belongs to Jira platform (`jira-core`)."},"userCount":{"type":"integer","description":"The number of users counting against your license."},"groupDetails":{"type":["null","array"],"items":{"type":["null","object"]},"description":"Group Details"},"defaultGroups":{"type":"array","items":{"type":"string"},"description":"The groups that are granted default access for this application role.","uniqueItems":true},"numberOfSeats":{"type":"integer","description":"The maximum count of users on your license."},"remainingSeats":{"type":"integer","description":"The count of users remaining on your license."},"hasUnlimitedSeats":{"type":"boolean"},"selectedByDefault":{"type":"boolean","description":"Determines whether this application role should be selected by default on user 
creation."},"defaultGroupsDetails":{"type":["null","array"],"items":{"type":["null","object"],"properties":{"name":{"type":["null","string"]},"self":{"type":["null","string"]},"groupId":{"type":["null","string"]}}}},"userCountDescription":{"type":"string","description":"The [type of users](https://confluence.atlassian.com/x/lRW3Ng) being counted against your license."}},"description":"Details of an application role.","additionalProperties":true},supportedSyncModes=[full_refresh],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[[key]],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[],destinationSyncMode=overwrite,primaryKey=[[key]],additionalProperties={}]],additionalProperties={}],failures=[],additionalProperties={}]

Contribute

  • [ ] Yes, I want to contribute

bhaskar-pv avatar Mar 15 '24 18:03 bhaskar-pv

Same.

AlexisSerneels avatar Mar 25 '24 17:03 AlexisSerneels

This is happening to us as well: the data is not being copied to the main database.

abhishekgahlot2 avatar Apr 08 '24 17:04 abhishekgahlot2

I have faced the same issue. Airbyte doesn't even report the sync as failed. Is this a connector bug?

Harshit-Zenskar avatar Apr 08 '24 19:04 Harshit-Zenskar

I believe this is due to their rollout of Destinations V2. They seem to be pushing people to external orchestration systems. So I don't think this is a bug.

Here are some discussions I dug up that seem relevant.

https://github.com/airbytehq/airbyte/discussions/35339 https://github.com/airbytehq/airbyte/discussions/34860

From what I can see they seem to be focusing on E and L and pushing people to other platforms for T.

anthonator avatar Apr 08 '24 20:04 anthonator

Maybe @jbfbell, @rileybrook or @cgardens could shed some light on this?

anthonator avatar Apr 08 '24 20:04 anthonator

@anthonator However, when I tested with the Postgres destination, the tables were created correctly in both the airbyte_internal database and the main database where the sync was supposed to land. In the case of ClickHouse, only the airbyte_internal tables were filled with data; no tables or data were present in the main database specified in the ClickHouse destination.

abhishekgahlot2 avatar Apr 08 '24 22:04 abhishekgahlot2

@abhishekgahlot2 from my understanding each destination needs to implement normalization and the ClickHouse destination currently does not.

See https://github.com/airbytehq/airbyte/discussions/35339

anthonator avatar Apr 09 '24 15:04 anthonator

@anthonator Sorry for the delayed reply here, but yes: as of 1.0.0 we removed what we referred to as "normalization", i.e. the creation of typed tables, from the ClickHouse destination. As you pointed out, this was a result of the Destinations V2 (DV2) work. Normalization in its previous state was unmaintainable for us as a team, and we are removing that previous implementation from the platform completely. Rolling out DV2 to the various destinations proved to be a time-consuming process, and we made the decision to pivot towards improving the underlying shared libraries. To put it another way: we would love to enable ourselves or the community to easily add a new V2 destination, but we are not there yet, although we are actively working on getting there. Unfortunately, ClickHouse fell on the other side of the cut line.

Our hope was that by still moving the raw data, rather than removing the ClickHouse connector completely, you could still build dbt models or other solutions on top of these tables.

While I understand this is likely not the response you're hoping for, thank you for bringing this up and contributing to that linked GitHub discussion. It definitely helps with the prioritization of this work.

jbfbell avatar Apr 09 '24 19:04 jbfbell

Are there any tools I can use to convert the raw data into final tables in the meantime, until ClickHouse support arrives in the future?

Perhaps there is a way to use the models generated for ClickHouse and transform them into the final data.

abhishekgahlot2 avatar Apr 10 '24 20:04 abhishekgahlot2

@abhishekgahlot2 they mention Airflow, Prefect and Dagster in https://github.com/airbytehq/airbyte/discussions/34860.

Also see https://airbyte.com/blog/integrating-airbyte-with-data-orchestrators-airflow-dagster-and-prefect

anthonator avatar Apr 10 '24 20:04 anthonator

Thanks @anthonator, gonna give it a try.

abhishekgahlot2 avatar Apr 10 '24 20:04 abhishekgahlot2

Are there any tools I can use to convert the raw data into final tables in the meantime, until ClickHouse support arrives in the future?

@abhishekgahlot2 ClickHouse comes with excellent JSONExtract functions to parse the data from the _airbyte_data column. You can use these functions when you query the data, or use them in dbt transformations.
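
For example, a minimal sketch against the raw table named in the logs above (adjust the table name and JSON paths to your own streams):

```sql
-- A sketch, assuming the raw table from the logs above. ClickHouse's
-- JSONExtract* functions pull typed values out of the _airbyte_data JSON.
SELECT
    JSONExtractString(_airbyte_data, 'key')    AS key,
    JSONExtractString(_airbyte_data, 'name')   AS name,
    JSONExtractInt(_airbyte_data, 'userCount') AS user_count
FROM airbyte_internal.airbyte_data_raw__stream_application_roles;
```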

@jbfbell Is there some kind of timeline for when we can expect the ClickHouse connector to work as expected again?

jesperbagge avatar Apr 16 '24 20:04 jesperbagge

@jesperbagge JSONExtract sounds like a good idea, though I believe it will require copying the whole data again, because I believe it won't support incremental append or deduplication.
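
If deduplication is the blocker, maybe something like this could work (just a sketch, not verified against the connector): a materialized view extracts rows incrementally as they are inserted into the raw table, and a ReplacingMergeTree engine collapses duplicates on the key:

```sql
-- A sketch (untested). The materialized view fires on each insert into the
-- raw table, so extraction stays incremental; ReplacingMergeTree keeps the
-- row with the latest extracted_at per key on background merges. Assumes the
-- raw table carries an _airbyte_extracted_at timestamp column.
CREATE TABLE airbyte_data.application_roles
(
    key          String,
    name         String,
    extracted_at DateTime64(3)
)
ENGINE = ReplacingMergeTree(extracted_at)
ORDER BY key;

CREATE MATERIALIZED VIEW airbyte_data.application_roles_mv
TO airbyte_data.application_roles
AS SELECT
    JSONExtractString(_airbyte_data, 'key')  AS key,
    JSONExtractString(_airbyte_data, 'name') AS name,
    _airbyte_extracted_at                    AS extracted_at
FROM airbyte_internal.airbyte_data_raw__stream_application_roles;
```

Whether this survives an overwrite sync (which may recreate the raw table) is something you would need to test.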

abhishekgahlot2 avatar Apr 22 '24 12:04 abhishekgahlot2

@jbfbell Considering ClickHouse is virtually your only supported modern on-prem DB, I am surprised to see this connector isn't getting more attention. ClickHouse has seen broad adoption in the last couple of months everywhere we look.

MeisterLone avatar Jun 13 '24 23:06 MeisterLone

Oh, same issue. ClickHouse is such a beast; it's disappointing to learn that normalization is not possible.

Our normalization was straightforward and worked with other DBs seamlessly. JSONExtract, as mentioned, defeats the purpose, as we have quite a lot of tables and various sources too.

cc : Airbyte team @jbfbell

o1lab avatar Jul 02 '24 11:07 o1lab

JSONExtract, as mentioned, defeats the purpose, as we have quite a lot of tables and various sources too.

@o1lab Yeah, I came to that conclusion myself in the end for the same reasons. I downgraded to version 0.2.5 to at least have structured data.

Also, I'm a big fan of NocoDB!

jesperbagge avatar Jul 02 '24 13:07 jesperbagge

They should at least have updated the Airbyte Cloud documentation for ClickHouse. It is in a broken state as is.

MeisterLone avatar Jul 02 '24 20:07 MeisterLone

I am in no position to contribute currently, but I'll share another insight here for when ClickHouse gets some attention: currently, even the internal tables do not append properly. I have had a test connection between Stripe and ClickHouse set up for several days now, as well as the same connection with the same schema set up between Stripe and Redshift. After syncing every 5 minutes for 5 days, the ClickHouse internal raw tables are plainly missing some updates, whereas the Redshift tables match the Stripe dashboard records perfectly. So just using JSONExtract functions on the ClickHouse internal tables Airbyte generates is not going to be accurate.
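
A per-day count of the raw table is enough to make the gap visible, e.g. (a sketch; the Stripe raw table name here is hypothetical):

```sql
-- A sketch; the table name is hypothetical. Compare these daily counts
-- against what the source system reports for the same days.
SELECT
    toDate(_airbyte_extracted_at) AS day,
    count()                       AS records
FROM airbyte_internal.airbyte_data_raw__stream_charges
GROUP BY day
ORDER BY day;
```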

MeisterLone avatar Jul 03 '24 18:07 MeisterLone

I don't think it is a good idea not to flatten the payload; it is a big limitation, I believe.

phdmohamedali avatar Oct 15 '24 05:10 phdmohamedali

Hi @jbfbell, is there any news regarding the clickhouse-destination connector? Is there a roadmap for when it will be ready? Or is it no longer planned, and the current state of the connector is the final one? I would highly appreciate your answer. Thank you.

inkerinmaa avatar Dec 02 '24 17:12 inkerinmaa

Deferring to @evantahler

jbfbell avatar Dec 02 '24 17:12 jbfbell

Tagging @davinchia!

evantahler avatar Dec 02 '24 18:12 evantahler

Guys, do not rely on Airbyte for this. I figured out a way to normalize the data, but it uses dbt, so instead of ETL you will have to rely on ELT. I wrote a tutorial article on this that you can try: https://medium.com/@aryanrot234/airbyte-raw-data-transformation-using-dbt-dynamically-221de752733b
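
For reference, the general shape of such a dbt model is roughly this (a sketch to illustrate the idea, not taken from the article; the names are assumed):

```sql
-- models/application_roles.sql: a sketch, not from the linked article.
-- dbt runs this against ClickHouse and materializes the result as a table,
-- flattening the _airbyte_data JSON from Airbyte's raw table on each run.
{{ config(materialized='table') }}

select
    JSONExtractString(_airbyte_data, 'key')  as key,
    JSONExtractString(_airbyte_data, 'name') as name,
    _airbyte_extracted_at                    as extracted_at
from airbyte_internal.airbyte_data_raw__stream_application_roles
```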

Aryanshaw avatar Dec 03 '24 06:12 Aryanshaw

Hi everyone, thank you for your patience as we work on updating ClickHouse. We are progressing along the plan Joe previously laid out, and we are finally at a point where we can consider refreshing and enhancing the current ClickHouse destination. We are currently projecting work to begin within the next month and to take us 4-8 weeks to implement and certify, if all goes according to plan. I'll definitely keep everyone updated on our progress!

davinchia avatar Dec 03 '24 06:12 davinchia

Exceptional news. We will definitely reopen our account and try the new connector once it hits cloud. Excited to migrate off Fivetran.

MeisterLone avatar Dec 03 '24 06:12 MeisterLone

@davinchia any update on this?

MeisterLone avatar Feb 01 '25 15:02 MeisterLone

Until there is an official solution, you can use the approach I created. I wrote a tutorial article on it that you can try: https://medium.com/@aryanrot234/airbyte-raw-data-transformation-using-dbt-dynamically-221de752733b

Aryanshaw avatar Feb 01 '25 20:02 Aryanshaw

Dear god, facing the same issue. Airbyte is incredible, and ClickHouse is also incredible, but I have been hitting some very small blockers every once in a while. I would love to see this finished, as it is just not usable in the current state :(

joaomiles avatar Feb 07 '25 18:02 joaomiles

Ah shoot!! I have been building up a solution for days now in which I use Airbyte and an on-prem ClickHouse instance. I finally got ClickHouse to work, only to discover why I am not seeing any destination tables.

Hoping this is solved soon 👍

justin-autobinck avatar Feb 27 '25 13:02 justin-autobinck

Hi @davinchia, any update on whether Airbyte is working on support for the ClickHouse destination?

abhishekgahlot2 avatar Mar 19 '25 12:03 abhishekgahlot2