airbyte icon indicating copy to clipboard operation
airbyte copied to clipboard

Source-postgres: Postgres logical replication errors out with invalid timestamp

Open ameyabhope opened this issue 3 years ago • 12 comments

Environment

  • Airbyte version: 0.39.19-alpha
  • OS Version / Instance: GCP n2.
  • Deployment: GCP Compute Engine
  • Source Connector and version: Postgres 0.4.24
  • Destination Connector and version: Bigquery 1.1.9
  • Step where error happened: Sync job

Current Behavior

Tell us what happens. Logical replication fails, with error message such as -

2022-06-20 16:24:17 normalization > 16:24:12.538432 [info ] [MainThread]: Completed with 1 error and 0 warnings:
2022-06-20 16:24:17 normalization > 16:24:12.539085 [info ] [MainThread]: 
2022-06-20 16:24:17 normalization > 16:24:12.539571 [error] [MainThread]: Database Error in model deposit_scd (models/generated/airbyte_incremental/scd/raw_checkmate/deposit_scd.sql)
2022-06-20 16:24:17 normalization > 16:24:12.540151 [error] [MainThread]:   Invalid timestamp string "0000-12-30T00:00:00Z"
2022-06-20 16:24:17 normalization > 16:24:12.540678 [error] [MainThread]:   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_incremental/scd/raw_<db>/<table>_scd.sql

Checking the source table, there is no row with the timestamp 0000-12-30T00:00:00Z. Other tables within the same db, are replicated without any errors. This behavior is seen in multiple dbs we currently replicate (where one of the table fails with some kind of invalid timestamp issue, while that particular timestamp doesn't exist in the table). The Standard replication works as expected for these dbs.

Expected Behavior

Tell us what should happen. Successful logical replication for all tables in the db.

Logs

logs-81417.txt

Steps to Reproduce

  1. Set up logical replication
  2. Trigger sync job
  3. Observe error

Are you willing to submit a PR?

Yes, if pointed in the right direction.

ameyabhope avatar Jun 20 '22 17:06 ameyabhope

Hey @ameyabhope this error happens during the normalization. Could you please find and share the record with this failing timestamp in your raw table? It might also help if you could share your catalog (instructions here).

alafanechere avatar Jun 21 '22 12:06 alafanechere

If you don't mind I'd prefer to have this conversation on our forum to first try to identify the root cause of the problem and then open an issue with specific details about what should be fixed, feel free to post there.

alafanechere avatar Jun 21 '22 13:06 alafanechere

Thanks @alafanechere, opened this on the forum https://discuss.airbyte.io/t/postgres-to-bigquery-logical-replication-errors-out-with-invalid-timestamp/1532/2

ameyabhope avatar Jun 21 '22 18:06 ameyabhope

I'm closing this issue while we have an ongoing discussion on the forum.

alafanechere avatar Jun 22 '22 13:06 alafanechere

@ameyabhope I'm reopening the issue. Could you please make a final check on the raw data table. I wonder if your timestamp value is 0001-01-01T00:00:00Z or 0000-12-30T00:00:00Z in the destination raw table for your transactions_in stream.

I attach for reference details you provided on the forum:

  • The problematic column is typed as timestamp with time zone on the source
  • The entry for the problematic column in the source catalog:
                        "settlement_date": {
                            "type": "string",
                            "format": "date-time",
                            "airbyte_type": "timestamp_with_timezone"
                        },

alafanechere avatar Jul 15 '22 07:07 alafanechere

Thanks @alafanechere for reopening the issue.

Confirmed that the value is "settlement_date": "0001-01-01T00:00:00Z" on the destination with standard replication.

Happy to give more info :)

ameyabhope avatar Jul 15 '22 14:07 ameyabhope

+1 on this issue -

Airbyte version: 0.39.31-alpha OS Version / Instance: GCP n2. Deployment: GCP Compute Engine Source Connector and version: Postgres 0.4.35 (also occurred with 0.4.30) Destination Connector and version: Bigquery 1.1.11 Step where error happened: syncs (both incremental and overwrite) with and without normalization

Invalid timestamp: '+90220-01-02T01:30:00.000000'

garygcd avatar Jul 21 '22 16:07 garygcd

Some additional context:

I am able to manually run the dbt models on the raw table without encountering an error.

Prior to the timestamp err, the following warning is present as well- WARN i.a.i.b.IntegrationRunner(watchForOrphanThreads):219 - The main thread is exiting while children non-daemon threads from a connector are still active.

garygcd avatar Jul 26 '22 21:07 garygcd

@ameyabhope @garygcd thank you for all the information - if you have any more, please go ahead and post it. This has been triaged to the databases team, so they should be able to put this on their roadmap!

natalyjazzviolin avatar Aug 01 '22 16:08 natalyjazzviolin

I upgraded the airbyte version and the postgres source version and getting an invalid error before the normalization step now, here are the details -

Environment

  • Airbyte version: 0.40.0-alpha
  • OS Version / Instance: GCP n2.
  • Deployment: GCP Compute Engine
  • Source Connector and version: Postgres 1.0.3
  • Destination Connector and version: Bigquery 1.1.9
  • Step where error happened: Sync job

Here's the log file of the job - logs-9378.txt

Specifically this part -

2022-08-17 22:27:01 source > 2022-08-17 22:27:01 ERROR i.d.p.ErrorHandler(setProducerThrowable):35 - Producer failure
2022-08-17 22:27:01 source > io.debezium.DebeziumException: java.time.format.DateTimeParseException: Text '0' could not be parsed at index 0
2022-08-17 22:27:01 source > 	at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:85) ~[debezium-core-1.9.2.Final.jar:1.9.2.Final]
2022-08-17 22:27:01 source > 	at io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:155) ~[debezium-core-1.9.2.Final.jar:1.9.2.Final]
2022-08-17 22:27:01 source > 	at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:137) ~[debezium-core-1.9.2.Final.jar:1.9.2.Final]
2022-08-17 22:27:01 source > 	at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109) ~[debezium-core-1.9.2.Final.jar:1.9.2.Final]
2022-08-17 22:27:01 source > 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
2022-08-17 22:27:01 source > 	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
2022-08-17 22:27:01 source > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2022-08-17 22:27:01 source > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2022-08-17 22:27:01 source > 	at java.lang.Thread.run(Thread.java:833) [?:?]

Pls let me know if any other info is needed.

ameyabhope avatar Aug 18 '22 19:08 ameyabhope

Hey everyone, we have made multiple changes to the way timestamp values are handled in postgres connector in the last month. Can you try the sync with the latest version of postgres source.

Also for future reference, if you experience the same or different error, can you pleas share the schema of the table in the postgres database and the value for which the sync fails so that its easier for us to reproduce the issue on our end.

subodh1810 avatar Sep 13 '22 08:09 subodh1810

Moving to "blocked" until we get some more information here

grishick avatar Sep 13 '22 21:09 grishick

Closing due inactivity.

marcosmarxm avatar Nov 30 '22 20:11 marcosmarxm

Hey, i also have the same problem...

log

trantrinhquocviet avatar Aug 23 '23 03:08 trantrinhquocviet

Airbyte version: 0.5.06-alpha OS Version / Instance: GCP n2. Deployment: GCP Compute Engine Source Connector and version: Postgres v3.1.4 Destination Connector and version: BigQuery v1.7.5 Step where error happened: Sync job

trantrinhquocviet avatar Aug 23 '23 03:08 trantrinhquocviet