
Sync is working with new data, but no existing data is coming over.

Open jarrettgreen opened this issue 4 years ago • 4 comments

Setup:

  • Provider: PG 9.4 with pglogical v2.2.2
  • Subscriber: PG 12 with pglogical 2.3.0

Expected Behavior: Existing data would make its way over from the 9.4 db to the 12 db.

Actual Behavior: New inserts on the 9.4 db make it over, but no existing data is transferred, or is being transferred, as far as I can tell.

Instead of adding all tables, I dropped everything and only added a couple of smaller ones (1k rows rather than 1M) to test with. I can see updates trying to happen, and inserts do happen, but I cannot for the life of me get existing data transferred over.
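One thing worth checking for the setup described above is how the tables were added to the replication set. In pglogical 2.x, `pglogical.replication_set_add_table` takes a `synchronize_data` argument that defaults to false, so a table added after the subscription exists is not copied unless that flag is set. The following is a minimal sketch, not a confirmed diagnosis of this issue; the set name `default` and the table `public.table_name` are placeholders.

```sql
-- Run on the provider.
-- 'default' and 'public.table_name' are placeholder names, not taken from the report.
-- synchronize_data := true asks subscribers of this set to copy the existing rows;
-- when the argument is omitted it defaults to false and only new changes replicate.
SELECT pglogical.replication_set_add_table(
    set_name         := 'default',
    relation         := 'public.table_name'::regclass,
    synchronize_data := true
);
```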

jarrettgreen avatar Mar 12 '20 19:03 jarrettgreen

When trying to force a sync using alter_subscription_resynchronize_table('subscription', 'table_name') on the subscriber, I see the following in the provider's logs:

ERROR:  duplicate key value violates unique constraint "table_name_pkey"
DETAIL:  Key (id)=(89) already exists.
CONTEXT:  COPY table_name, line 1
STATEMENT:  COPY "public"."table_name" ("X","X","X,"...") FROM stdin

The subscriber's tables are empty and the provider doesn't have a dupe. I'm not sure what's going on here. I assume sequencing is off or something?
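For anyone reproducing this, the per-table sync state can be inspected on the subscriber before and after forcing a resync. This is a sketch using the placeholder names from the comment above (`subscription`, `table_name`); the `public` schema is an assumption based on the COPY statement in the log. Note that the pglogical documentation warns that the resynchronize call truncates the subscriber's table before copying.

```sql
-- Run on the subscriber.
-- 'subscription' and 'public.table_name' are the placeholder names used in this thread.

-- Overall state of the subscription (status, provider, replication sets).
SELECT * FROM pglogical.show_subscription_status('subscription');

-- Synchronization status of one table within the subscription.
SELECT * FROM pglogical.show_subscription_table(
    'subscription',
    'public.table_name'::regclass
);

-- Force a fresh copy of the table. This truncates the subscriber's copy first.
SELECT pglogical.alter_subscription_resynchronize_table(
    'subscription',
    'public.table_name'::regclass
);
```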

jarrettgreen avatar Mar 13 '20 21:03 jarrettgreen

I have the same issue: publisher PG 9.4.26 (pglogical 2.3.1) -> subscriber PG 12.3 (pglogical 2.3.1).

   <2020-06-03 07:03:17 UTC-[unknown]--sfdb [app:pglogical apply 16453:1559963640,pid:37201,23000]>LOG:  CONFLICT: remote UPDATE on relation schema_name_1.tab_name_1 replica identity index pk_tab_name_1 (tuple not found). Resolution: skip.
   <2020-06-03 07:03:17 UTC-[unknown]--sfdb [app:pglogical apply 16453:1559963640,pid:37201,23000]>DETAIL:  remote tuple {<data here>} in xact origin=1,timestamp=2020-06-03 07:03:17.563272+00,commit_lsn=0/B2D263B0
   <2020-06-03 07:03:17 UTC-[unknown]--sfdb [app:pglogical apply 16453:1559963640,pid:37201,23000]>CONTEXT:  apply UPDATE from remote relation schema_name_1.tab_name_1 in commit before 2DDF/B2D263B0, xid 39550428 committed at 2020-06-03 07:03:17.563272+00 (action #93) from node replorigin 1
   
   <2020-06-03 07:03:17 UTC-[unknown]--sfdb [app:pglogical apply 16453:1559963640,pid:37201,23505]>ERROR:  duplicate key value violates unique constraint "pk_tab_name_1"
   <2020-06-03 07:03:17 UTC-[unknown]--sfdb [app:pglogical apply 16453:1559963640,pid:37201,23505]>DETAIL:  Key (<fields here>)=(<data here>) already exists.
   <2020-06-03 07:03:17 UTC-[unknown]--sfdb [app:pglogical apply 16453:1559963640,pid:37201,23505]>CONTEXT:  apply multi INSERT from remote relation schema_name_1.tab_name_1 in commit before 2DDF/B2D263B0, xid 39550428 committed at 2020-06-03 07:03:17.563272+00 (action #415) from node replorigin 1
   <2020-06-03 07:03:17 UTC-[unknown]--sfdb [app:pglogical apply 16453:1559963640,pid:37201,00000]>LOG:  apply worker [37201] at slot 2 generation 2 exiting with error

   <2020-06-03 07:03:17 UTC--- [app:,pid:27881,00000]>LOG:  background worker "pglogical apply 16453:1559963640" (PID 37201) exited with exit code 1

The error in turn causes another issue on the PG 12 replica:

<2020-06-03 06:57:51 UTC--- [app:,pid:26390,00000]>LOG:  restartpoint starting: wal
<2020-06-03 07:03:17 UTC--- [app:,pid:26389,XX000]>PANIC:  invalid max offset number
<2020-06-03 07:03:17 UTC--- [app:,pid:26389,XX000]>CONTEXT:  WAL redo at 51/C3CE8B88 for Heap2/MULTI_INSERT: 6 tuples flags 0x08

Is it possible that the PG 12 WAL was corrupted by pglogical? Is it safe at all to use pglogical in a publisher PG 9.4.26 -> subscriber PG 12.3 setup?

thamerlan avatar Jun 03 '20 10:06 thamerlan

This looks related to: https://github.com/2ndQuadrant/pglogical/pull/295

bdrouvot avatar Feb 08 '21 06:02 bdrouvot

Provider: PG 10.4 with pglogical 2.4.1. Subscriber: PG 10.2 with pglogical 2.4.0.

I'm seeing the same behavior, although #295 says it was fixed in 2.4.0?

srl295 avatar Jun 29 '22 19:06 srl295