Tables are not synced even when their "sync_status" are "r"

Open beyondbill opened this issue 5 years ago • 4 comments

I managed to set up pglogical 2.2.0 logical replication from a Postgres 9.6.10 RDS instance to a Postgres 10.4 RDS instance. The subscription check SELECT subscription_name, status FROM pglogical.show_subscription_status(); returns replicating. The local sync status check SELECT * FROM pglogical.local_sync_status returns sync_status r for some tables and y for the rest.

However, I find that tables are not really synced, even those reported as r. For example, pglogical reports that table trip_match_blocks has been fully synced (i.e. sync_status r) from upstream to downstream, but the downstream has 84679 rows while the upstream has 84893 rows. Both counts come from SELECT count(*) FROM trip_match_blocks, so they are accurate counts of all valid rows. Nothing other than pglogical can write to both upstream and downstream, so the row counts should be identical if the table is fully synced.

I have not been able to find any official source that documents how to determine whether a table has been fully synced, or the exact meaning of each sync_status value in the local_sync_status table. My guess about what they mean is based entirely on the following source code:

#define SYNC_STATUS_NONE		'\0'	/* No sync. */
#define SYNC_STATUS_INIT		'i'		/* Ask for sync. */
#define SYNC_STATUS_STRUCTURE	's'     /* Sync structure */
#define SYNC_STATUS_DATA		'd'		/* Data sync. */
#define SYNC_STATUS_CONSTAINTS	'c'		/* Constraint sync (post-data structure). */
#define SYNC_STATUS_SYNCWAIT	'w'		/* Table sync is waiting to get OK from main thread. */
#define SYNC_STATUS_CATCHUP		'u'		/* Catching up. */
#define SYNC_STATUS_SYNCDONE	'y'		/* Synchronization finished (at lsn). */
#define SYNC_STATUS_READY		'r'		/* Done. */

https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_sync.h#L43-L51
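For reference, the codes above can be turned into a small lookup. This is only a minimal sketch: the two functions sync_status_describe and sync_status_is_ready are hypothetical helpers, not part of pglogical, and the descriptions are copied verbatim from the header comments quoted above. It assumes nothing about what pglogical actually guarantees for each state beyond those comments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Status codes as quoted above from pglogical_sync.h (REL2_x_STABLE). */
#define SYNC_STATUS_NONE        '\0'
#define SYNC_STATUS_INIT        'i'
#define SYNC_STATUS_STRUCTURE   's'
#define SYNC_STATUS_DATA        'd'
#define SYNC_STATUS_CONSTAINTS  'c'   /* spelling as in the header */
#define SYNC_STATUS_SYNCWAIT    'w'
#define SYNC_STATUS_CATCHUP     'u'
#define SYNC_STATUS_SYNCDONE    'y'
#define SYNC_STATUS_READY       'r'

/*
 * Hypothetical helper: map a sync_status character to the comment text
 * from the header. Returns NULL for a code that is not listed there.
 */
const char *sync_status_describe(char status)
{
    switch (status) {
    case SYNC_STATUS_NONE:       return "No sync.";
    case SYNC_STATUS_INIT:       return "Ask for sync.";
    case SYNC_STATUS_STRUCTURE:  return "Sync structure";
    case SYNC_STATUS_DATA:       return "Data sync.";
    case SYNC_STATUS_CONSTAINTS: return "Constraint sync (post-data structure).";
    case SYNC_STATUS_SYNCWAIT:   return "Table sync is waiting to get OK from main thread.";
    case SYNC_STATUS_CATCHUP:    return "Catching up.";
    case SYNC_STATUS_SYNCDONE:   return "Synchronization finished (at lsn).";
    case SYNC_STATUS_READY:      return "Done.";
    default:                     return NULL;
    }
}

/*
 * Hypothetical helper: true only for 'r'. Whether 'r' also implies that
 * row counts match is exactly the open question in this issue.
 */
bool sync_status_is_ready(char status)
{
    return status == SYNC_STATUS_READY;
}
```

Calling sync_status_describe('y') yields the "Synchronization finished (at lsn)." text and sync_status_describe('r') yields "Done.", which is all the header itself says about the difference between the two states reported by local_sync_status.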

I may be misunderstanding sync_status r. If so, how can I tell that a table has been fully synced?

beyondbill, Jan 28 '19 21:01

Adding more information that may be helpful: I've added 83 tables to the replication set, about 20GB in total size. The starting differences between the upstream and downstream DBs are a few million rows in these tables.

beyondbill, Jan 29 '19 17:01

@beyondbill, did you ever get an answer on this? I ask because the latest pglogical 2.2.1 has a number of fixes in it that seem like they could be related (see https://www.2ndquadrant.com/en/resources/pglogical/release-notes/):

Rewrite worker signalling to address possible loss of messages when multiple signals are delivered, causing issues with sync and apply.
Fix use of the same restore-point name by pglogical_create_subscriber that could cause the wrong restore-point to be stopped at and an incomplete initial sync.

diranged, Apr 25 '19 13:04

@beyondbill Is this issue resolved? I think it might have to do with synchronize_structure and synchronize_data. If it is not resolved, I could help you.

shivansh931, Oct 18 '19 06:10

I also have an issue with sync in pglogical. On the subscriber side, in the table local_sync_status, the records from the replication_set are copied with sync_status "i", and in the statistics of the subscription table I see that Sequential Tuples Read is 12752 from 15 records in the replication set. (I restarted the instance several times.) But nothing has changed in the tables on the subscriber side.

jangithub, Nov 05 '20 08:11