pglogical
pglogical apply PID exited with exit code 1
Base Info
Both on source & target:
- RDS Postgres 11.15
- db.m5.8xlarge
- PGLogical v2.4.0
Example -
When we try to move data from one shard to a new instance that also has several read replicas, we get a pglogical error. In our understanding, copying the data of a space with pglogical always consists of two main phases:
- The initial data copy (state=sync_data). During this phase, the primary server will COPY all tables, one by one, in their state at the start of the replication (initial checkpoint), to the target server.
- The replication phase (state=replicating) - the target server will replay all data the primary server has written since the initial checkpoint.
The issue we observe consistently happens just after the initial data copy, when the subscription pivots to the replication phase. Shortly after the subscription state goes from sync_data to replicating, it goes to state “down” (which usually indicates a problem) and does not recover. The log shows:
2022-11-09 11:54:45 UTC::@:[352]:LOG: background worker "pglogical apply 16421:2167242526" (PID 21218) exited with exit code 1
Could you provide a test case? Did you check for error messages before the provided log message? You need to provide more than just a subscription status. Start with:
SELECT * FROM pglogical.local_sync_status WHERE sync_status <> 'r';
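Alongside the per-table sync status, it may help to capture the overall subscription state on the target at the moment it goes down. A minimal sketch, assuming it is run on the subscriber node as a user allowed to call pglogical functions; the column list is what pglogical 2.x documents for this function and may differ in other versions:

```sql
-- Run on the subscriber (target) node.
-- Calling the function without an argument lists every subscription.
SELECT subscription_name,
       status,            -- e.g. replicating or down
       provider_node,
       slot_name,
       replication_sets
FROM pglogical.show_subscription_status();
```

Posting this output together with the log lines that precede the "exited with exit code 1" message gives more to work with than the status alone.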
Note that if the config has pglogical.conflict_log_level = ERROR, then the apply process will exit after logging the conflict details. You should set the level to warning or notice.
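To see what the subscriber is currently using, the settings can be inspected from a session on the target. This is only a sketch: the ALTER SYSTEM part assumes a self-managed PostgreSQL where superuser access is available. On RDS, ALTER SYSTEM is generally not permitted, so the change would presumably have to go through the DB parameter group instead, assuming the parameter is exposed there.

```sql
-- Inspect the current conflict handling settings on the subscriber.
SHOW pglogical.conflict_log_level;
SHOW pglogical.conflict_resolution;

-- Assumption: self-managed PostgreSQL with superuser access (not RDS).
-- Lower the log level so that a logged conflict does not end the apply worker.
ALTER SYSTEM SET pglogical.conflict_log_level = 'warning';
SELECT pg_reload_conf();
```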