Replication slot on provider disappears

Open joshuabaird opened this issue 2 years ago • 6 comments

I had a PG9 provider and a PG13 subscriber where the initial data sync had been running for 4 days. I noticed that pglogical.show_subscription_status on the subscriber had transitioned to down and that the replication slot on the provider was gone. The provider began logging this:

2021-08-17 01:24:04 UTC:10.178.81.137(23030):replication@shipment_feedback_service:[6139]:ERROR: replication slot "pgl_shipment37dc60c_provider_thesubscription" does not exist

No relevant logs on the subscriber that I could find.

Typically, dropping the subscription and re-creating it on the subscriber re-creates the replication slot on the provider. This is not happening though, and I don't see any logs that describe why.

What would cause the replication slot on the provider to be deleted and why isn't re-creating the subscription recreating it?
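
One way to start narrowing this down is to list the slots the provider currently has, before and after re-creating the subscription. The following is only a minimal sketch: it assumes `conn` is a DB-API 2.0 connection to the provider (for example one opened with psycopg2), and the helper names are hypothetical. `pg_replication_slots` is the standard PostgreSQL view, and the `pgl_` prefix is taken from the slot name in the log line above.

```python
# Sketch only: `conn` is assumed to be a DB-API 2.0 connection to the PROVIDER.
# The function names are illustrative and are not part of pglogical.

# pg_replication_slots is a standard PostgreSQL system view.
LIST_SLOTS_SQL = (
    "SELECT slot_name, plugin, slot_type, active FROM pg_replication_slots"
)


def list_pglogical_slots(conn, prefix="pgl_"):
    """Return (slot_name, plugin, slot_type, active) rows whose slot name
    starts with `prefix`. An empty prefix returns every slot."""
    cur = conn.cursor()
    try:
        cur.execute(LIST_SLOTS_SQL)
        rows = cur.fetchall()
    finally:
        cur.close()
    # Filter in Python to avoid depending on a driver-specific placeholder style.
    return [row for row in rows if row[0].startswith(prefix)]


def slot_exists(conn, slot_name):
    """True if a slot with exactly this name is present on the provider."""
    return any(row[0] == slot_name for row in list_pglogical_slots(conn, prefix=""))
```

Calling `slot_exists(conn, "pgl_shipment37dc60c_provider_thesubscription")` right after `create_subscription` returns would show whether the slot is never created or is created and then removed again.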

joshuabaird, Aug 17 '21 15:08

Replication slots don't just disappear, so I would look into that. Check whether some other software is running that tries to manage replication, backups, or something related, and that might have its own opinions about which replication slots should exist.

petere, Aug 18 '21 12:08

Hi @petere - thanks for responding! No other software running, although this is Amazon RDS. But, this is the first time I have ever seen this happen, and also the first time that I have seen create_subscription not re-create the replication slot on the provider.

The initial data load on this table is also VERY slow for some reason (it's only ~70GB). I'm considering re-creating the subscription using synchronize_data=false and then using alter_subscription_resynchronize_table to sync it. From what I understand, this may yield better (faster) results.
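
The approach described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested procedure: `conn` is taken to be a DB-API 2.0 connection to the subscriber that uses psycopg2-style `%s` placeholders, the old subscription is assumed to have been dropped already, and the helper names and arguments are hypothetical. `pglogical.create_subscription` and `pglogical.alter_subscription_resynchronize_table` are the pglogical functions named in this thread.

```python
# Sketch only: `conn` is assumed to be a DB-API 2.0 connection to the
# SUBSCRIBER with psycopg2-style %s placeholders. Names are illustrative.

# Create the subscription without the initial data copy.
CREATE_SUBSCRIPTION_SQL = """
    SELECT pglogical.create_subscription(
        subscription_name := %s,
        provider_dsn      := %s,
        synchronize_data  := false
    )
"""

# Ask pglogical to re-copy a single table for an existing subscription.
RESYNC_TABLE_SQL = """
    SELECT pglogical.alter_subscription_resynchronize_table(%s, %s::regclass)
"""


def create_subscription_without_copy(conn, subscription, provider_dsn):
    """Create the subscription with the initial data sync switched off."""
    cur = conn.cursor()
    try:
        cur.execute(CREATE_SUBSCRIPTION_SQL, (subscription, provider_dsn))
    finally:
        cur.close()
    conn.commit()


def resynchronize_tables(conn, subscription, tables):
    """Request a resync of each table, given as a schema-qualified name."""
    cur = conn.cursor()
    try:
        for table in tables:
            cur.execute(RESYNC_TABLE_SQL, (subscription, table))
    finally:
        cur.close()
    conn.commit()
```

Two caveats apply. According to the pglogical documentation, resynchronizing a table truncates the subscriber's copy before copying it again, so the table is empty while the sync runs. It is also probably safer to wait until pglogical.show_subscription_status reports the subscription as replicating before calling `resynchronize_tables`.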

If not, are there any options to restore the table from a pg_dump and then refresh/re-start sync after that?

joshuabaird, Aug 18 '21 13:08

No. But if your database is big (several terabytes), you should consider using pglogical_create_subscriber, which creates a logical instance from a base backup. It is faster than a logical clone. It seems it is not an option for you since you're using RDS. It's been some time since I checked the RDS interface, but maybe they already provide an option to create a logical replica using pglogical_create_subscriber.

eulerto, Aug 22 '21 13:08

That will be quite tricky @eulerto given they are on RDS (@joshuabaird didn't mention whether the provider, the subscriber, or both were on RDS). My 2 cents: we don't know what's running in RDS, since that software is deployed by Amazon, so it's hard to know whether something there is dropping slots.

martinmarques, Aug 22 '21 18:08

Thanks @eulerto and @martinmarques. Correct, we can't use the basebackup option, so I guess we're stuck with having pglogical sync the data for us, either using synchronize_data=true or using synchronize_data=false and manually syncing the tables.

joshuabaird, Aug 23 '21 14:08

@eulerto Any update on this issue? I am facing a similar issue in Azure.

sreejithvelath, Jun 03 '22 04:06