
Synchronous replication support is very limited

Open mouchar opened this issue 2 years ago • 15 comments

Name and Version

bitnami/postgresql-repmgr:11.15.0-debian-10-r65

What steps will reproduce the bug?

  1. :information_source: Deployed using the bitnami/postgresql-ha Helm chart, version 8.6.13.
  2. :information_source: Nothing special in the config; the most important setting is postgresql.syncReplication=true.
  3. :heavy_check_mark: A fresh installation works, and synchronous replication is set up on the primary (so far so good).
  4. :heavy_check_mark: On the primary, postgresql.conf contains:

synchronous_commit = 'on'
synchronous_standby_names = '2 ("postgresql-ha-postgresql-0","postgresql-ha-postgresql-1","postgresql-ha-postgresql-2")'

  5. :heavy_check_mark: pg_stat_replication also shows sync for both deployed replicas (see the query after this list).
  6. :heavy_check_mark: Delete the primary pod. A new primary is elected, and the remaining standby now follows the newly promoted primary.
  7. :red_circle: Synchronous replication is gone. The freshly promoted primary (the former replica) is not aware of synchronous replication:

#synchronous_commit = on		# synchronization level;
#synchronous_standby_names = ''	# standby servers that provide sync rep

  8. :information_source: When the deleted pod is recreated, it joins the cluster as a new replica.
  9. :red_circle: There is no remaining trace that synchronous replication was ever turned on.
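
For reference, a minimal query to check the per-standby replication mode (a hand-written illustration; sync_state reads sync, or quorum, for standbys covered by synchronous_standby_names, and async otherwise):

-- run on the current primary
SELECT application_name, state, sync_state FROM pg_stat_replication;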

What is the expected behavior?

  • postgresql_configure_synchronous_replication must also run on replicas, so the configuration is already in place in case of a promotion to primary.
  • Configuring synchronous replication should also be possible when the data directory already exists (i.e. when POSTGRESQL_FIRST_BOOT=no); a sketch of the idea follows below.
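
A minimal sketch of the requested change is shown below. It is illustrative only, not the actual libpostgresql.sh code: postgresql_configure_synchronous_replication and POSTGRESQL_FIRST_BOOT are the names mentioned above, while the POSTGRESQL_SYNC_REPLICATION variable and the surrounding guard are assumptions.

# Run the synchronous replication setup on every node and on every boot
# (POSTGRESQL_FIRST_BOOT=yes or no), so a standby already carries the
# settings when it is later promoted to primary.
if [[ "$POSTGRESQL_SYNC_REPLICATION" = "yes" ]]; then   # env var name assumed for illustration
    postgresql_configure_synchronous_replication        # function named in this issue
fi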

What do you see instead?

As described above: when a new primary is elected, replication falls back to asynchronous. When the original primary pod is recreated, replication is likewise asynchronous and cannot be changed back to synchronous.

Additional information

One more thing that comes to my mind: the current configuration is very strict. The loss of a single replica makes all ongoing transactions hang and wait until all replicas are back online. This behavior may not be desired in rapidly changing environments like Kubernetes, where pods may fail, be evicted, and so on. It is possible to relax the synchronous_standby_names value by requiring a lower number of replicas to confirm a transaction, but the Helm chart hard-codes this value to .Values.postgresql.replicaCount - 1. Also, it is impossible to choose between FIRST and ANY.

Reference: postgresql.org
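
For illustration, a relaxed configuration could look like this (a hand-written example, not what the chart currently renders); the ANY 1 form waits for an acknowledgement from any single listed standby instead of all of them:

synchronous_commit = 'on'
synchronous_standby_names = 'ANY 1 ("postgresql-ha-postgresql-0","postgresql-ha-postgresql-1","postgresql-ha-postgresql-2")'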

mouchar avatar May 08 '22 13:05 mouchar

Hi @mouchar, would you like to send a PR changing the configuration to reflect what you described?

miguelaeh avatar May 11 '22 09:05 miguelaeh

I can try to send a PR if I manage to comprehend the whole flow of these shell scripts. Are there any existing test suites that I could use to be sure I didn't break anything else?

mouchar avatar May 11 '22 15:05 mouchar

In the Helm chart repository, once the PR is created, we can add the verify label to run the checks. While developing, you will need to perform some manual tests. After the PR is merged, our pipelines properly test several cases and usages.

miguelaeh avatar May 12 '22 14:05 miguelaeh

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] avatar May 28 '22 01:05 github-actions[bot]

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

github-actions[bot] avatar Jun 03 '22 01:06 github-actions[bot]

I would just add that this is a very problematic scenario that can lead to data loss in production.

donburgess avatar Jul 13 '22 18:07 donburgess

Hi @donburgess, would you elaborate a bit more?

rafariossaa avatar Jul 18 '22 09:07 rafariossaa

> Hi @donburgess, would you elaborate a bit more?

Yes. We currently observe that if a cluster is configured for synchronous mode and the master fails, then after the failover the cluster will permanently operate in asynchronous mode without intervention. Depending on the replication lag in the set, this can lead to unmitigated data loss during the next failover. From an administrator's point of view this could be considered catastrophic, given the belief that the data was better protected by synchronous mode operations.

donburgess avatar Jul 18 '22 14:07 donburgess

Do you have any suggestions on how to improve this?

rafariossaa avatar Jul 19 '22 07:07 rafariossaa

I don't know the project at a low enough level to give very specific guidance. I suspect two areas could be the main problem. One is that when a replica takes over, it has nothing in its configuration telling it to list the other nodes as synchronous standbys, since that part of its configuration is blank. While I have not tested this scenario, I suspect another cause could be that nodes added to the cluster are joined as async replicas.

donburgess avatar Jul 21 '22 17:07 donburgess

We are going to transfer this issue to bitnami/containers

In order to unify the approaches followed in Bitnami containers and Bitnami charts, we are moving some issues from bitnami/bitnami-docker-<container> repositories to bitnami/containers.

Please follow bitnami/containers to stay updated on the latest Bitnami images.

More information here: https://blog.bitnami.com/2022/07/new-source-of-truth-bitnami-containers.html

carrodher avatar Jul 28 '22 13:07 carrodher

Hi, sorry for the delay. I guess this is more a postgresql "issue", or an "issue" that can appear when running postgresql in a container scenario, than something that could be fixed in this container image. Maybe reporting it in a postgresql forum could bring some advice on whether something could be done in the image.

rafariossaa avatar Jul 29 '22 07:07 rafariossaa

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] avatar Aug 14 '22 01:08 github-actions[bot]

We have the same issue. Is there any information on the roadmap or workarounds?

naitmare01 avatar Aug 15 '22 09:08 naitmare01

I am creating an internal task to relax the settings mentioned in the Additional information section of this issue. We will come back as soon as we have news.

rafariossaa avatar Aug 18 '22 09:08 rafariossaa

Any updates on this thread?

TroyKomodo avatar Mar 25 '23 20:03 TroyKomodo

Hi,

Just a quick note to let you know that the latest version of our Helm chart for postgresql-ha includes a new postgresql.syncReplicationMode setting to configure the synchronous replication mode. You can get more information about this in both the README file and values.yaml.
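
For example, a minimal values snippet could look like this (illustrative only; check the chart's README and values.yaml for the exact accepted values):

postgresql:
  syncReplication: true
  syncReplicationMode: FIRST   # or ANY, following the PostgreSQL semantics discussed above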

Hope it helps!

gongomgra avatar May 18 '23 07:05 gongomgra

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] avatar Jun 03 '23 01:06 github-actions[bot]

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

github-actions[bot] avatar Jun 08 '23 01:06 github-actions[bot]