[bitnami/postgresql-repmgr] PostgreSQL HA - solution for split brain - multi master
Description of the change
This change handles the case of a failing primary node that has a persistent volume attached to it. That creates a problem because the primary cannot move to another node, so the replicas may end up without an available upstream. I therefore added the possibility for a standby node to promote itself when the primary has been unavailable for more than REPMGR_WAIT_PRIMARY_TIME.
I also added random variations to some waiting times so that the standby replicas check at different points in time.
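A rough sketch of the intended behaviour, with hypothetical helper names (this is not the exact chart code; the real implementation lives in librepmgr.sh):

```shell
#!/bin/bash
# Sketch of the proposed logic: wait up to REPMGR_WAIT_PRIMARY_TIME seconds
# for the primary, with a small random offset per node so that not all
# standbys give up at the same instant. Helper names are hypothetical.

REPMGR_WAIT_PRIMARY_TIME="${REPMGR_WAIT_PRIMARY_TIME:-30}"

# Placeholder for a pg_isready-style connectivity check against the primary
is_primary_reachable() {
    pg_isready -q -h "${REPMGR_PRIMARY_HOST:-postgresql-primary}" 2>/dev/null
}

wait_for_primary_or_promote() {
    local jitter=$((RANDOM % 10))                        # 0-9s of per-node jitter
    local deadline=$((REPMGR_WAIT_PRIMARY_TIME + jitter))
    local elapsed=0
    while (( elapsed < deadline )); do
        if is_primary_reachable; then
            return 0                                     # primary came back
        fi
        sleep 1
        ((elapsed++))
    done
    echo "primary unreachable for ${deadline}s, promoting this standby"
    # The proposed change would run 'repmgr standby promote' at this point
    return 1
}
```

The jitter is what makes two standbys unlikely to reach the promotion branch at the same moment.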
Benefits
The benefits:
- A more resilient cluster in case of a definite failure of a primary node.
- More control over how long we wait for the primary, in case the node is busy restarting. This prevents the famous split brain. Reference: https://github.com/bitnami/charts/pull/9035
Possible drawbacks
- Potentially longer waiting time before a new primary is available.
Applicable issues
https://github.com/bitnami/charts/pull/9035
Additional information
It is important to apply this improvement together with a liveness probe whose period is at least REPMGR_WAIT_PRIMARY_TIME plus the node setup time.
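For example, the minimum probe delay could be derived like this (NODE_SETUP_TIME is an assumed placeholder you would measure for your own cluster):

```shell
#!/bin/bash
# Compute a safe liveness probe initial delay as REPMGR_WAIT_PRIMARY_TIME
# plus an estimated node setup time. Both defaults below are assumptions.
REPMGR_WAIT_PRIMARY_TIME="${REPMGR_WAIT_PRIMARY_TIME:-30}"
NODE_SETUP_TIME="${NODE_SETUP_TIME:-60}"   # measure this for your deployment

probe_initial_delay=$((REPMGR_WAIT_PRIMARY_TIME + NODE_SETUP_TIME))
echo "Set livenessProbe.initialDelaySeconds to at least ${probe_initial_delay}s"
```

Otherwise the kubelet may kill a node that is still legitimately waiting for its upstream.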
Hi team, were you able to review this merge request?
Hi, I wanted to know if the team was able to review this pull request ?
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.
Hi Bitnami team. Were you able to review the proposed changes?
Why don't you just add the possibility to configure/create nodes of the type witness? I thought witness nodes are just made for this purpose to prevent split brains (https://repmgr.org/docs/repmgr.html#REPMGR-CONCEPTS / https://repmgr.org/docs/repmgr.html#REPMGR-WITNESS-REGISTER)? Currently only primary and standby nodes are possible to configure.
@seriouz Adding the witness would be major surgery for this development. However, with the changes in this commit I was able to handle a down datacenter without downtime. I am running the Postgres cluster in production.
@migruiz4 I've already made the changes. Can you let me know if you need anything else?
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.
Hi Bitnami team, is it possible to move forward with this merge?
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.
Hi Bitnami team, any news about this request ?
Hi @martinfar, I'm sorry for the late response.
After some in-depth evaluation of this proposal, I'm afraid we can not merge these changes as they are because of the following reasons:
- We can not change the `reconnect_interval` to a random value, as it is not deterministic (different deployments could behave differently). https://github.com/bitnami/containers/blob/beb5b0b050c0cb93ac756708339d99aea8bd1f21/bitnami/postgresql-repmgr/10/debian-11/rootfs/opt/bitnami/scripts/librepmgr.sh#L537-L539
- The same applies to the changes proposed on the `repmgr_wait_primary_node` function. https://github.com/bitnami/containers/blob/beb5b0b050c0cb93ac756708339d99aea8bd1f21/bitnami/postgresql-repmgr/10/debian-11/rootfs/opt/bitnami/scripts/librepmgr.sh#L611-L615
- The new REPMGR_PRIMARY_ROLE_LOCK_FILE_NAME could introduce additional corner cases, and its functionality is unclear.
I would like to share my insights on it.
Considering this is the container setup.sh:
```shell
# Ensure PostgreSQL & repmgr environment variables settings are valid
repmgr_validate
postgresql_validate

# Set the environment variables for the node's role
eval "$(repmgr_set_role)"

# Ensure PostgreSQL is stopped when this script ends.
trap "postgresql_stop" EXIT

# Ensure 'daemon' user exists when running as 'root'
am_i_root && ensure_user_exists "$POSTGRESQL_DAEMON_USER" --group "$POSTGRESQL_DAEMON_GROUP"

# Prepare PostgreSQL default configuration
repmgr_postgresql_configuration

# Prepare repmgr configuration
repmgr_generate_repmgr_config

# Initialize PostgreSQL & repmgr
repmgr_initialize
```

What the proposed changes consist of:
- The function `repmgr_wait_primary_node` has been modified to retry the upstream connection to obtain a primary node.
- At `eval "$(repmgr_set_role)"`, the container will try to connect to the upstream primary; if a connection can't be established, the node will start as primary if it is marked to do so, otherwise it will start as standby. The proposed change is to run the modified `repmgr_wait_primary_node` and, if it fails, create the 'promote standby' lock file. This is executed whether the node will be assigned the primary or the standby role, as long as it wasn't a primary before. If no upstream node can be reached, the lock file will exist.
- During `repmgr_initialize`, if the node has the 'primary' role and it is not a first boot, the command `standby promote` will be executed.

I'm afraid this logic is too complicated and would make troubleshooting more difficult. Additionally, the objective of the promotion lock file is unclear: in what scenario would it be helpful?
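To make the reviewed flow concrete, here is a minimal sketch with hypothetical helper bodies (the real functions live in librepmgr.sh; the lock-file path and the stubbed return values are assumptions for illustration):

```shell
#!/bin/bash
# Sketch of the reviewed flow: if the upstream cannot be reached during
# role selection, drop a 'promote standby' lock file; initialization then
# promotes when the file is present. Names and paths are hypothetical.

REPMGR_PRIMARY_ROLE_LOCK_FILE_NAME="${REPMGR_PRIMARY_ROLE_LOCK_FILE_NAME:-/tmp/repmgr_promote.lock}"

# Stub for the modified repmgr_wait_primary_node; here we assume the
# upstream is unreachable so the failure path is exercised.
repmgr_wait_primary_node() { return 1; }

repmgr_set_role() {
    if ! repmgr_wait_primary_node; then
        # Mark that no upstream primary could be found
        touch "$REPMGR_PRIMARY_ROLE_LOCK_FILE_NAME"
    fi
}

repmgr_initialize() {
    if [[ -f "$REPMGR_PRIMARY_ROLE_LOCK_FILE_NAME" ]]; then
        echo "no upstream reachable: would run 'repmgr standby promote' here"
        rm -f "$REPMGR_PRIMARY_ROLE_LOCK_FILE_NAME"
    fi
}
```

This also illustrates the reviewer's concern: the lock file only carries one bit of information ("upstream was unreachable at role-selection time"), which could arguably be passed without persisting a file.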
With the currently proposed changes, it looks like the objective is to run the promote command if the following conditions are met:
- Node wasn't a primary node in the previous execution.
- Upstream couldn't be reached.
- Node is marked as initial primary during current execution.
But that raises two additional questions:
- Is a lock file really necessary? Do we need to persist it between executions?
- Considering we would be promoting Standby nodes if they are isolated from upstream, wouldn't that cause even more split-brain scenarios?
It would be helpful if you could provide more information about the scenarios in which this feature would be beneficial; otherwise, we may have to close this PR.
Hi @migruiz4, thanks for the feedback. I will analyse your suggestions.
@martinfar If there is anything we could do to help you please let us know.
@migruiz4 In fact, you could help me: I would like to know if you have a better idea for replacing the random time periods with a way for each node to check at a different time after the primary becomes unavailable. The idea is to avoid two independent nodes deciding to promote themselves to primary at the same time, which would cause a split-brain situation.
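The kind of deterministic staggering I have in mind could look like this (a sketch assuming each pod hostname ends in a StatefulSet ordinal; all names are illustrative):

```shell
#!/bin/bash
# Sketch: stagger the promotion check deterministically by node ordinal
# instead of using a random value, so every deployment behaves the same
# but nodes still time out at different moments.

HOSTNAME_FOR_DEMO="${HOSTNAME_FOR_DEMO:-postgresql-2}"   # e.g. StatefulSet pod name
REPMGR_WAIT_PRIMARY_TIME="${REPMGR_WAIT_PRIMARY_TIME:-30}"
STAGGER_STEP=5   # seconds between consecutive nodes' checks

ordinal="${HOSTNAME_FOR_DEMO##*-}"                       # "2" for postgresql-2
node_deadline=$((REPMGR_WAIT_PRIMARY_TIME + ordinal * STAGGER_STEP))
echo "node ${ordinal} waits ${node_deadline}s before considering promotion"
```

Because the offset is derived from the ordinal, the lowest-numbered surviving standby always reaches the promotion decision first.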
The best way to prevent split-brain is through quorum. I'm not sure if this is the case here, but I would always recommend deploying an odd number of nodes.
Normally, deploying an odd number of nodes (n) should be enough to prevent split-brain scenarios, as (n / 2 + 1) nodes need to agree for a new primary to be promoted.
The witness node function is to force quorum in a 2-node deployment, where the witness node votes on the primary node election, but does not participate in the replication.
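The quorum arithmetic can be checked quickly (a minimal sketch of the floor(n/2)+1 rule mentioned above):

```shell
#!/bin/bash
# Majority needed for a primary election with n voting nodes: floor(n/2) + 1.
# With n=2 the majority is 2 (every node must agree), which is why a witness
# (a third voter) is needed to break ties in a 2-node deployment.
for n in 2 3 4 5; do
    echo "nodes=$n majority=$((n / 2 + 1))"
done
```

So a 2-node cluster plus a witness behaves like n=3: two votes out of three decide the new primary, and an isolated node can never promote itself alone.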
I really think that a witness node is the solution to all of this... where is this feature in the roadmap ? I mean in this bitnami project Postgres HA
Are you thinking of something like this? https://github.com/bitnami/containers/issues/6663
Yes, the idea is to implement a witness node as you suggested in that issue.
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Pull Request. Do not hesitate to reopen it later if necessary.