Use "global hang workaround" for upgrades

Open • ansd opened this issue 1 year ago • 0 comments

Follow-up to https://github.com/rabbitmq/rabbitmq-server/pull/5442:

Even though we can set prevent_overlapping_partitions to true on the master branch, we still need the "global hang workaround" for rolling upgrades from a version that has prevent_overlapping_partitions set to false. Thanks @dumbbell for spotting this.

This is because:

Also note that this fix [prevent_overlapping_partitions] has to be enabled on all nodes in the network in order to work properly.

In RabbitMQ, we cannot hide this "feature" behind a feature flag because:

  1. setting the parameter at runtime (or via advanced config) does not have any effect; it must be set at boot time, and
  2. we rely on this "feature" early at boot time, before the feature flags are synced and enabled.
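
To illustrate what "set at boot time" means for a kernel application parameter, here is a minimal sketch using a plain Erlang node rather than RabbitMQ itself. It assumes `erl` is on the PATH and an Erlang/OTP release that supports the parameter; how RabbitMQ actually passes the flag is not shown in this issue, so this is an illustration only.

```shell
# Sketch only: start a throwaway Erlang node with the kernel parameter
# given on the command line, so it is in place before the kernel
# application starts, then print the value the node sees and exit.
erl -noshell \
    -kernel prevent_overlapping_partitions true \
    -eval 'io:format("~p~n", [application:get_env(kernel, prevent_overlapping_partitions)]), halt().'
```

Changing the same parameter later with `application:set_env/3` would update the stored value but, as noted in point 1 above, would not change the behaviour of the already running node.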

I validated that, prior to this PR, rolling upgrades from v3.11.x to master still get stuck on node boot, hitting the issue described in https://github.com/rabbitmq/rabbitmq-server/pull/5438.

I also validated that, after this PR, rolling upgrades from v3.11.x to this branch do not get stuck. Instead, we sometimes see the "Global hang workaround" debug logs. This works in both scenarios: (i) both nodes on the new version, and (ii) one node on the new version and one node on the old version.

This PR should be merged only into the master branch.

ansd • Aug 09 '22 10:08

Closing as no longer relevant because https://github.com/rabbitmq/rabbitmq-server/pull/5483 will revert https://github.com/rabbitmq/rabbitmq-server/pull/5442.

ansd • Aug 10 '22 17:08