
High availability loss caused by simultaneous HAProxy configuration changes

Open zimnx opened this issue 2 years ago • 6 comments

When an HAProxy configuration change is required and there are active connections, HAProxy waits until all connections are closed before restarting, bounded by the hard-stop-after timeout. This behavior becomes problematic in environments where multiple HAProxy ingress controllers are watching the same Kubernetes resources, such as Services and Ingresses. When an update to one of these resources is observed, all instances of the HAProxy ingress controller detect the configuration change simultaneously.

This leads to a situation where the hard-stop-after timeout is effectively triggered by each instance. Consequently, all active connections are closed simultaneously, causing significant availability issues.

Steps to reproduce:

  1. Set up an environment with multiple HAProxy ingress controller instances.
  2. While there are active connections, update one of the Ingresses in a way that requires an HAProxy configuration change.

Expected behavior:

The HAProxy ingress controllers should coordinate the configuration rollout in a way that prevents simultaneous restarts by multiple instances. This coordination should ensure a smooth transition without abruptly closing all active connections at once.

Actual behavior:

The current behavior leads to simultaneous HAProxy restarts because all instances detect the configuration change at once. This results in the hard-stop-after timeout being reached by each instance, causing all active connections to be closed concurrently.

zimnx avatar Aug 31 '23 08:08 zimnx

Hi @zimnx , the controllers are all independent and have no means to coordinate. It's intended that they react simultaneously, but keep in mind that a transaction is actually a time window for changes, set to 5 seconds by default. I guess that the instances are not all started at the same millisecond, so there will always be a period where one has already started the new configuration while another still has the old one. Maybe you can tweak the sync-period parameter with different values for different controllers to minimize the risk you're talking about.

ivanmatmati avatar Aug 31 '23 08:08 ivanmatmati

@zimnx, since reloads are seamless, you could use a different hard-stop-after for each instance ~or even disable it to prevent concurrent connection interruptions~ (bad advice).

fabianonunes avatar Sep 03 '23 15:09 fabianonunes

It seems like all the tips are just aimed at making it less likely to happen, but how can users make sure it's always HA? Should something coordinate the restarts? (Live reloads should be fine.)

the controllers are all independent and have no means to coordinate.

I assume they could sync using the kube API, which all of them are connected to. It seems like the issue comes from running a supervisor that restarts the app within the container, whereas other HA apps let Kubernetes roll out the change through a workload controller that respects HA and PodDisruptionBudgets, so they don't have to do it on their own.
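For illustration only, here is a rough sketch of what "sync using the kube API" could look like: replicas take turns holding a coordination.k8s.io Lease before reloading, so only one instance drains connections at a time. The lease name, namespace, and identity below are made up, and this is not existing controller code.

```go
// Hypothetical sketch: serialize HAProxy reloads across controller replicas
// by acquiring a Kubernetes Lease before reloading, then releasing it.
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// reloadSerialized blocks until this replica holds the lock, runs reload(),
// then cancels its context so the Lease is released for the next replica.
func reloadSerialized(parent context.Context, client kubernetes.Interface, identity string, reload func()) {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "haproxy-reload-lock", Namespace: "haproxy-controller"}, // made-up names
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: identity},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true, // give the lock up as soon as we're done
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(context.Context) {
				reload() // apply the new config while holding the lock
				cancel() // release the Lease; RunOrDie returns
			},
			OnStoppedLeading: func() {},
		},
	})
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	reloadSerialized(context.Background(), kubernetes.NewForConfigOrDie(cfg), os.Getenv("POD_NAME"),
		func() { /* trigger the HAProxy reload here */ })
}
```

This only shows the locking pattern; a real implementation would also have to decide how long a replica may hold the lock and what happens if a reload hangs.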

@zimnx, since reloads are seamless, you could use different hard-stop-after for each instance

@fabianonunes How would you do this in production? Given you had 1 Deployment with 5 replicas, wouldn't that mean running 5 Deployments with 1 replica each to get separate configs?

or even disable it to prevent concurrent connection interruptions.

What happens if it is disabled? Will the old processes pile up and OOM at some point?

If you use hard-stop-after, with changes coming in frequently enough (Services or Ingresses being created / updated), can enough old processes pile up that the Pod gets OOM-killed?

tnozicka avatar Sep 07 '23 11:09 tnozicka

Hi @zimnx , as @ivanmatmati said, the controllers are all independent and have no means to coordinate. To help make this less likely to happen, we could add another annotation, hard-stop-after-random, with the following behavior:

  • if only hard-stop-after is set, no change
  • if hard-stop-after-random is set as well as hard-stop-after, when we generate the haproxy.cfg, we add a random time (between 0 and hard-stop-after-random) to the hard-stop-after time.

That would not solve everything, but it would help, and it would allow having only 1 Deployment with several replicas yet different effective values (see the sketch below).
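To make the intent concrete, a minimal sketch of the computation, assuming the proposed annotation names; the function and variable names are illustrative, not controller code:

```go
// Sketch of the proposed behavior: when generating haproxy.cfg, add a random
// offset in [0, hard-stop-after-random) to hard-stop-after, so replicas built
// from the same template still drain at slightly different times.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func effectiveHardStopAfter(hardStopAfter, hardStopAfterRandom time.Duration) time.Duration {
	if hardStopAfterRandom <= 0 {
		return hardStopAfter // only hard-stop-after set: behavior unchanged
	}
	jitter := time.Duration(rand.Int63n(int64(hardStopAfterRandom)))
	return hardStopAfter + jitter
}

func main() {
	// e.g. hard-stop-after=30s, hard-stop-after-random=30s -> anywhere in [30s, 60s)
	fmt.Println(effectiveHardStopAfter(30*time.Second, 30*time.Second))
}
```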

What do you think of this option?

hdurand0710 avatar Sep 25 '23 09:09 hdurand0710

This would be a workaround that introduces non-deterministic behavior, with a chance that all nodes will still restart at once even when this parameter is set. In the long run, the chance of hitting that might be significant.

I think we could live with it temporarily, but the issue would still persist. Config shouldn't be rolled out simultaneously on all nodes.

zimnx avatar Sep 25 '23 19:09 zimnx

hi @zimnx

which introduces non-deterministic behavior

If I'm not mistaken, the deterministic behavior we have with the ingress controller today is not good, but at the same time the proposed non-deterministic one is also not good. There is no third option: either you are deterministic or you are not.

In general, if you have 5 replicas then they are exactly that: replicas, duplicates of each other that behave and act on changes in the same manner.

Should something coordinate the restarts

Yes, but the answer leads me to point you to our products on the Enterprise side, specifically Fusion, which can solve that issue.

Following this conversation and the proposed solution, it seems there is room for improvement, but if we bring randomness to the table, more people might be confused. So for now I'm going to put this on hold; in the future, with new features potentially added to HAProxy, we might even have different options.

oktalz avatar Oct 11 '23 14:10 oktalz