HPA pod scaling causes 503
Hi,
We are using version 7.4.0.
It looks like 503s are being generated because HPA is scaling pods up and down fairly rapidly, and the ingress configuration is not updated quickly enough to keep pace.
We have turned off HPA.
Is there anything that can be done about the ingress configuration update to prevent the 503s?
Thanks Riad
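For anyone hitting this with HPA enabled, one mitigation is to slow the scaling events down so the ingress has time to catch up. On clusters where the autoscaling/v2 API is available (newer than the versions discussed in this thread), the HPA behavior field can enforce a scale-down stabilization window. This is a minimal sketch, not a confirmed fix for this issue; the name, target, and thresholds are placeholder assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical deployment
  minReplicas: 17
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before acting on scale-down
      policies:
        - type: Pods
          value: 2                      # remove at most 2 pods per minute
          periodSeconds: 60
```

This doesn't remove the underlying race between pod churn and the HAProxy config reload; it just makes the churn slow enough that the window for 503s shrinks.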
You sure it's related to HPA? We're getting this even when not using it. For us Voyager never seems to sync its table of pods and sends requests to terminated ones. #1334
We're seeing 503 errors as well, but only during rolling restarts of our deployments. This is the RollingUpdate configuration in one of our deployments:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
This means there must always be at least one healthy instance of our app available: a new pod is surged in before an old one is taken down. However, during the rolling update there is a very short period of time in which a 503 Service Unavailable error is returned.
This seems to be related to the way the HAProxy reloader works: it appears to keep sending requests to killed pods for a short time, until it has reloaded its configuration.
@tamalsaha is this a known issue/limitation with how Voyager reloads HAProxy? Or am I doing it wrong? 🙂
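A workaround often used for this class of problem (not specific to Voyager, and the values below are assumptions) is to delay pod shutdown with a preStop hook, so the proxy has time to reload its configuration and drain the pod before the app stops accepting connections. A minimal sketch:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app              # hypothetical deployment name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Must exceed the preStop sleep plus the app's own shutdown time.
      terminationGracePeriodSeconds: 45
      containers:
        - name: app
          image: my-app:latest   # placeholder image
          lifecycle:
            preStop:
              exec:
                # Keep serving for ~15s after the pod is marked Terminating,
                # giving the ingress time to reload and stop routing here.
                command: ["sleep", "15"]
```

The sleep length is a guess; it needs to cover the worst-case delay between the endpoint being removed and HAProxy finishing its reload.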
I should have titled the defect better.
HPA was the cause of a rapid up/down change in pods that led to the 503s.
It looks like 503s are possible with anything that changes pods.
@tamalsaha Is this still an issue with Voyager 10?
Anyone tried with Voyager 10?
@rmohammed-xtime Did you happen to notice whether this problem occurs only when scaling down, or in both directions? Our guess is that it happens while pods are terminating (requests keep going to terminated pods for a brief period of time).
This issue is different from #1334 btw.
@kfoozminus Both up/down
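If 503s also show up on scale-up, one plausible cause (an assumption, not something confirmed in this thread) is that new pods are routed to before they can actually serve traffic. A readinessProbe keeps a pod out of the Endpoints list, and therefore out of the generated HAProxy config, until it passes; the path and port here are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app              # hypothetical deployment name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:latest     # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz       # assumed health endpoint
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
```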
We observed peak 503s at 5:52 AM, 5:56 AM, and 7:18 AM; below is the pod-count data around that time frame.
Pod counts over time (the fluctuation is due to HPA):
Data from 5:45 AM:

- 5:45 – 5:50 AM: 17 pods (2 new at 5:48 AM)
- 5:50 – 5:55 AM: 19 pods (2 new at 5:52 AM)
- 5:55 – 6:00 AM: 24 pods (5 new at 5:59:30 AM)
- 6:00 – 6:05 AM: 28 pods (4 new at 6:03 AM)
- 6:05 – 6:10 AM: 30 pods (2 new at 6:07 AM)
- 6:10 – 6:15 AM: 30 pods (4 down at 6:14 AM)
- 6:15 – 6:20 AM: 30 pods (4 down at 6:14 AM)
- 6:20 – 6:25 AM: 25 pods (2 down at 6:15 AM)
- 6:25 – 6:30 AM: 21 pods (2 additional at 6:26 AM)
- 6:30 – 6:35 AM: 24 pods (3 additions at 6:31 AM)
- 6:35 – 6:40 AM: 30 pods (6 additions at 6:36 AM)
- 6:40 – 6:45 AM: 30 pods (2 down at 6:44 AM)
- 6:45 – 6:50 AM: 27 pods (1 down at 6:45 AM)
- 6:50 – 6:55 AM: 26 pods (3 down at 6:50 AM)
- 6:55 – 7:00 AM: 20 pods
- 7:05 – 7:10 AM: 22 pods (2 additions at 7:09 AM)
- 7:10 – 7:15 AM: 29 pods (7 additions at 7:14 AM)
- 7:15 – 7:20 AM: 29 pods
- 7:20 – 7:25 AM: 30 pods (1 addition at 7:22 AM)
- 7:25 – 7:30 AM: 30 pods
- 7:30 – 7:35 AM: 30 pods (4 down at 7:31 AM)
@rmohammed-xtime Are you using HPA for the ingress pods too (or maybe running multiple ingress pods)?
@kfoozminus The pods listed in my previous comment https://github.com/appscode/voyager/issues/1389#issuecomment-497111914 are from the same deployment. HPA was only enabled for pods from that deployment and nothing else.
There is only a single ingress deployed. There are 5 Voyager pods that get created.