HPA pod scaling causes 503
Hi,
We are using version 7.4.0.
It looks like 503s are being generated because HPA is scaling pods up and down fairly rapidly, and the ingress configuration is not updated quickly enough to keep pace.
We have turned off HPA.
Is there anything that can be done about the ingress configuration update to prevent the 503s?
Thanks Riad
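For anyone hitting this with HPA enabled, one mitigation is to slow the scaling events down so the ingress has time to catch up. On clusters where the autoscaling/v2 API is available (newer than the versions discussed in this thread), the HPA behavior field can enforce a scale-down stabilization window. This is a minimal sketch, not a confirmed fix for this issue; the name, target, and thresholds are placeholder assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical deployment
  minReplicas: 17
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before acting on scale-down
      policies:
        - type: Pods
          value: 2                      # remove at most 2 pods per minute
          periodSeconds: 60
```

This doesn't remove the underlying race between pod churn and the HAProxy config reload; it just makes the churn slow enough that the window for 503s shrinks.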
You sure it's related to HPA? We're getting this even when not using it. For us Voyager never seems to sync its table of pods and sends requests to terminated ones. #1334
We're seeing 503 errors as well, but only during rolling restarts of our deployments. This is the RollingUpdate configuration in one of our deployments:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
This means there must always be at least one healthy instance of our app available: a new pod is surged in before an old one is taken down. However, during the rolling update there is a very short period of time in which a 503 Service Unavailable error is returned.
This seems to be related to the way the HAProxy reloader works: it appears to keep sending requests to killed pods for a short time, until it has reloaded its configuration.
@tamalsaha is this a known issue/limitation with how Voyager reloads HAProxy? Or am I doing it wrong? 🙂
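A workaround often used for this class of problem (not specific to Voyager, and the values below are assumptions) is to delay pod shutdown with a preStop hook, so the proxy has time to reload its configuration and drain the pod before the app stops accepting connections. A minimal sketch:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app              # hypothetical deployment name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Must exceed the preStop sleep plus the app's own shutdown time.
      terminationGracePeriodSeconds: 45
      containers:
        - name: app
          image: my-app:latest   # placeholder image
          lifecycle:
            preStop:
              exec:
                # Keep serving for ~15s after the pod is marked Terminating,
                # giving the ingress time to reload and stop routing here.
                command: ["sleep", "15"]
```

The sleep length is a guess; it needs to cover the worst-case delay between the endpoint being removed and HAProxy finishing its reload.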
I should have titled the defect better.
HPA was the cause of a rapid up/down change in pods that led to the 503s.
It looks like 503s are possible with anything that changes pods.
@tamalsaha Is this still an issue with Voyager 10?
Anyone tried with Voyager 10?
@rmohammed-xtime Did you happen to notice whether this problem occurs only when scaling down, or in both directions? Our guess is that it happens while pods are terminating (requests keep going to terminated pods for a brief period of time).
This issue is different from #1334 btw.
@kfoozminus Both up/down
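If 503s also show up on scale-up, one plausible cause (an assumption, not something confirmed in this thread) is that new pods are routed to before they can actually serve traffic. A readinessProbe keeps a pod out of the Endpoints list, and therefore out of the generated HAProxy config, until it passes; the path and port here are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app              # hypothetical deployment name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:latest     # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz       # assumed health endpoint
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
```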
We observed peak 503s at 5:52 AM, 5:56 AM, and 7:18 AM; below is the pod-count data around that time frame.
Pod counts over time (the fluctuation is due to HPA):
Data from 5:45 AM:

- 5:45 – 5:50 AM: 17 pods (2 new at 5:48 AM)
- 5:50 – 5:55 AM: 19 pods (2 new at 5:52 AM)
- 5:55 – 6:00 AM: 24 pods (5 new at 5:59:30 AM)
- 6:00 – 6:05 AM: 28 pods (4 new at 6:03 AM)
- 6:05 – 6:10 AM: 30 pods (2 new at 6:07 AM)
- 6:10 – 6:15 AM: 30 pods (4 down at 6:14 AM)
- 6:15 – 6:20 AM: 30 pods (4 down at 6:14 AM)
- 6:20 – 6:25 AM: 25 pods (2 down at 6:15 AM)
- 6:25 – 6:30 AM: 21 pods (2 additional at 6:26 AM)
- 6:30 – 6:35 AM: 24 pods (3 additions at 6:31 AM)
- 6:35 – 6:40 AM: 30 pods (6 additions at 6:36 AM)
- 6:40 – 6:45 AM: 30 pods (2 down at 6:44 AM)
- 6:45 – 6:50 AM: 27 pods (1 down at 6:45 AM)
- 6:50 – 6:55 AM: 26 pods (3 down at 6:50 AM)
- 6:55 – 7:00 AM: 20 pods
- 7:05 – 7:10 AM: 22 pods (2 additions at 7:09 AM)
- 7:10 – 7:15 AM: 29 pods (7 additions at 7:14 AM)
- 7:15 – 7:20 AM: 29 pods
- 7:20 – 7:25 AM: 30 pods (1 addition at 7:22 AM)
- 7:25 – 7:30 AM: 30 pods
- 7:30 – 7:35 AM: 30 pods (4 down at 7:31 AM)
@rmohammed-xtime Are you using HPA for the ingress pods too (or maybe running multiple ingress pods)?
@kfoozminus The pods listed in my previous comment https://github.com/appscode/voyager/issues/1389#issuecomment-497111914 are from the same deployment. HPA was only enabled for pods from that deployment and nothing else.
There is only a single ingress deployed. There are 5 Voyager pods that get created.