Progressive traffic increase for new Pods
We have a JVM-based web app behind Contour/Envoy/NLB, with horizontal pod autoscaling in place. When a new pod is created due to autoscaling, Contour/Envoy directs a proportional amount of traffic to that new pod. However, because the app is cold, we're seeing consistent timeouts until it warms up.
We tried the same scenario using a Service of type LoadBalancer in EKS (with an Elastic Load Balancer in front) and did not see the same issue. This appears to be because the ELB progressively increases traffic to the new pod, as shown in the graph below.

Is there any plan to support something similar in Contour? I see we have the possibility to set weights for different services in an IngressRoute. Would it be something to consider to set weights at the pod level for a given service, based on pod age? (Or is something like this available today?)
Thanks for logging this issue.
This sounds like a time where health checks from Contour or readiness checks from Kubernetes would help.
Kubernetes supports pod readiness checks, and Contour supports endpoint health checks, both of which can ensure that traffic does not reach an instance before it has warmed up, as long as your application can somehow indicate that it's ready.
Contour's endpoint health checks are only available in the HTTPProxy object (and the now-deprecated IngressRoute), however. Pod readiness checks are available in any recent version of Kubernetes.
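For reference, a Contour health check is configured per route in the HTTPProxy spec. A minimal sketch (the hostname, service name, and `/healthz` path are placeholders for illustration):

```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: webapp
spec:
  virtualhost:
    fqdn: webapp.example.com   # placeholder hostname
  routes:
    - conditions:
        - prefix: /
      healthCheckPolicy:
        path: /healthz          # endpoint your app exposes for health
        intervalSeconds: 5
        timeoutSeconds: 2
        unhealthyThresholdCount: 3
        healthyThresholdCount: 2
      services:
        - name: webapp          # placeholder Service name
          port: 80
```

With this in place, Envoy actively probes each endpoint and stops sending traffic to ones that fail the check.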
Thanks, @youngnick. This sounds like we need to warm up the new pods ourselves. The issue was asking whether this could be handled by Contour/Envoy itself, by doing a progressive traffic increase on the new pod(s), hence warming up the instance.
I agree with what @youngnick suggested. You could have your readiness probe call an endpoint which would trigger the app to warm up, but put an initial delay that matches the time your app needs to spin up.
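As a sketch of that suggestion, the container spec could look something like the following (the `/warmup` path, port, and 120-second delay are assumptions to illustrate the idea; tune the delay to your app's actual warm-up time):

```yaml
containers:
  - name: webapp                 # placeholder container name
    image: example/webapp:latest # placeholder image
    readinessProbe:
      httpGet:
        path: /warmup            # hypothetical endpoint that triggers/reports warm-up
        port: 8080
      initialDelaySeconds: 120   # roughly match the app's warm-up time
      periodSeconds: 5
      failureThreshold: 3
```

The pod stays out of the Service's endpoints (and thus receives no traffic from Envoy) until the probe succeeds.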
Additionally, you could look at adding a retry to the requests, so if the request does fail, then it would get retried by Envoy.
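Retries are configured per route in HTTPProxy. A minimal sketch (service name, count, and timeout are illustrative values):

```yaml
routes:
  - conditions:
      - prefix: /
    retryPolicy:
      count: 3            # retry a failed request up to 3 times
      perTryTimeout: 500ms
    services:
      - name: webapp      # placeholder Service name
        port: 80
```

This won't prevent a cold pod from being overwhelmed, but it lets Envoy re-send a failed request, possibly to a warmer endpoint.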
I'm going to close this out, but please re-open if you have further questions on this @costimuraru !
Thanks for the response, @stevesloka
have your readiness probe call an endpoint which would trigger the app to warm up
I think we might not be on the same page regarding the warm up. The warm up is not related to the application being slow to start or anything like that. This is about the app warming up by processing (real) HTTP requests.
The scenario right now with Contour is:
- app starts on the new pod and is ready to handle requests (this happens quite fast)
- Contour/Envoy sends a proportional share of requests to the new pod
- the app, still in a cold state, can't handle this many requests at once and crashes
This problem is known and other load balancers have implemented algorithms to mitigate it. For example see this from the Application Load Balancer from AWS: https://aws.amazon.com/about-aws/whats-new/2018/05/application-load-balancer-announces-slow-start-support/
Application Load Balancers now support a slow start mode that allows you to add new targets without overwhelming them with a flood of requests. With the slow start mode, targets warm up before accepting their fair share of requests based on a ramp-up period that you specify
This issue is related exactly to this kind of behavior, where Contour would be able to support a slow start mode and not overwhelm new pods with requests.
Hey, @youngnick, @stevesloka,
Any thoughts on the above?
Appreciate the feedback.
Hi @costimuraru, currently, Contour does minimal configuration of Envoy aside from what it's directed to do by Kubernetes objects.
If I understand what you're asking for (having Contour detect new endpoint pods and gradually shift traffic to them), this would be a very large change to Contour's current model of using Envoy: Contour would have to track the health of every endpoint of the service and gradually adjust each endpoint's weight over time, which is a significant departure from how it works today.
I will raise this idea with the team; we will need to check whether Envoy has any feature that would make adding this to Contour easier.
In addition, I think what @stevesloka and I were trying to suggest earlier is having the readiness check do some common requests to the app itself to warm the caches before marking the pod as ready for traffic.
Thanks for the detailed answer, @youngnick!
In addition, I think what @stevesloka and I were trying to suggest earlier is having the readiness check do some common requests to the app itself to warm the caches before marking the pod as ready for traffic.
We tried this, but the number of requests is just too low to do any real warming (we're trying to warm up from 0 to ~4000 requests per second, for each pod). We also tried adding a PostStart lifecycle hook on the Pod that runs an HTTP load generator against the app (via localhost), but this is also problematic. The warm-up takes quite a bit of time (e.g. ~2 minutes), during which the Pod receives no external traffic. Even if we add tens of pods during a spike, we can't process the extra requests until this warm-up period finishes (so we're back to the VM world, where it takes minutes to spin up a new machine). It's also quite hard to generate synthetic requests that map to real-life use cases, as these get updated frequently. All in all, these warm-up workarounds add quite a lot of work and don't yield the best results.
@costimuraru - this is more an Envoy issue in my mind (Contour could leverage that feature of course, once implemented in Envoy). Have you considered filing the issue in the Envoy project instead?
Thanks, @lrouquette. Created the issue in Envoy: https://github.com/envoyproxy/envoy/issues/11050
This is available in Envoy now so Contour could adopt the feature!
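For context, Envoy's slow start is configured on the cluster's load-balancer policy. A minimal sketch of the raw Envoy config (cluster name and window are illustrative; Contour would generate something like this rather than users writing it by hand):

```yaml
clusters:
  - name: webapp              # placeholder cluster name
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    round_robin_lb_config:
      slow_start_config:
        slow_start_window: 60s   # ramp new endpoints up over 60 seconds
        aggression:
          default_value: 1.0     # 1.0 = linear ramp-up
          runtime_key: slow_start.aggression
```

During the window, Envoy assigns a new endpoint a reduced effective weight that grows toward its full share, which is exactly the ALB-style ramp-up asked for above.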
From slack convo:
We'd just need to plan out the API for how to implement it. We'd probably need to add the slow-start configuration to the services struct: https://github.com/projectcontour/contour/blob/main/apis/projectcontour/v1/httpproxy.go#L627
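One possible shape for that API, purely as a sketch (the field names below are hypothetical, not a committed design):

```yaml
services:
  - name: webapp              # placeholder Service name
    port: 80
    slowStartPolicy:          # hypothetical new field on the service
      window: 60s             # ramp-up duration for new endpoints
      aggression: "1.0"       # 1.0 = linear ramp, maps to Envoy's aggression
      minWeightPercent: 10    # floor so new endpoints get some traffic immediately
```

These would map more or less directly onto Envoy's slow-start fields, so Contour would mostly be plumbing the values through.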
cc @CrossingTheRiverPeole
Added the help wanted label here if anyone is interested in picking up this issue!
It would be very useful for us to have support for this new Envoy feature in Contour.
Thanks a lot for this!!
@skriss If I understand the compatibility matrix correctly, this means that this change would be rolled out in the next release (1.23.0??) and the minimum supported K8s version for that release will be 1.23. Is this correct?
yes that is correct