[stable/vpa] Probe defaults don't make sense
What happened?
The VPA chart sets readinessProbe and livenessProbe for several containers. Here are the values for the recommender:
https://github.com/FairwindsOps/charts/blob/1dbb322b01d207ba2a08c2ae25158051d2d54208/stable/vpa/values.yaml#L88-L107
Both are essentially the same and differ only in their failureThreshold values. Those values are the problem: if 6 failed liveness probes lead to a container restart, the container can never become unready after 120 failed readiness probes, because the restart happens far earlier.
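To make the timing concrete, here is a rough sketch of the two probes side by side. Only the failureThreshold values (6 and 120) come from the chart as described in this issue. The periodSeconds of 5, the path, and the port name are assumptions for illustration, so check the linked values.yaml for the real settings.

```yaml
recommender:
  livenessProbe:
    httpGet:
      path: /health-check    # assumed endpoint, for illustration only
      port: metrics          # assumed port name
    periodSeconds: 5         # assumed period
    failureThreshold: 6      # restart after roughly 6 x 5s = 30s of failures
  readinessProbe:
    httpGet:
      path: /health-check    # same assumed endpoint as the liveness probe
      port: metrics
    periodSeconds: 5         # assumed period
    failureThreshold: 120    # would need roughly 120 x 5s = 600s of failures
```

Under these assumptions, both probes run the same check at the same interval, so the kubelet restarts the container after about 30 seconds of failures and the 600-second readiness window can never be used up.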
The behavior has been like this since the probes were added in https://github.com/FairwindsOps/charts/pull/399.
In a scenario with a couple of thousand VPA resources, I've seen the recommender restarted over and over because its liveness probes failed while the container was still starting up.
What did you expect to happen?
A failureThreshold of 120 for the readinessProbe seems quite high. I wonder if it was rather meant to be a startupProbe. Such a high failureThreshold would give the container plenty of time to come up before the livenessProbe takes over.
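One way that idea could look, as a minimal sketch: move the high threshold to a startupProbe and keep the livenessProbe strict. Kubernetes holds off liveness and readiness checks until the startup probe has succeeded. Whether the chart accepts a startupProbe key under recommender is an assumption here, as are the endpoint, the port name, and the periodSeconds values.

```yaml
recommender:
  startupProbe:              # hypothetical key; the chart may not template this yet
    httpGet:
      path: /health-check    # assumed endpoint, for illustration only
      port: metrics          # assumed port name
    periodSeconds: 5         # assumed period
    failureThreshold: 120    # allows up to about 120 x 5s = 600s for startup
  livenessProbe:
    httpGet:
      path: /health-check
      port: metrics
    periodSeconds: 5         # assumed period
    failureThreshold: 6      # only counts once the startup probe has succeeded
```

With a layout like this, a slow start caused by thousands of VPA resources would be absorbed by the startup probe, while a hang after startup would still trigger a restart within about 30 seconds.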
How can we reproduce this?
Create lots of VPA resources to slow down the startup of the VPA's recommender pod. It then gets restarted due to the failing liveness probe.
Version
4.5.0
Search
- [X] I did search for other open and closed issues before opening this.
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Additional context
No response