autoscaler
Consider not marking Recreate / Auto VPA modes as experimental
Which component are you using?:
VPA
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
VPA Recreate and Auto modes are marked as experimental in the documentation. They've been around for a while so we should consider not saying they're experimental anymore.
Describe the solution you'd like.:
Check if they're ready & either improve those modes or stop saying they're experimental.
Describe any alternative solutions you've considered.:
Additional context.:
I may be missing something obvious, but I simply deployed the hamster example and the VPA mode seems to be set to Auto by default without ever specifying it:
hack/vpa-up.sh
kubectl apply -f examples/hamster.yaml
...
kubectl get vpa
NAME          MODE   CPU    MEM       PROVIDED   AGE
hamster-vpa   Auto   587m   262144k   True       11m
How can a mode be the default AND experimental at the same time? Should the NOTE just be removed from the docs?
That's a good point.
We're setting Auto as the default here, and we've been doing that for a long time: blame says the line is from 2020, but from the PR it looks like we were doing it even earlier and the PR just moved the code. The description of the field says the same. It's unusual to say that the default behavior is experimental.
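To make the defaulting behavior described above concrete, here is a minimal Python sketch. It is not the VPA source (which is written in Go); it only assumes what this thread states, namely that a VPA object with no explicit mode behaves as Auto. The function name `effective_update_mode` and the constant `DEFAULT_UPDATE_MODE` are hypothetical, and the object is modeled as a plain dict shaped like a VPA manifest with the `spec.updatePolicy.updateMode` field.

```python
# Assumption taken from this thread: Auto is what you get when no mode is set.
DEFAULT_UPDATE_MODE = "Auto"


def effective_update_mode(vpa: dict) -> str:
    """Return the update mode a VPA-shaped dict would run with.

    A missing spec, a missing updatePolicy, or an empty updateMode
    all fall back to DEFAULT_UPDATE_MODE.
    """
    spec = vpa.get("spec") or {}
    policy = spec.get("updatePolicy") or {}
    mode = policy.get("updateMode")
    return mode if mode else DEFAULT_UPDATE_MODE


# Hypothetical object with no updatePolicy at all.
unset = {"metadata": {"name": "hamster-vpa"}, "spec": {}}
# Hypothetical object that opts into Initial explicitly.
explicit = {"spec": {"updatePolicy": {"updateMode": "Initial"}}}

print(effective_update_mode(unset))
print(effective_update_mode(explicit))
```

Under these assumptions the first object resolves to Auto and the second keeps Initial, which matches the `kubectl get vpa` output quoted earlier in the thread.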
Has Auto mode been tested enough that it's no longer experimental? What do you suggest should be the default mode?
I was playing around with the "Initial" mode but the updater pod angrily complained:
I1013 16:14:31.998822 1 updater.go:137] skipping VPA object hamster-vpa because its mode is not "Recreate" or "Auto"
None would be skipped too, and Recreate doesn't sound right for some reason...
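The log line above suggests the updater only acts on VPA objects whose mode is Recreate or Auto and passes over everything else. A rough Python sketch of that filter follows; it is an illustration built from the quoted message, not the code in `updater.go`, and `UPDATER_MODES` and `vpas_to_process` are hypothetical names.

```python
# Modes the updater acts on, taken from the log message quoted in this thread.
UPDATER_MODES = {"Recreate", "Auto"}


def vpas_to_process(vpas):
    """Return the names of the VPA objects the updater would act on.

    `vpas` is an iterable of (name, mode) pairs. Objects in any other
    mode are skipped with a message modeled on the quoted log line.
    """
    selected = []
    for name, mode in vpas:
        if mode not in UPDATER_MODES:
            print(f'skipping VPA object {name} because its mode is not "Recreate" or "Auto"')
            continue
        selected.append(name)
    return selected


print(vpas_to_process([("hamster-vpa", "Initial"), ("other-vpa", "Auto")]))
```

Note that the leading `I` in the quoted log line is klog's info severity, so the message reports a skip rather than an error.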
I think that Auto:
- has been around for a long time,
- has been the default for years,
- has been used by a lot of people, I think.
So I think we should stop saying it's experimental. But I want to make sure I'm not missing anything important.
I'm looking for why we say it's experimental. I found that https://github.com/kubernetes/autoscaler/pull/1509 added the README note back in 2018-12, and I also see it was marked as experimental in a 2018 KubeCon presentation. I heard that it's because VPA evicts pods but can't guarantee they'll come back but didn't find any artifacts confirming that.
Here it states that Auto is Recreate in reality; does this imply that it's not possible to hot add resources to pods (without the need to restart them)?
Also, the v1 package (rather than v1beta) seems to suggest that these modes are not experimental...
does this imply that it's not possible to hot add resources to pods (without the need to restart them)?
In Kubernetes, it's impossible (and there is no such discussion AFAIK). I'd say the point is why it was marked as an experimental feature, and whether that problem has already been resolved (or can be regarded as accepted), as @jbartosik said.
I heard that it's because VPA evicts pods but can't guarantee they'll come back but didn't find any artifacts confirming that.
I'm not at all familiar with the implementation details of VPA, but I believe we cannot be 100% sure that a new Pod replacing an evicted Pod gets scheduled. For example, if VPA's new recommended request for a Pod is 10 CPU but no Node in the cluster has 10 CPU, the scheduler cannot find a place for the new Pod (until something like CA creates a Node for it).
does this imply that it's not possible to hot add resources to pods (without the need to restart them)?
In Kubernetes, it's impossible (and there is no such discussion AFAIK). I'd say the point is why it was marked as an experimental feature, and whether that problem has already been resolved (or can be regarded as accepted), as Joachim said.
Currently it's not possible but there's ongoing work to make that possible (https://github.com/kubernetes/kubernetes/pull/102884) and to follow up by using that ability in VPA (https://github.com/kubernetes/autoscaler/issues/5046)
I heard that it's because VPA evicts pods but can't guarantee they'll come back but didn't find any artifacts confirming that.
I'm not at all familiar with the implementation details of VPA, but I believe we cannot be 100% sure that a new Pod replacing an evicted Pod gets scheduled. For example, if VPA's new recommended request for a Pod is 10 CPU but no Node in the cluster has 10 CPU, the scheduler cannot find a place for the new Pod (until something like CA creates a Node for it).
VPA's README says so (here).
there's ongoing work to make that possible
Cool, I didn't know that.
When a recreated Pod cannot come back, we can guess users want one of the following:
1. Recreate Pods anyway (the current behavior), and let CA create a new Node to run the recreated Pod.
2. Stop recreating in that case.
To deal with (2), how about running a precheck on whether recreated Pods can be scheduled, like descheduler does? https://github.com/kubernetes-sigs/descheduler#node-fit-filtering I know we can't be 100% sure because the state of the cluster is constantly changing, but I think it would prevent, to some extent, situations where a Pod could not come back. Also, some users will still want (1), so this behavior should be optional.
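As a rough illustration of the precheck idea, here is a Python sketch under stated assumptions. It is not descheduler's node-fit filter and not anything VPA implements: `Node`, `fits_any_node`, `should_evict`, and the `require_fit` switch are all hypothetical, only CPU and memory are compared, and the capacity the evicted Pod itself would free up is ignored.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical, simplified view of a node's resources."""
    name: str
    allocatable_cpu_m: int      # allocatable CPU in millicores
    allocatable_mem: int        # allocatable memory in bytes
    requested_cpu_m: int = 0    # CPU already requested by running Pods
    requested_mem: int = 0      # memory already requested by running Pods


def fits_any_node(cpu_m: int, mem: int, nodes: list) -> bool:
    """Return True if at least one node has room for the new requests."""
    for node in nodes:
        free_cpu = node.allocatable_cpu_m - node.requested_cpu_m
        free_mem = node.allocatable_mem - node.requested_mem
        if cpu_m <= free_cpu and mem <= free_mem:
            return True
    return False


def should_evict(cpu_m: int, mem: int, nodes: list, require_fit: bool = True) -> bool:
    """Decide whether to evict a Pod so it can be recreated with new requests.

    require_fit=False models option (1): recreate anyway and rely on
    something like CA to add a Node. require_fit=True models option (2).
    """
    if not require_fit:
        return True
    return fits_any_node(cpu_m, mem, nodes)


GIB = 1024 ** 3
cluster = [
    Node("node-a", 8000, 32 * GIB, requested_cpu_m=2000, requested_mem=4 * GIB),
    Node("node-b", 8000, 32 * GIB),
]

# The 10 CPU example from this thread: no node has 10 CPU available.
print(should_evict(10_000, 1 * GIB, cluster))
print(should_evict(10_000, 1 * GIB, cluster, require_fit=False))
```

Because cluster state keeps changing, a check like this can only reduce the chance that a Pod fails to come back, not eliminate it, which is why the sketch keeps it behind an optional switch.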
We have been using the VPA with Recreate ever since we introduced VPA, so there is no need to keep it as 'experimental' in my opinion. Especially now that Auto/In-place will come soon and will most likely carry the 'experimental' label in the beginning. So when comparing these two modes, Recreate is definitely more mature, and this should also be reflected in the docs.
To clarify my position, I'm not arguing for keeping it marked as experimental. Rather, I agree with dropping the experimental label. I just mean that if we still need to be concerned about the possibility that recreated Pods won't come back, this is one idea for dealing with that.
will Auto/In-place be the default mode to be used (unless the user specifies otherwise) but at the same time it will be marked as 'experimental'?
@sanposhiho thanks for the suggestion. I don't think I have capacity to work on that in the near future but it's good to have it in case we can work on the problem.
@mikelo
will Auto/In-place be the default mode to be used (unless the user specifies otherwise) but at the same time it will be marked as 'experimental'?
That's the situation now. Our documentation says that Auto/Recreate is experimental. But it's also the default.
I want to change this. Auto will still be the default but we will no longer say that it's experimental.