Consider not marking Recreate / Auto VPA modes as experimental

Open jbartosik opened this issue 3 years ago • 5 comments

Which component are you using?:

VPA

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.: VPA Recreate and Auto modes are marked as experimental in the documentation. They've been around for a while so we should consider not saying they're experimental anymore.

Describe the solution you'd like.:

Check if they're ready & either improve those modes or stop saying they're experimental.

Describe any alternative solutions you've considered.:

Additional context.:

jbartosik avatar Oct 11 '22 08:10 jbartosik

I may be missing something obvious, but I simply deployed the hamster example and the VPA mode seems to be set to Auto by default without ever specifying it:

hack/vpa-up.sh
k apply -f examples/hamster.yaml
...
kubectl get vpa
NAME          MODE   CPU    MEM       PROVIDED   AGE
hamster-vpa   Auto   587m   262144k   True       11m

how can a mode be default AND experimental at the same time? should the NOTE just get removed from the docs?

mikelo avatar Oct 13 '22 09:10 mikelo

That's a good point.

We're setting Auto as the default here, and we've been doing that for a long time (blame says the line is from 2020, but from the PR it looks like we were doing it before that; the PR just moved the code). The description of the field says the same. It's unusual to say that the default behavior is experimental.
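
The defaulting can be pictured roughly like this. It is a minimal Python sketch for illustration only, not the actual VPA code (which is Go): the helper name `effective_update_mode` is hypothetical, and the object is a plain dict using the `spec.updatePolicy.updateMode` field path.

```python
from typing import Any, Mapping

# Update modes defined by the VPA API.
VALID_MODES = {"Off", "Initial", "Recreate", "Auto"}
DEFAULT_MODE = "Auto"  # what an unset updateMode resolves to, per this thread


def effective_update_mode(vpa: Mapping[str, Any]) -> str:
    """Return the update mode a VPA object ends up with (hypothetical helper)."""
    policy = vpa.get("spec", {}).get("updatePolicy") or {}
    mode = policy.get("updateMode") or DEFAULT_MODE
    if mode not in VALID_MODES:
        raise ValueError(f"unknown updateMode: {mode!r}")
    return mode


# A made-up object with no updatePolicy at all, for illustration.
hamster_vpa = {
    "metadata": {"name": "hamster-vpa"},
    "spec": {"targetRef": {"kind": "Deployment", "name": "hamster"}},
}
print(effective_update_mode(hamster_vpa))
```

So a user who never sets `updateMode` still ends up in the mode the docs call experimental.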

jbartosik avatar Oct 13 '22 10:10 jbartosik

has Auto mode been experimented enough such that it's not experimental anymore? what do you suggest should be the default mode?

I was playing around with the "Initial" mode but the updater pod angrily complained:

I1013 16:14:31.998822 1 updater.go:137] skipping VPA object hamster-vpa because its mode is not "Recreate" or "Auto"

None would be skipped too, Recreate doesn't sound right for some reason...
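
As far as I can tell, the check behind that log line amounts to something like the following. This is a minimal Python sketch, not the updater's real Go code: `should_process` is a made-up name, and only the rule stated in the log message (act on `Recreate` and `Auto`, skip everything else) comes from the updater. `Off` is the API's name for the "no updates" mode.

```python
# Modes the updater acts on, according to the log message above.
UPDATER_MODES = {"Recreate", "Auto"}


def should_process(name: str, mode: str) -> bool:
    """Hypothetical helper: True if the updater would act on this VPA."""
    if mode not in UPDATER_MODES:
        print(f'skipping VPA object {name} because its mode is not "Recreate" or "Auto"')
        return False
    return True


# Walk through all four API modes for the hamster example.
for mode in ("Off", "Initial", "Recreate", "Auto"):
    print(mode, should_process("hamster-vpa", mode))
```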

mikelo avatar Oct 13 '22 16:10 mikelo

I think that Auto:

  • has been around for a long time,
  • has been the default for years,
  • I think has been used by a lot of people.

So I think we should stop saying it's experimental. But I want to make sure I'm not missing anything important.

I'm looking for why we say it's experimental. I found that https://github.com/kubernetes/autoscaler/pull/1509 added the README note back in 2018-12, and I also see it was marked as experimental in a 2018 KubeCon presentation. I heard that it's because VPA evicts pods but can't guarantee they'll come back, but I didn't find any artifacts confirming that.

jbartosik avatar Oct 14 '22 08:10 jbartosik

here it states that Auto is Recreate in reality, does this imply that it's not possible to hot add resources to pods (without the need to restart them)?

and the v1 package (and not v1beta) seems to suggest something non-experimental about them...

mikelo avatar Oct 14 '22 14:10 mikelo

> does this imply that it's not possible to hot add resources to pods (without the need to restart them)?

In Kubernetes, it's impossible (and there is no such discussion AFAIK). I'd say the point is why it was marked as an experimental feature, and whether that problem has already been resolved (or whether we can regard it as accepted), as @jbartosik said.

> I heard that it's because VPA evicts pods but can't guarantee they'll come back but didn't find any artifacts confirming that.

I'm not at all familiar with the implementation details of VPA, but I believe we cannot be 100% sure that a new Pod replacing an evicted Pod gets scheduled. For example, if VPA's new recommended request for a Pod is 10 CPU but no Node in the cluster has 10 CPU, the scheduler cannot find a place for the new Pod (until something like CA creates a Node for it).

sanposhiho avatar Oct 20 '22 12:10 sanposhiho

> does this imply that it's not possible to hot add resources to pods (without the need to restart them)?

> In kubernetes, it's impossible. (and there is no such discussion AFAIK) I'd say the point is why it is marked as an experimental feature and has the problem already resolved (or can we regard the problem as accepted) as Joachim said.

Currently it's not possible but there's ongoing work to make that possible (https://github.com/kubernetes/kubernetes/pull/102884) and to follow up by using that ability in VPA (https://github.com/kubernetes/autoscaler/issues/5046)

> I heard that it's because VPA evicts pods but can't guarantee they'll come back but didn't find any artifacts confirming that.

> I'm completely not familiar with the implementation detail of VPA, but I believe we cannot make 100% sure a new Pod to replace an old evicted Pod gets scheduled. It's because, for example, if VPA's new recommended request value for Pods is 10 CPU, but no Node in the cluster has 10 CPU, then the scheduler cannot find the place for the new Pod. (until someone like CA creates a Node for it)

VPA's README says so (here)

jbartosik avatar Oct 25 '22 08:10 jbartosik

> there's ongoing work to make that possible

Cool, I didn't know that.

sanposhiho avatar Oct 25 '22 14:10 sanposhiho

When a recreated Pod cannot come back, we can guess users want one of the following:

  1. Recreate Pods anyway (the current behavior) and let CA create a new Node to run the recreated Pod.
  2. Stop recreating in that case.

To deal with (2), how about running a precheck for whether recreated Pods can be scheduled, like descheduler does (https://github.com/kubernetes-sigs/descheduler#node-fit-filtering)? I know we can't be 100% sure because the state of the cluster is constantly changing, but I think it would prevent, to some extent, situations where a Pod cannot come back. We also need to consider that some users will still want (1), so this behavior should be optional.
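
A very rough sketch of what such a precheck could look like, in Python for brevity. Everything here is hypothetical (the `Node` type, `fits_on_any_node`, the numbers). It only looks at CPU and ignores that evicting the Pod frees its own request; a real check would also have to consider memory, taints, affinity, and the other things the scheduler looks at.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int  # allocatable CPU in millicores
    requested_cpu_m: int    # CPU already requested by Pods on the node


def fits_on_any_node(recommended_cpu_m: int, nodes: Iterable[Node]) -> bool:
    """True if at least one node has enough free CPU for the new request."""
    return any(
        n.allocatable_cpu_m - n.requested_cpu_m >= recommended_cpu_m
        for n in nodes
    )


# Made-up cluster state: node-a has 2500m free, node-b has 1000m free.
nodes = [Node("node-a", 4000, 1500), Node("node-b", 8000, 7000)]

# A 587m request fits on node-a; a 10 CPU request fits nowhere, so the
# eviction would be skipped unless the user opted into "recreate anyway".
print(fits_on_any_node(587, nodes))
print(fits_on_any_node(10_000, nodes))
```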

sanposhiho avatar Oct 30 '22 08:10 sanposhiho

We have been using the VPA with Recreate ever since we introduced VPA. There is no need to keep it as 'experimental' in my opinion, especially now that Auto/In-place will come soon and will most likely be labeled 'experimental' at first. Comparing these two modes, Recreate is definitely more mature, and this should also be reflected in the docs.

voelzmo avatar Oct 31 '22 10:10 voelzmo

To clarify my position: I'm not arguing for keeping the experimental label. I agree with dropping it. I just mean that if we still need to be concerned about recreated Pods not coming back, this is one idea for dealing with that.

sanposhiho avatar Nov 01 '22 08:11 sanposhiho

will Auto/In-place be the default mode to be used (unless the user specifies otherwise) but at the same time it will be marked as 'experimental'?

mikelo avatar Nov 02 '22 08:11 mikelo

@sanposhiho thanks for the suggestion. I don't think I have capacity to work on that in the near future but it's good to have it in case we can work on the problem.

@mikelo

> will Auto/In-place be the default mode to be used (unless the user specifies otherwise) but at the same time it will be marked as 'experimental'?

That's the situation now. Our documentation says that Auto/Recreate is experimental. But it's also the default.

I want to change this. Auto will still be the default but we will no longer say that it's experimental.

jbartosik avatar Nov 02 '22 12:11 jbartosik