
VPA doesn't respect the minAllowed and maxAllowed

fsalazarh opened this issue on Mar 23, 2022 · 13 comments

Which component are you using?: VPA

What version of the component are you using?:

Component version: 0.10.0

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?: EKS cluster on AWS with t2.medium instances.

What did you expect to happen?: VPA respects the minAllowed and maxAllowed params when setting recommendations.

What happened instead?: VPA recommends cpu and memory values above the maxAllowed param (see the full VPA description below).


How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: The full VPA description follows, showing the minAllowed and maxAllowed values, the target recommendations, and other relevant details.

Name:         couchdb-couchdb
Namespace:    couchdb
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: couchdb
              meta.helm.sh/release-namespace: couchdb
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
Metadata:
  Creation Timestamp:  2022-03-23T11:38:17Z
  Generation:          69
  Managed Fields:
    API Version:  autoscaling.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:labels:
          .:
          f:app.kubernetes.io/managed-by:
      f:spec:
        .:
        f:resourcePolicy:
          .:
          f:containerPolicies:
        f:targetRef:
          .:
          f:apiVersion:
          f:kind:
          f:name:
        f:updatePolicy:
          .:
          f:updateMode:
    Manager:      terraform-provider-helm_v2.4.1_x5
    Operation:    Update
    Time:         2022-03-23T11:38:17Z
    API Version:  autoscaling.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:recommendation:
          .:
          f:containerRecommendations:
    Manager:         recommender
    Operation:       Update
    Time:            2022-03-23T11:39:19Z
  Resource Version:  9449838
  UID:               d91e4836-559c-4699-b0bf-238bb8316640
Spec:
  Resource Policy:
    Container Policies:
      Container Name:  '*'
      Controlled Resources:
        cpu
        memory
      Controlled Values:  RequestsAndLimits
      Max Allowed:
        Cpu:     1000m
        Memory:  1Gi
      Min Allowed:
        Cpu:     50m
        Memory:  64Mi
  Target Ref:
    API Version:  apps/v1
    Kind:         StatefulSet
    Name:         couchdb-couchdb
  Update Policy:
    Update Mode:  Auto
Status:
  Conditions:
    Last Transition Time:  2022-03-23T11:39:19Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  couchdb
      Lower Bound:
        Cpu:     1399m
        Memory:  254764315
      Target:
        Cpu:     1482m
        Memory:  280521325
      Uncapped Target:
        Cpu:     1482m
        Memory:  280521325
      Upper Bound:
        Cpu:     1534m
        Memory:  318350751
Events:          <none>

fsalazarh avatar Mar 23 '22 12:03 fsalazarh

VPA is working here as intended. The VPA recommender saves recommendation values based on the usage it has seen. Then the Updater and Admission Controller apply changes to those recommendations (capping them to stay between the allowed min & max and to fit into limit ranges).

This is because things like the allowed min/max and limit ranges can change, but historical data doesn't.
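As a concrete illustration using the numbers from this issue (a sketch of the expected capping, not actual VPA output): the stored target is 1482m CPU and 280521325 bytes of memory against a maxAllowed of 1000m / 1Gi (1073741824 bytes), so only the CPU value needs capping when the pod is updated:

  # Expected container resources after the Updater/Admission Controller
  # applies the recommendation capped to the resource policy (illustrative):
  resources:
    requests:
      cpu: "1000m"         # capped: min(1482m, maxAllowed 1000m)
      memory: "280521325"  # unchanged: 280521325 < 1Gi (1073741824)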

Please let me know if this explanation helps (or if you think VPA should behave differently than it does).

I can see that this can be confusing though. Contributions to improve the situation with better documentation or in other ways are very welcome.

jbartosik avatar Mar 24 '22 09:03 jbartosik

Then the Updater and Admission Controller apply changes to those recommendations (capping them to stay between the allowed min & max and to fit into limit ranges).

This is the problem: the Updater and Admission Controller don't respect the limits. They apply the recommendations even when they fall outside of minAllowed and maxAllowed.

Contributions to improve the situation with better documentation or in other ways are very welcome.

Sure, I'm inspecting the situation and I'll comment with more details. Thanks.

fsalazarh avatar Mar 24 '22 13:03 fsalazarh

Then the Updater and Admission Controller apply changes to those recommendations (capping them to stay between the allowed min & max and to fit into limit ranges).

This is the problem: the Updater and Admission Controller don't respect the limits. They apply the recommendations even when they fall outside of minAllowed and maxAllowed.

Ack. Then this is a problem. I think we have e2e tests for that (but right now all e2e tests are failing, see #4764).

jbartosik avatar Mar 24 '22 15:03 jbartosik

I suspect this may be the same issue as #4733.

Please check if your webhook is using v1 API. You can do this by running:

kubectl get MutatingWebhookConfiguration vpa-webhook-config -oyaml

If the webhook is using the v1beta2 API then you should switch to using v1. When the Admission Controller starts, by default it deletes the existing webhook and creates a new one (properly configured). So if you have a v1beta2 webhook, I think there are two possibilities:

  • You're running VPA with the registerWebhook flag set to false
  • Something is changing the webhook after the VPA Admission Controller updates it.
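
For reference, a correctly registered v1 webhook should look roughly like the sketch below (modeled on a default VPA install; the service name, namespace, failure policy, and rules may differ in your setup):

  apiVersion: admissionregistration.k8s.io/v1
  kind: MutatingWebhookConfiguration
  metadata:
    name: vpa-webhook-config
  webhooks:
  - name: vpa.k8s.io
    admissionReviewVersions: ["v1"]   # required by the v1 API
    sideEffects: None                 # required by the v1 API
    failurePolicy: Ignore
    clientConfig:
      service:
        name: vpa-webhook
        namespace: kube-system
    rules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["pods"]
    - apiGroups: ["autoscaling.k8s.io"]
      apiVersions: ["v1"]             # v1, not v1beta2
      operations: ["CREATE", "UPDATE"]
      resources: ["verticalpodautoscalers"]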

Please let me know if this helps.

jbartosik avatar Apr 05 '22 13:04 jbartosik

You're running VPA with the registerWebhook flag set to false

Yes, we are definitely doing that, since we need better control over the webhook registration. The point is that it is currently not obvious from the release notes of 0.10.0 that in this case we must also update the webhook registration to v1 when upgrading, so we had to find that out the hard way.

stoyanr avatar Apr 05 '22 13:04 stoyanr

You're running VPA with the registerWebhook flag set to false

Yes, we are definitely doing that, since we need better control over the webhook registration. The point is that it is currently not obvious from the release notes of 0.10.0 that in this case we must also update the webhook registration to v1 when upgrading, so we had to find that out the hard way.

I'll improve the description of the release.

Did updating the webhook solve this issue?

jbartosik avatar Apr 05 '22 13:04 jbartosik

Did updating the webhook solve this issue?

Sorry, I wanted to update #4733, not this issue. I will test for this particular one as well.

stoyanr avatar Apr 05 '22 13:04 stoyanr

I am not able to reproduce this issue in my test setup after #4733 is fixed.

stoyanr avatar Apr 05 '22 14:04 stoyanr

Please check if your webhook is using v1 API. You can do this by running:

kubectl get MutatingWebhookConfiguration vpa-webhook-config -oyaml

I commented here with the output of the command; the webhook is using the v1 API. This situation and #4759 still persist even after removing the webhook and recreating it by deploying VPA again.

fsalazarh avatar Apr 07 '22 13:04 fsalazarh

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 13 '22 07:07 k8s-triage-robot

Hi!

I have some updates. I think the issue is with the value of the containerName parameter.

I used Terraform to deploy VPA (and pass the values through it). When I execute kubectl get vpa couchdb-couchdb -oyaml, I see that the value of containerName is '''*''', and that's not correct (my error in the Terraform values).

This value in containerName causes VPA to ignore the controlledValues, maxAllowed, and minAllowed configuration.

To solve the issue, I edited the VPA API object with kubectl edit vpa couchdb-couchdb and fixed the containerName value to '*'. After that, the behaviour is as expected.

If you want to reproduce this behaviour, edit the VPA API object, change the value of containerName to something like '''*''', and check for the issues I mentioned. A minimal sketch of the corrected object follows below.

Although the error is in my Terraform values, I think VPA should warn that the containerName value is wrong and refuse to create the API object.
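
For anyone hitting the same thing, here is a minimal sketch of the VPA object with the correctly quoted wildcard (all names and values are taken from this issue; the broken Terraform output produced the literal three-character string '*', quotes included, which presumably matches no container, so the policy was silently ignored):

  apiVersion: autoscaling.k8s.io/v1
  kind: VerticalPodAutoscaler
  metadata:
    name: couchdb-couchdb
    namespace: couchdb
  spec:
    targetRef:
      apiVersion: apps/v1
      kind: StatefulSet
      name: couchdb-couchdb
    updatePolicy:
      updateMode: Auto
    resourcePolicy:
      containerPolicies:
      - containerName: "*"          # the bare wildcard, not the string "'*'"
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1000m
          memory: 1Gi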

Same response to #4759


Another strange thing is that kubectl get vpa couchdb-couchdb and kubectl describe vpa couchdb-couchdb show different values for the containerName parameter ('''*''' and '*' respectively), which can be confusing.

fsalazarh avatar Jul 18 '22 13:07 fsalazarh

/remove-lifecycle stale

jbartosik avatar Jul 18 '22 13:07 jbartosik

I have the VerticalPodAutoscaler containerName specified verbatim, matching the pod's container name, and my minAllowed memory is not listed when I issue: kubectl get verticalpodautoscaler caching-server-autoscaler -o yaml

Combined with https://github.com/giantswarm/roadmap/issues/923, it seems to be quite difficult to get enough memory.
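
For comparison, a per-container policy with an explicit name should appear in the spec like this (a sketch; the container name caching-server and the memory value are hypothetical, since only the VPA object name appears above):

  spec:
    resourcePolicy:
      containerPolicies:
      - containerName: caching-server   # hypothetical; must match the pod's container name exactly
        minAllowed:
          memory: 512Mi                 # illustrative value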

ChrisChiasson avatar Jul 29 '22 01:07 ChrisChiasson

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 27 '22 01:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 26 '22 02:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jan 11 '23 11:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

[the triage bot's closing message, quoted above]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 11 '23 11:01 k8s-ci-robot