VPA Doesn't respect the minAllowed and maxAllowed
Which component are you using?: VPA
What version of the component are you using?:
Component version: 0.10.0
What k8s version are you using (kubectl version)?:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
What environment is this in?: EKS cluster on AWS with t2.medium instances.
What did you expect to happen?: VPA respects the minAllowed and maxAllowed params when setting recommendations.
What happened instead?: VPA recommends CPU and memory values above the maxAllowed param.
Resource Policy:
  Container Policies:
    Container Name:  '*'
    Controlled Resources:
      cpu
      memory
    Controlled Values:  RequestsAndLimits
    Max Allowed:
      Cpu:     1000m
      Memory:  1Gi
    Min Allowed:
      Cpu:     50m
      Memory:  64Mi
Target Ref:
  API Version:  apps/v1
  Kind:         StatefulSet
  Name:         couchdb-couchdb
Update Policy:
  Update Mode:  Auto
Status:
  Conditions:
    Last Transition Time:  2022-03-23T11:39:19Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  couchdb
      Lower Bound:
        Cpu:     1399m
        Memory:  254764315
      Target:
        Cpu:     1482m
        Memory:  280521325
      Uncapped Target:
        Cpu:     1482m
        Memory:  280521325
      Upper Bound:
        Cpu:     1534m
        Memory:  318350751
Events:  <none>
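For reference, the policy above expressed as a VPA manifest would look roughly like this (a sketch reconstructed from the describe output; the actual Helm/Terraform values used to create the object are not shown in this issue):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: couchdb-couchdb
  namespace: couchdb
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: couchdb-couchdb
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: "*"        # wildcard entry, intended to apply to all containers
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi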
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Full VPA description, to note the minAllowed and maxAllowed values, the target recommendations, and other relevant details:
Name:         couchdb-couchdb
Namespace:    couchdb
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: couchdb
              meta.helm.sh/release-namespace: couchdb
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
Metadata:
  Creation Timestamp:  2022-03-23T11:38:17Z
  Generation:          69
  Managed Fields:
    API Version:  autoscaling.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:labels:
          .:
          f:app.kubernetes.io/managed-by:
      f:spec:
        .:
        f:resourcePolicy:
          .:
          f:containerPolicies:
        f:targetRef:
          .:
          f:apiVersion:
          f:kind:
          f:name:
        f:updatePolicy:
          .:
          f:updateMode:
    Manager:      terraform-provider-helm_v2.4.1_x5
    Operation:    Update
    Time:         2022-03-23T11:38:17Z
    API Version:  autoscaling.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:recommendation:
          .:
          f:containerRecommendations:
    Manager:      recommender
    Operation:    Update
    Time:         2022-03-23T11:39:19Z
  Resource Version:  9449838
  UID:               d91e4836-559c-4699-b0bf-238bb8316640
Spec:
  Resource Policy:
    Container Policies:
      Container Name:  '*'
      Controlled Resources:
        cpu
        memory
      Controlled Values:  RequestsAndLimits
      Max Allowed:
        Cpu:     1000m
        Memory:  1Gi
      Min Allowed:
        Cpu:     50m
        Memory:  64Mi
  Target Ref:
    API Version:  apps/v1
    Kind:         StatefulSet
    Name:         couchdb-couchdb
  Update Policy:
    Update Mode:  Auto
Status:
  Conditions:
    Last Transition Time:  2022-03-23T11:39:19Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  couchdb
      Lower Bound:
        Cpu:     1399m
        Memory:  254764315
      Target:
        Cpu:     1482m
        Memory:  280521325
      Uncapped Target:
        Cpu:     1482m
        Memory:  280521325
      Upper Bound:
        Cpu:     1534m
        Memory:  318350751
Events:  <none>
VPA is working here as intended. The VPA recommender saves recommendation values based on the usage it has seen. Then the Updater and Admission Controller apply changes based on those recommendations (capping them to stay between the allowed min and max and to fit into limit ranges).
This is because things like the allowed min/max and limit ranges can change, but historical data doesn't.
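As an example of what that capping should mean for the numbers above: the target of cpu 1482m / memory 280521325 (~268Mi) should be clamped against minAllowed (50m / 64Mi) and maxAllowed (1000m / 1Gi) before being written into the pod, so the applied requests would be expected to look roughly like this (a sketch, not actual output):

resources:
  requests:
    cpu: 1000m          # 1482m capped down to the maxAllowed cpu
    memory: "280521325" # ~268Mi, already within the 64Mi - 1Gi allowed range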
Please let me know if this explanation helps (or if you think VPA should behave differently than it does).
I can see that this can be confusing though. Contributions to improve the situation with better documentation or in other ways are very welcome.
Then the Updater and Admission Controller apply changes based on those recommendations (capping them to stay between the allowed min and max and to fit into limit ranges).

This is the problem: the Updater and Admission Controller don't respect the limits. They apply the recommendations even when they fall outside of minAllowed and maxAllowed.
Contributions to improve the situation with better documentation or in other ways are very welcome.
Sure, I'm inspecting the situation and I'll comment with more details. Thanks.
Then the Updater and Admission Controller apply changes based on those recommendations (capping them to stay between the allowed min and max and to fit into limit ranges).

This is the problem: the Updater and Admission Controller don't respect the limits. They apply the recommendations even when they fall outside of minAllowed and maxAllowed.
Ack. Then this is a problem. I think we have e2e tests for that (but right now all e2e tests are failing, see #4764).
I suspect this may be the same issue as #4733.
Please check if your webhook is using the v1 API. You can do this by running:
kubectl get MutatingWebhookConfiguration vpa-webhook-config -oyaml
If the webhook is using the v1beta2 API then you should switch to using v1. When the Admission Controller starts, by default it deletes the existing webhook and creates a new one (properly configured). So if you have a v1beta2 webhook, I think there are two possibilities:
- You're running VPA with registerWebhook flag set to false
- Something is changing the webhook after the VPA Admission Controller updates it.
Please let me know if this helps.
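For orientation, the part of that output to look at is the rule covering VPA objects; it should look roughly like this (an illustrative sketch, the exact webhook name and other rules may differ between versions):

webhooks:
- name: vpa.k8s.io
  rules:
  - apiGroups:   ["autoscaling.k8s.io"]
    apiVersions: ["v1"]          # should be v1, not v1beta2
    operations:  ["CREATE", "UPDATE"]
    resources:   ["verticalpodautoscalers"]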
You're running VPA with registerWebhook flag set to false
Yes, we are definitely doing that, since we need better control over the webhook registration. The point is that it is currently not obvious from the release notes of 0.10.0 that in this case we must update the webhook registration to v1 as well when updating to 0.10.0, so we had to find out the hard way.
You're running VPA with registerWebhook flag set to false
Yes, we are definitely doing that, since we need better control over the webhook registration. The point is that it is currently not obvious from the release notes of 0.10.0 that in this case we must update the webhook registration to v1 as well when updating to 0.10.0, so we had to find out the hard way.
I'll improve description of the release.
Did updating the webhook solve this issue?
Did updating the webhook solve this issue?
Sorry, I wanted to update #4733, not this issue. I will test for this particular one as well.
I am not able to reproduce this issue in my test setup after #4733 is fixed.
Please check if your webhook is using the v1 API. You can do this by running:
kubectl get MutatingWebhookConfiguration vpa-webhook-config -oyaml
I commented here with the output of the command, and the webhook is using v1. This situation and #4759 still persist even after removing the webhook and recreating it by deploying VPA again.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Hi!
I have some updates. I think that the issue is with the value of the containerName parameter.
I used Terraform to deploy VPA (and pass the values through it). When I execute the command kubectl get vpa couchdb-couchdb -oyaml, I noted that the value of containerName is '''*''', which is not correct (my error in the Terraform values).
So, this value in containerName causes VPA not to respect the controlledValues, maxAllowed and minAllowed configuration.
To solve the issue, I edited the VPA API object with kubectl edit vpa couchdb-couchdb and fixed the value of containerName to '*'. After that, the behaviour is as expected.
If you want to try this behaviour, edit the VPA API object, change the value of containerName to something like '''*''', and check for the issues that I mentioned.
I think that even though the error is in my Terraform values, VPA should warn that the value of containerName is wrong and not create the API object.
Same response to #4759
Another strange thing is that the outputs of kubectl get vpa couchdb-couchdb and kubectl describe vpa couchdb-couchdb show distinct values for the containerName parameter ('''*''' and '*' respectively), which can be confusing.
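If it helps, one way to see the raw stored value (independent of how get/describe render it) is jsonpath, assuming the wildcard policy is the first entry in containerPolicies:

kubectl get vpa couchdb-couchdb -o jsonpath='{.spec.resourcePolicy.containerPolicies[0].containerName}'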
/remove-lifecycle stale
I have the VerticalPodAutoscaler containerName specified verbatim, the same as the pod's container name, and my minAllowed memory is not listed when I issue: kubectl get verticalpodautoscaler caching-server-autoscaler -o yaml
Combined with this: https://github.com/giantswarm/roadmap/issues/923, it seems to be quite difficult to get enough memory.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.