Extended resources provided via ASG tags are not working
Which component are you using?: autoscaler
What version of the component are you using?: 1.25.0-alpha.0 AND 1.23.1
What k8s version are you using (kubectl version)?:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-04-13T19:57:43Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.9", GitCommit:"c1de2d70269039fe55efb98e737d9a29f9155246", GitTreeState:"clean", BuildDate:"2022-07-13T14:19:57Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}
What environment is this in?: sandbox (AWS)
What did you expect to happen?:
I tried to use an extended resource defined as a tag on the ASG. According to the AWS documentation, the tag should be k8s.io/cluster-autoscaler/node-template/resources/<resource-name>:
https://github.com/kubernetes/autoscaler/blob/cluster-autoscaler-1.23.1/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup . This should work at least in node-group-auto-discovery mode. Is anybody successfully using it?
What happened instead?: The tag is never read, or it is ignored. CA still complains about the insufficient resource and does not scale up.
How to reproduce it (as minimally and precisely as possible):
Add an extended resource as described in https://kubernetes.io/docs/tasks/configure-pod-container/extended-resource/
Add a k8s.io/cluster-autoscaler/node-template/resources/<resource-name>
tag with the same, reasonable value to the ASG (a sketch of this step follows below)
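For reference, a minimal sketch of the tagging step using aws-sdk-go; the ASG name, resource name, and value below are illustrative placeholders, not taken from this report:

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	// Placeholder names; substitute your own ASG and extended resource.
	asgName := "my-node-group-asg"
	resourceName := "example.com/custom-resource"

	sess := session.Must(session.NewSession())
	svc := autoscaling.New(sess)

	// Tag the ASG so CA can advertise the extended resource on the node
	// template it builds for this group.
	_, err := svc.CreateOrUpdateTags(&autoscaling.CreateOrUpdateTagsInput{
		Tags: []*autoscaling.Tag{
			{
				ResourceId:        aws.String(asgName),
				ResourceType:      aws.String("auto-scaling-group"),
				Key:               aws.String("k8s.io/cluster-autoscaler/node-template/resources/" + resourceName),
				Value:             aws.String("2"),
				PropagateAtLaunch: aws.Bool(false),
			},
		},
	})
	if err != nil {
		log.Fatalf("failed to tag ASG: %v", err)
	}
	fmt.Println("tag applied to", asgName)
}
```

The tag value should cover what the pod requests, since it is what CA would advertise on the template node it builds for the group.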
Anything else we need to know?: I did some tests; it looks like this is never executed: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L412
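For context, a simplified illustration of what that referenced code is expected to do: map ASG tags under the k8s.io/cluster-autoscaler/node-template/resources/ prefix into extended resources on the template node. This is not the actual autoscaler code, and the function name below is made up:

```go
package main

import (
	"fmt"
	"strings"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

const resourcesTagPrefix = "k8s.io/cluster-autoscaler/node-template/resources/"

// extendedResourcesFromTags turns ASG tags into a ResourceList that the
// template node's Capacity/Allocatable would be extended with.
func extendedResourcesFromTags(tags map[string]string) apiv1.ResourceList {
	out := apiv1.ResourceList{}
	for key, val := range tags {
		if !strings.HasPrefix(key, resourcesTagPrefix) {
			continue
		}
		name := strings.TrimPrefix(key, resourcesTagPrefix)
		if qty, err := resource.ParseQuantity(val); err == nil {
			out[apiv1.ResourceName(name)] = qty
		}
	}
	return out
}

func main() {
	tags := map[string]string{
		"k8s.io/cluster-autoscaler/node-template/resources/example.com/custom-resource": "2",
		"unrelated-tag": "ignored",
	}
	for name, qty := range extendedResourcesFromTags(tags) {
		fmt.Printf("%s = %s\n", name, qty.String())
	}
}
```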
We're seeing the same result on our end, but for labels: k8s.io/cluster-autoscaler/node-template/labels/<labels-name>.
This appeared during an upgrade from 1.22 to 1.23 on EKS; Cluster Autoscaler 1.23 shows the same results, but on 1.22 we didn't have this problem.
The only thing I see that changed here may be PR #4238, which plays around with labels. I never checked this code in depth before, but extractAutoscalingOptionsFromTags
takes a different approach from the usual one.
Maybe that breaks something? Just posting in case it helps whoever picks this up.
@ZimmSebas that sounds similar, but it's a somewhat different bug; yours could be related to https://github.com/kubernetes/autoscaler/pull/4238
The bug I'm describing is more complex.
If you put, e.g., custom-resouce: 2
into the pod's requests/limits, scale-up ends up here:
https://github.com/kubernetes/autoscaler/blob/c38cc7460426b80ad60e63b7647f2c973a4e3878/cluster-autoscaler/core/scale_up.go#L463
because inside computeExpansionOption()
the predicates fail:
https://github.com/kubernetes/autoscaler/blob/c38cc7460426b80ad60e63b7647f2c973a4e3878/cluster-autoscaler/core/scale_up.go#L446
with predicate checking error: Insufficient custom-resouce;
so it never reaches https://github.com/kubernetes/autoscaler/blob/c38cc7460426b80ad60e63b7647f2c973a4e3878/cluster-autoscaler/core/scale_up.go#L509
the function which in the end tries to extract the k8s.io/cluster-autoscaler/node-template/resources
tag.
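To make the ordering concrete, here is a heavily simplified sketch of the fit check described above (not the actual scheduler-framework predicate code). The point is that the check runs against the template node's Allocatable, so if the tag-derived resource was never merged in, the predicate fails before the tag is ever read:

```go
package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// fits reports which requested resources the template node cannot satisfy,
// mimicking the "Insufficient <resource>" predicate error.
func fits(requests, allocatable apiv1.ResourceList) []string {
	var insufficient []string
	for name, req := range requests {
		alloc, ok := allocatable[name]
		if !ok || alloc.Cmp(req) < 0 {
			insufficient = append(insufficient, "Insufficient "+string(name))
		}
	}
	return insufficient
}

func main() {
	// Resource name mirrors the one used in the report above.
	podRequests := apiv1.ResourceList{
		"custom-resouce": resource.MustParse("2"),
	}
	// Template node built without applying the node-template/resources tag.
	templateAllocatable := apiv1.ResourceList{
		apiv1.ResourceCPU:    resource.MustParse("4"),
		apiv1.ResourceMemory: resource.MustParse("16Gi"),
	}
	fmt.Println(fits(podRequests, templateAllocatable)) // [Insufficient custom-resouce]
}
```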
I spent a little while trying to track this down and couldn't figure out how to repro. I know we are using k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage
on our clusters and it is working as expected. We're also on a slightly older version of cluster autoscaler. Is this a regression in behaviour, or has this always been broken? I'm not sure.
@drmorr0 which version and which cloudprovider?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale