Discovering node's taints from ASG tags in Alicloud / Merging changes from an internal fork
Which component are you using?:
A cluster-autoscaler running on Alicloud
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Inheriting a node's taints from ASG tags. At the moment it's not possible to hint cluster-autoscaler about the taints a node will have using the tags of an ASG; it's only possible to hint about the node's labels (you can see it here). It would be nice to have the same options as with the AWS provider.
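For reference, the AWS provider reads such hints from ASG tags with a `node-template` prefix, where a taint tag's value encodes `<value>:<effect>`. A sketch of what equivalent tags could look like (the tag names and values below are illustrative, not taken from any real cluster):

```
# illustrative ASG tags following the node-template convention
k8s.io/cluster-autoscaler/node-template/label/role      = worker
k8s.io/cluster-autoscaler/node-template/taint/dedicated = gpu:NoSchedule
```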
Describe the solution you'd like.:
The change is simple and I'd be very happy to provide the PR myself, but:
- Alibaba has an internal fork of cluster-autoscaler that has already implemented the feature I'd like to have (see notes in the context section)
- the last time such a PR was provided, it was rejected and a codeowner of the Alicloud provider synchronized changes from their internal fork instead (PRs: https://github.com/kubernetes/autoscaler/pull/1719 and https://github.com/kubernetes/autoscaler/pull/1723)

So instead I'd like to ask the Alicloud provider codeowner (@ringtail, from what I've found in the OWNERS file) whether it's possible to synchronize changes from the internal fork to this repository. If not, could I provide a PR that adds support for taints and hope it will be accepted?
Describe any alternative solutions you've considered.:
One solution is to use the internal version of the autoscaler deployed by Alibaba cloud, but it's quite problematic:
- I can't use the same version of cluster-autoscaler as on other clouds
- the internal fork seems to be based on an old version of cluster-autoscaler (the internal tag is `v1.3.1-7369cf1`), and e.g. scaling events aren't visible in pending Pods' descriptions (there are errors like `E1229 11:03:27.374533 1 event.go:259] Could not construct reference to: '&v1.Pod(...)' due to: 'selfLink was empty, can't make reference'. Will not report event`); I can also imagine that there are bug fixes in this repository that haven't been backported to the internal fork
Additional context.:
The internal fork is deployed by the `alicloud_cs_kubernetes_autoscaler` terraform resource in the Alicloud terraform provider maintained by Alibaba cloud. If one looks at its source code (available here), it becomes clear that the internal fork has support for taints which hasn't been merged into this repository.
I think that these constants (here) are self-descriptive:
```go
const (
	// (...)
	defaultAutoscalerImage = "registry-vpc.%s.aliyuncs.com/acs/autoscaler:v1.3.1-7369cf1"
	LabelPattern           = "k8s.io/cluster-autoscaler/node-template/label/"
	TaintPattern           = "k8s.io/cluster-autoscaler/node-template/taint/"
)
```
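Given those constants, the tag-to-taint mapping presumably works roughly like the following sketch. This is a hypothetical reconstruction, not the fork's actual code: the `Taint` struct and `taintsFromTags` helper are my own names, and I'm assuming the tag value encodes `<value>:<effect>` as in the AWS provider.

```go
package main

import (
	"fmt"
	"strings"
)

// taintPattern is the ASG tag-key prefix that marks a node-template taint.
const taintPattern = "k8s.io/cluster-autoscaler/node-template/taint/"

// Taint mirrors the key/value/effect triple of a Kubernetes node taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// taintsFromTags extracts taints from ASG tags whose keys carry the
// node-template taint prefix; each tag value is expected as "<value>:<effect>".
func taintsFromTags(tags map[string]string) []Taint {
	var taints []Taint
	for k, v := range tags {
		if !strings.HasPrefix(k, taintPattern) {
			continue
		}
		parts := strings.SplitN(v, ":", 2)
		if len(parts) != 2 {
			continue // malformed value, skip it
		}
		taints = append(taints, Taint{
			Key:    strings.TrimPrefix(k, taintPattern),
			Value:  parts[0],
			Effect: parts[1],
		})
	}
	return taints
}

func main() {
	tags := map[string]string{
		"k8s.io/cluster-autoscaler/node-template/taint/dedicated": "gpu:NoSchedule",
		"some-unrelated-tag": "value",
	}
	fmt.Println(taintsFromTags(tags))
}
```

The unrelated tag is ignored because its key lacks the taint prefix, so only the `dedicated` taint is produced.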
I've configured cluster-autoscaler like this and I can confirm it's working as one would expect :)
To be precise, I'd like to add that there is also a 'managed' version of cluster-autoscaler (documented here) that deploys a different tag than the terraform provider does (last time I checked, the `v1.3.1` part was the same but the hash was different). I ended up using this tag with the same ASG tag patterns as in the terraform provider. I can't remember the exact reasons now (the tag from terraform was incompatible with the new kubernetes api, or something like this).
@teqwve Sure. I'll check it.
CC @IrisIris
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Hi, @ringtail, are there any chances of doing this?
If you don't have time, I will be very happy to provide a patch with taints support myself (but it would differ from internal fork in this case).
/remove-lifecycle stale
@IrisIris pls follow it up.
Hi! I see you might not have sufficient resources, I'll try to provide a PR for changing this myself. Is that ok?
> Hi! I see you might not have sufficient resources, I'll try to provide a PR for changing this myself. Is that ok?
sure, thanks man.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.