autoscaler: Discovering node's taints from ASG tags in Alicloud / Merging changes from an internal fork


teqwve opened this issue • 9 comments

Which component are you using?:

A cluster-autoscaler running on Alicloud

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Inheriting a node's taints from the tags of its ASG. At the moment, ASG tags can only be used to tell cluster-autoscaler which labels a node will have (you can see it here); there is no way to hint which taints the node will have. It would be nice to have the same options as with the AWS provider.
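Why the taint hints matter: when cluster-autoscaler decides whether scaling up a node group would help a pending pod, it simulates scheduling that pod onto a template node for the group (this is the only option when the group currently has zero nodes). If the template carries labels but no taints, the simulation can conclude that a pod fits a group whose real nodes will reject it. The following is a minimal, dependency-free Go sketch of the toleration check involved. The `Taint`/`Toleration` structs are simplified stand-ins for the `k8s.io/api/core/v1` types, `podFitsTemplate` is a hypothetical helper, and the `nvidia.com/gpu` taint is only an illustrative example; this is not cluster-autoscaler's actual code.

```go
package main

import "fmt"

// Taint and Toleration are simplified, hypothetical stand-ins for the
// core/v1 types, so this sketch has no external dependencies.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string // Operator: "Exists", "Equal" or "" (= Equal)
}

// tolerates mirrors the Kubernetes matching rules for a single toleration:
// an empty effect matches any effect, an empty key matches any key,
// "Exists" ignores the value and "Equal" requires the values to match.
func tolerates(tol Toleration, t Taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "", "Equal":
		return tol.Value == t.Value
	case "Exists":
		return true
	}
	return false
}

// podFitsTemplate (hypothetical helper) reports whether a pod with the given
// tolerations could land on a node built from a template with the given
// taints. Only NoSchedule and NoExecute taints block scheduling;
// PreferNoSchedule is a soft preference and is skipped.
func podFitsTemplate(tols []Toleration, taints []Taint) bool {
	for _, t := range taints {
		if t.Effect == "PreferNoSchedule" {
			continue
		}
		tolerated := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	gpuTaints := []Taint{{Key: "nvidia.com/gpu", Value: "true", Effect: "NoSchedule"}}
	var plainPod []Toleration
	gpuPod := []Toleration{{Key: "nvidia.com/gpu", Operator: "Exists", Effect: "NoSchedule"}}

	// Without taint hints the template looks untainted, so any pod "fits".
	fmt.Println(podFitsTemplate(plainPod, nil)) // true
	// With taint hints, only pods that tolerate the taint fit.
	fmt.Println(podFitsTemplate(plainPod, gpuTaints)) // false
	fmt.Println(podFitsTemplate(gpuPod, gpuTaints))   // true
}
```

In other words, labels alone let the autoscaler pick the right group for pods with node selectors, but only taints stop it from scaling up a dedicated group for pods that cannot tolerate it.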

Describe the solution you'd like.:

The change is simple and I'd be very happy to provide the PR myself, but:

Alibaba has an internal fork of cluster-autoscaler that already implements the feature I'd like to have (see the additional context below)
the last time such a PR was submitted, it was rejected, and a codeowner of the Alicloud provider synchronized changes from their internal fork instead (PRs: https://github.com/kubernetes/autoscaler/pull/1719 and https://github.com/kubernetes/autoscaler/pull/1723)

So instead I'd like to ask the Alicloud provider codeowner (@ringtail, according to the OWNERS file) whether it's possible to synchronize changes from the internal fork to this repository. If not, could I provide a PR that adds support for taints, in the hope that it will be accepted?

Describe any alternative solutions you've considered.:

One solution is to use the internal version of cluster-autoscaler deployed by Alibaba Cloud, but it's quite problematic:

I can't use the same version of cluster-autoscaler as on other clouds
this internal fork seems to be based on an old version of cluster-autoscaler (the internal tag is v1.3.1-7369cf1). For example, scaling events aren't visible in pending Pods' descriptions (there are errors like E1229 11:03:27.374533 1 event.go:259] Could not construct reference to: '&v1.Pod(...)' due to: 'selfLink was empty, can't make reference'. Will not report event). I can also imagine that there are bug fixes in this repository that haven't been backported to the internal fork

Additional context.:

The internal fork is deployed by the alicloud_cs_kubernetes_autoscaler Terraform resource in the Alicloud Terraform provider maintained by Alibaba Cloud. Its source code (available here) makes it clear that the internal fork supports taints, and that this support hasn't been merged into this repository.

I think that these constants (here) are self-descriptive:

const (
	(...)
	defaultAutoscalerImage     = "registry-vpc.%s.aliyuncs.com/acs/autoscaler:v1.3.1-7369cf1"
	LabelPattern               = "k8s.io/cluster-autoscaler/node-template/label/"
	TaintPattern               = "k8s.io/cluster-autoscaler/node-template/taint/"
)

I've configured cluster-autoscaler like this and I can confirm it's working as one would expect :)
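For reference, here is a minimal Go sketch of how ASG tags using the two prefixes above could be turned into node-template labels and taints. The prefixes are copied from the constants quoted above; everything else is an assumption: `templateFromTags` and the local `Taint` type are hypothetical, the plain `map[string]string` stands in for however the provider fetches scaling-group tags, and the `<value>:<effect>` tag-value format is borrowed from the AWS provider's convention (to my understanding, e.g. `true:NoSchedule`), not confirmed for the internal Alicloud fork.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tag prefixes, copied from the Terraform provider constants quoted above.
const (
	LabelPattern = "k8s.io/cluster-autoscaler/node-template/label/"
	TaintPattern = "k8s.io/cluster-autoscaler/node-template/taint/"
)

// Taint is a simplified, hypothetical stand-in for the core/v1 Taint type.
type Taint struct {
	Key, Value, Effect string
}

var validEffects = map[string]bool{"NoSchedule": true, "PreferNoSchedule": true, "NoExecute": true}

// templateFromTags (hypothetical) extracts node-template labels and taints
// from ASG tags. ASSUMPTION: taint tag values use the AWS-style
// "<value>:<effect>" format; tags without either prefix are ignored.
func templateFromTags(tags map[string]string) (map[string]string, []Taint, error) {
	labels := map[string]string{}
	var taints []Taint
	for k, v := range tags {
		switch {
		case strings.HasPrefix(k, LabelPattern):
			labels[strings.TrimPrefix(k, LabelPattern)] = v
		case strings.HasPrefix(k, TaintPattern):
			i := strings.LastIndex(v, ":")
			if i < 0 || !validEffects[v[i+1:]] {
				return nil, nil, fmt.Errorf("tag %q: value %q is not <value>:<effect>", k, v)
			}
			taints = append(taints, Taint{Key: strings.TrimPrefix(k, TaintPattern), Value: v[:i], Effect: v[i+1:]})
		}
	}
	// Map iteration order is random; sort so the result is deterministic.
	sort.Slice(taints, func(a, b int) bool { return taints[a].Key < taints[b].Key })
	return labels, taints, nil
}

func main() {
	tags := map[string]string{
		LabelPattern + "workload":  "gpu",
		TaintPattern + "dedicated": "gpu:NoSchedule",
		"Name":                     "worker", // unrelated tag, ignored
	}
	labels, taints, err := templateFromTags(tags)
	if err != nil {
		panic(err)
	}
	fmt.Println(labels) // map[workload:gpu]
	fmt.Println(taints) // [{dedicated gpu NoSchedule}]
}
```

Rejecting malformed taint tags outright, rather than silently skipping them, is a design choice of this sketch: a silently dropped taint would bring back exactly the over-eager scale-up problem the tags are meant to prevent.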

Dec 31 '21 10:12 teqwve

To be precise, I'd like to add that there is also a 'managed' version of cluster-autoscaler (documented here) that deploys a different tag than the Terraform provider does (last time I checked, the v1.3.1 part was the same but the hash was different). I ended up using this tag with the same ASG tag patterns as in the Terraform provider. I can't remember the exact reasons now (the tag from Terraform was incompatible with the newer Kubernetes API, or something like that).

Dec 31 '21 10:12 teqwve

@teqwve Sure. I'll check it.

Jan 03 '22 07:01 ringtail

CC @IrisIris

Feb 11 '22 08:02 ringtail

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

May 12 '22 09:05 k8s-triage-robot

Hi, @ringtail, is there any chance of doing this?

If you don't have time, I'd be very happy to provide a patch adding taint support myself (although in that case it would differ from the internal fork).

May 27 '22 16:05 teqwve

/remove-lifecycle stale

May 27 '22 16:05 teqwve

@IrisIris pls follow it up.

May 31 '22 06:05 ringtail

Hi! I see you might not have sufficient resources, I'll try to provide a PR for changing this myself. Is that ok?

Aug 04 '22 09:08 teqwve

sure, thanks man.

Aug 05 '22 02:08 ringtail

/lifecycle stale

Nov 03 '22 03:11 k8s-triage-robot

/lifecycle rotten

Dec 03 '22 03:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Jan 02 '23 04:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Jan 02 '23 04:01 k8s-ci-robot
