
Discovering node's taints from ASG tags in Alicloud / Merging changes from an internal fork

Open teqwve opened this issue 2 years ago • 9 comments

Which component are you using?:

A cluster-autoscaler running on Alicloud

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Inheriting a node's taints from the tags of an ASG. At the moment it's not possible to hint cluster-autoscaler about the taints a node will have using the tags of an ASG; it's only possible to hint about the node's labels (you can see it here). It would be nice to have the same options as with the AWS provider.

Describe the solution you'd like.:

The change is simple and I'd be very happy to provide the PR myself, but:

  • Alibaba has an internal fork of cluster-autoscaler that has already implemented the feature I'd like to have (see the notes under Additional context)

  • the last time such a PR was provided, it was rejected and a codeowner of the Alicloud provider synchronized changes from their internal fork instead (PRs: https://github.com/kubernetes/autoscaler/pull/1719 and https://github.com/kubernetes/autoscaler/pull/1723)

So instead I'd like to ask the Alicloud provider codeowner (@ringtail, from what I've found in the OWNERS file) whether it's possible to synchronize changes from the internal fork to this repository. If not, could I provide a PR that adds support for taints, in the hope that it will be accepted?

Describe any alternative solutions you've considered.:

One solution is to use an internal version of autoscaler deployed by Alibaba cloud, but it's quite problematic:

  • I can't use the same version of cluster-autoscaler as on other clouds
  • it seems that this internal fork is based on an old version of cluster-autoscaler (the internal tag is v1.3.1-7369cf1). For example, scaling events aren't visible in pending Pods' descriptions; there are errors like E1229 11:03:27.374533 1 event.go:259] Could not construct reference to: '&v1.Pod(...)' due to: 'selfLink was empty, can't make reference'. Will not report event. I can also imagine that there are bug fixes in this repository that haven't been backported to the internal fork

Additional context.:

The internal fork is deployed by the alicloud_cs_kubernetes_autoscaler terraform resource in the Alicloud terraform provider, which is maintained by Alibaba Cloud. If one looks at its source code (available here), it becomes clear that the internal fork has support for taints which hasn't been merged into this repository.

I think that these constants (here) are self-descriptive:

const (
	(...)
	defaultAutoscalerImage     = "registry-vpc.%s.aliyuncs.com/acs/autoscaler:v1.3.1-7369cf1"
	LabelPattern               = "k8s.io/cluster-autoscaler/node-template/label/"
	TaintPattern               = "k8s.io/cluster-autoscaler/node-template/taint/"
)

I've configured cluster-autoscaler like this and I can confirm it's working as one would expect :)
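To make the request concrete, here is a minimal sketch of how taints could be read from ASG tags using the two prefixes above. It is not the code of the internal fork or of this repository. The function names, the local Taint struct (a stand-in for the Kubernetes v1.Taint type) and the sample tags are all hypothetical, and the "<value>:<effect>" tag value format is assumed to follow the AWS provider's convention.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tag key prefixes, copied from the constants quoted above.
const (
	labelPattern = "k8s.io/cluster-autoscaler/node-template/label/"
	taintPattern = "k8s.io/cluster-autoscaler/node-template/taint/"
)

// Taint is a hypothetical stand-in for the Kubernetes v1.Taint type,
// declared locally so the sketch compiles without external modules.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// extractLabelsFromTags returns the node labels hinted by ASG tags.
func extractLabelsFromTags(tags map[string]string) map[string]string {
	labels := make(map[string]string)
	for k, v := range tags {
		if !strings.HasPrefix(k, labelPattern) {
			continue
		}
		if name := strings.TrimPrefix(k, labelPattern); name != "" {
			labels[name] = v
		}
	}
	return labels
}

// extractTaintsFromTags returns the node taints hinted by ASG tags.
// ASSUMPTION: tag values use the "<value>:<effect>" form, as in the
// AWS provider. Tags whose value does not match are skipped.
func extractTaintsFromTags(tags map[string]string) []Taint {
	var taints []Taint
	for k, v := range tags {
		if !strings.HasPrefix(k, taintPattern) {
			continue
		}
		name := strings.TrimPrefix(k, taintPattern)
		parts := strings.SplitN(v, ":", 2)
		if name == "" || len(parts) != 2 || parts[1] == "" {
			continue
		}
		taints = append(taints, Taint{Key: name, Value: parts[0], Effect: parts[1]})
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(taints, func(i, j int) bool { return taints[i].Key < taints[j].Key })
	return taints
}

func main() {
	// Hypothetical ASG tags, for illustration only.
	tags := map[string]string{
		labelPattern + "workload":  "batch",
		taintPattern + "dedicated": "batch:NoSchedule",
		"unrelated-tag":            "ignored",
	}
	fmt.Println(extractLabelsFromTags(tags))
	fmt.Println(extractTaintsFromTags(tags))
}
```

With tags like these, a node template built for the ASG would carry the label workload=batch and a dedicated=batch taint with the NoSchedule effect, so the autoscaler could tell in advance which pending Pods tolerate the new node.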

teqwve avatar Dec 31 '21 10:12 teqwve

To be precise, I'd like to add that there is also a 'managed' version of cluster-autoscaler (documented here) that deploys a different tag than the terraform provider does (last time I checked, the v1.3.1 part was the same but the hash was different). I ended up using this tag with the same ASG tag patterns as in the terraform provider. I can't remember the exact reasons now (the tag from terraform was incompatible with a newer Kubernetes API, or something like that).

teqwve avatar Dec 31 '21 10:12 teqwve

@teqwve Sure. I'll check it.

ringtail avatar Jan 03 '22 07:01 ringtail

CC @IrisIris

ringtail avatar Feb 11 '22 08:02 ringtail

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 12 '22 09:05 k8s-triage-robot

Hi @ringtail, is there any chance of this being done?

If you don't have time, I will be very happy to provide a patch with taints support myself (but it would differ from the internal fork in that case).

teqwve avatar May 27 '22 16:05 teqwve

/remove-lifecycle stale

teqwve avatar May 27 '22 16:05 teqwve

@IrisIris pls follow it up.

ringtail avatar May 31 '22 06:05 ringtail

Hi! I see you might not have sufficient resources, I'll try to provide a PR for changing this myself. Is that ok?

teqwve avatar Aug 04 '22 09:08 teqwve

> Hi! I see you might not have sufficient resources, I'll try to provide a PR for changing this myself. Is that ok?

sure, thanks man.

ringtail avatar Aug 05 '22 02:08 ringtail

/lifecycle stale

k8s-triage-robot avatar Nov 03 '22 03:11 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Dec 03 '22 03:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jan 02 '23 04:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

k8s-ci-robot avatar Jan 02 '23 04:01 k8s-ci-robot