
Cluster Autoscaler configured in EKS cluster 1 terminates ASGs from EKS cluster 2

nirvanagit opened this issue 7 months ago

Which component are you using?: Cluster Autoscaler

What version of the component are you using?: 1.31

Component version:

What k8s version are you using (kubectl version)?: 1.31

kubectl version Output
$ kubectl version

What environment is this in?: AWS EKS

What did you expect to happen?: The Cluster Autoscaler in one cluster should only manage the ASGs associated with the Kubernetes cluster whose load it is discovering.

What happened instead?: A Cluster Autoscaler deployed to cluster A discovered ASGs belonging to other clusters under the same AWS account and started deleting ASGs from those clusters.

How to reproduce it (as minimally and precisely as possible):

  1. Create an IAM role for the Cluster Autoscaler that has access to the ASGs of both cluster 1 and cluster 2
  2. Increase load on cluster 2 so that new Kubernetes nodes have to spin up
  3. Keep minimal load on cluster 1
  4. Deploy the Cluster Autoscaler in cluster 1 with ASG discovery tags that match both clusters (see the sketch after this list)
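
For context, the usual way to keep auto-discovery scoped to a single cluster is to tag each ASG with both the generic k8s.io/cluster-autoscaler/enabled key and the cluster-specific k8s.io/cluster-autoscaler/<cluster-name> key, and to require both keys in --node-group-auto-discovery. A minimal Terraform sketch, assuming a self-managed ASG aws_autoscaling_group.workers and a cluster named cluster-1 (both names are illustrative, not taken from this issue):

# Tag the ASG so that only the autoscaler of "cluster-1" discovers it.
resource "aws_autoscaling_group_tag" "cluster_autoscaler_enabled" {
  autoscaling_group_name = aws_autoscaling_group.workers.name

  tag {
    key                 = "k8s.io/cluster-autoscaler/enabled"
    value               = "true"
    propagate_at_launch = false
  }
}

resource "aws_autoscaling_group_tag" "cluster_autoscaler_cluster" {
  autoscaling_group_name = aws_autoscaling_group.workers.name

  tag {
    key                 = "k8s.io/cluster-autoscaler/cluster-1"
    value               = "owned"
    propagate_at_launch = false
  }
}

With the autoscaler then started as --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/cluster-1, an ASG that carries only the generic enabled tag (for example one belonging to cluster 2) is no longer matched by cluster 1's autoscaler.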

Anything else we need to know?:

nirvanagit avatar May 02 '25 05:05 nirvanagit

I faced the same issue, but I forgot to create a ticket for it. Totally worth implementing!

@nirvanagit My temporary fix is updating the aws_cloudwatch_event_rule resource with an extra event_pattern filter:

resource "aws_cloudwatch_event_rule" "aws_node_termination_handler" {
  for_each = { for k, v in local.aws_node_termination_handler_events : k => v if var.enable_aws_node_termination_handler }

  name_prefix = "NTH-${each.value.name}-"
  description = each.value.description
  event_pattern = jsonencode(merge(each.value.event_pattern, {
    detail = {
      AutoScalingGroupName = [for i in var.aws_node_termination_handler_asg_arns : replace(i, "/^.*:autoScalingGroupName//", "")]
    }
  }))

  tags = merge(
    { "ClusterName" : var.cluster_name },
    var.tags,
  )
}

But I had to clone the module to do it.
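
A complementary mitigation, sketched here as an assumption rather than anything proposed in this thread, is to scope the autoscaler's IAM role so that scaling actions are only allowed on ASGs tagged for its own cluster, using the autoscaling:ResourceTag condition key. The tag key, tag value, and cluster name below are illustrative:

# Hypothetical policy: allow scaling actions only on ASGs tagged for cluster-1.
data "aws_iam_policy_document" "cluster_autoscaler_scoped" {
  statement {
    sid    = "ScopedScaling"
    effect = "Allow"
    actions = [
      "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup",
    ]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "autoscaling:ResourceTag/k8s.io/cluster-autoscaler/cluster-1"
      values   = ["owned"]
    }
  }
  # Describe* calls do not support resource-level conditions and would still
  # need a separate, unconditioned statement.
}

resource "aws_iam_policy" "cluster_autoscaler_scoped" {
  name_prefix = "cluster-autoscaler-cluster-1-"
  policy      = data.aws_iam_policy_document.cluster_autoscaler_scoped.json
}

With a policy like this attached instead of a blanket autoscaling:* grant, the autoscaler in cluster 1 cannot call SetDesiredCapacity or TerminateInstanceInAutoScalingGroup against cluster 2's ASGs even if discovery happens to match them.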

VolodymyrSmahliuk avatar May 02 '25 09:05 VolodymyrSmahliuk

/area cluster-autoscaler

adrianmoisey avatar May 02 '25 11:05 adrianmoisey

Can one of the maintainers please comment? Looking for guidance, and whether such a feature might be available.

nirvanagit avatar May 28 '25 03:05 nirvanagit

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 26 '25 03:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 25 '25 04:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 25 '25 04:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Oct 25 '25 04:10 k8s-ci-robot