Cluster Autoscaler configured in EKS cluster 1 terminates ASGs from EKS cluster 2
Which component are you using?: Cluster Autoscaler
What version of the component are you using?: 1.31
What k8s version are you using (kubectl version)?:
1.31
What environment is this in?: AWS EKS
What did you expect to happen?: The Cluster Autoscaler in one cluster should only manage the ASGs associated with the Kubernetes cluster it is running in and discovering load for.
What happened instead?: The Cluster Autoscaler deployed to cluster 1 was able to discover ASGs belonging to other clusters under the same AWS account, and started scaling down ASGs from those other clusters.
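For context, ASG auto-discovery on AWS is purely tag-based: the autoscaler manages every ASG in its region whose tags match the --node-group-auto-discovery filter, with no further check that the group actually belongs to its own cluster. A minimal sketch of the two filter forms (var.cluster_name is an assumed variable):

locals {
  # Fed to --node-group-auto-discovery.
  # Matches every ASG in the account/region carrying the generic tag, whichever cluster it serves.
  unscoped_discovery = "asg:tag=k8s.io/cluster-autoscaler/enabled"

  # Additionally requires the per-cluster tag, so only this cluster's ASGs are discovered and managed.
  scoped_discovery = "asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/${var.cluster_name}"
}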
How to reproduce it (as minimally and precisely as possible):
- Create an IAM role for the Cluster Autoscaler that has access to the ASGs of both cluster 1 and cluster 2
- Increase the load on cluster 2 so that new Kubernetes nodes have to spin up
- Keep minimal load on cluster 1
- Deploy the Cluster Autoscaler in cluster 1 with ASG auto-discovery tags that match both clusters (see the Helm sketch after this list)
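For contrast, a sketch of a scoped deployment using the upstream cluster-autoscaler Helm chart via Terraform (variable names are assumptions): setting autoDiscovery.clusterName makes the chart render the discovery filter with the per-cluster tag, so the autoscaler in cluster 1 only sees ASGs tagged for cluster 1.

# Sketch only: deploys the upstream chart with cluster-scoped auto-discovery.
# var.cluster_name and var.region are assumed to exist in the surrounding module.
resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "kube-system"

  set {
    name  = "cloudProvider"
    value = "aws"
  }

  set {
    name  = "awsRegion"
    value = var.region
  }

  # Renders --node-group-auto-discovery with the per-cluster tag
  # k8s.io/cluster-autoscaler/<cluster_name> in addition to the generic "enabled" tag.
  set {
    name  = "autoDiscovery.clusterName"
    value = var.cluster_name
  }
}

This only helps if each cluster's ASGs actually carry the per-cluster tag (k8s.io/cluster-autoscaler/<cluster-name>); self-managed node groups need it added explicitly.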
Anything else we need to know?:
I faced the same issue but forgot to create a ticket for it. Definitely worth implementing!
@nirvanagit My temporary fix is updating the aws_cloudwatch_event_rule resource with an extra event_pattern:
resource "aws_cloudwatch_event_rule" "aws_node_termination_handler" {
for_each = { for k, v in local.aws_node_termination_handler_events : k => v if var.enable_aws_node_termination_handler }
name_prefix = "NTH-${each.value.name}-"
description = each.value.description
event_pattern = jsonencode(merge(each.value.event_pattern, {
detail = {
AutoScalingGroupName = [for i in var.aws_node_termination_handler_asg_arns : replace(i, "/^.*:autoScalingGroupName//", "")]
}
}))
tags = merge(
{ "ClusterName" : var.cluster_name },
var.tags,
)
}
But I had to clone the module to make that change.
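Another mitigation that does not require forking anything is to scope the autoscaler's IAM permissions, along the lines of the minimal policy described in the Cluster Autoscaler AWS documentation: the read-only Describe calls stay broad, but the scaling actions are conditioned on the per-cluster ASG tag. A sketch, assuming the ASGs are tagged k8s.io/cluster-autoscaler/<cluster-name> = owned and that var.cluster_name exists:

# Sketch of a tag-scoped IAM policy (policy and variable names are assumptions).
resource "aws_iam_policy" "cluster_autoscaler_scoped" {
  name = "${var.cluster_name}-cluster-autoscaler"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "DescribeOnly"
        Effect = "Allow"
        Action = [
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeLaunchConfigurations",
          "autoscaling:DescribeScalingActivities",
          "ec2:DescribeInstanceTypes",
          "ec2:DescribeLaunchTemplateVersions"
        ]
        Resource = "*"
      },
      {
        Sid    = "WriteScopedToThisCluster"
        Effect = "Allow"
        Action = [
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup"
        ]
        Resource = "*"
        Condition = {
          StringEquals = {
            # Only ASGs tagged for this cluster can be scaled by this role.
            "aws:ResourceTag/k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned"
          }
        }
      }
    ]
  })
}

With a policy like this attached to the autoscaler's role, even an autoscaler whose discovery filter matches another cluster's ASGs is denied when it tries to change their desired capacity or terminate their instances.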
/area cluster-autoscaler
Can one of the maintainers please comment? Looking for guidance, and whether such a feature is available or planned.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.