terraform-aws-eks-cluster-autoscaler
bug: deprecated API errors in autoscaler logs when module is used with AWS EKS v1.28
Summary
API deprecation errors are shown in the autoscaler logs due to usage of the old beta APIs, namely v1beta1.PodDisruptionBudget and v1beta1.CSIStorageCapacity.
Issue Type
Bug Report
Terraform Version
$ terraform --version
Terraform v1.6.3
on linux_amd64
+ provider registry.terraform.io/cloudposse/utils v1.14.0
+ provider registry.terraform.io/hashicorp/aws v4.47.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.2
+ provider registry.terraform.io/hashicorp/external v2.3.1
+ provider registry.terraform.io/hashicorp/helm v2.11.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.23.0
+ provider registry.terraform.io/hashicorp/local v2.4.0
+ provider registry.terraform.io/hashicorp/null v3.2.1
+ provider registry.terraform.io/hashicorp/random v3.5.1
+ provider registry.terraform.io/hashicorp/tls v4.0.4
Steps to Reproduce
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "19.5.1"
cluster_name = var.eks_cluster_name
cluster_version = "1.28"
...
...
}
module "eks-cluster-autoscaler" {
source = "lablabs/eks-cluster-autoscaler/aws"
version = "2.1.1"
enabled = true
namespace = "kube-system"
helm_description = "TF AWS Autoscaler Module Helm (https://registry.terraform.io/modules/lablabs/eks-cluster-autoscaler/aws/latest)"
cluster_identity_oidc_issuer = module.eks.cluster_oidc_issuer_url
cluster_identity_oidc_issuer_arn = module.eks.oidc_provider_arn
cluster_name = var.eks_cluster_name
}
To test that the autoscaler works, I launched a 300-replica nginx deployment using the following YAML:
cat > nginx-example-autoscale.yml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 300
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
EOF
# DEPLOY
kubectl apply -f nginx-example-autoscale.yml
# CHECK IF SCALING UP WORKS; IT DOES
watch -n1 kubectl top node
# REMOVE DEPLOY AND WAIT A COUPLE MINUTES
kubectl delete -f nginx-example-autoscale.yml
# CHECK IF SCALING DOWN WORKS; IT DOES
watch -n1 kubectl top node
On an EKS cluster running Kubernetes version 1.28, if you tail the logs of the autoscaler pod, you will notice the errors listed in the Actual Results section of this report.
There is an open PR in the Kubernetes repository with a workaround.
Expected Results
According to the Kubernetes API deprecation documentation, the policy/v1beta1 API is no longer served as of 1.25. The policy/v1 API should be used instead, which involves an if-else block in the Helm template.
There should be no API deprecation error messages in the autoscaler logs.
Actual Results
# API deprecation errors are shown for PodDisruptionBudget and CSIStorageCapacity:
1 reflector.go:138] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: Failed to watch *v1beta1.PodDisruptionBudget: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource
1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
# More verbose logs below
I1115 15:28:21.807496 1 static_autoscaler.go:230] Starting main loop
I1115 15:28:21.808073 1 filter_out_schedulable.go:65] Filtering out schedulables
I1115 15:28:21.808088 1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
I1115 15:28:21.808093 1 filter_out_schedulable.go:170] 0 pods were kept as unschedulable based on caching
I1115 15:28:21.808096 1 filter_out_schedulable.go:171] 0 pods marked as unschedulable can be scheduled.
I1115 15:28:21.808101 1 filter_out_schedulable.go:82] No schedulable pods
I1115 15:28:21.808110 1 static_autoscaler.go:419] No unschedulable pods
I1115 15:28:21.808122 1 static_autoscaler.go:466] Calculating unneeded nodes
I1115 15:28:21.808133 1 pre_filtering_processor.go:66] Skipping ip-10-10-3-226.ec2.internal - node group min size reached
I1115 15:28:21.808148 1 scale_down.go:509] Scale-down calculation: ignoring 2 nodes unremovable in the last 5m0s
I1115 15:28:21.808176 1 static_autoscaler.go:520] Scale down status: unneededOnly=false lastScaleUpTime=2023-11-14 19:43:36.461349334 +0000 UTC m=+404.222749662 lastScaleDownDeleteTime=2023-11-14 19:50:18.631108429 +0000 UTC m=+806.392508757 lastScaleDownFailTime=2023-11-14 18:37:14.739858175 +0000 UTC m=-3577.498741491 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I1115 15:28:21.808204 1 static_autoscaler.go:533] Starting scale down
I1115 15:28:21.808238 1 scale_down.go:918] No candidates for scale down
I1115 15:28:26.825479 1 reflector.go:255] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309
W1115 15:28:26.843868 1 reflector.go:324] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource
E1115 15:28:26.843889 1 reflector.go:138] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: Failed to watch *v1beta1.PodDisruptionBudget: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource
I1115 15:28:31.822565 1 static_autoscaler.go:230] Starting main loop
Hello @dudeitssm, did you try using an underlying Helm chart version compatible with EKS 1.28? You can supply it using the variable helm_chart_version, setting it to 9.34.1. Please let me know if this works for you.
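For reference, a trimmed-down sketch of that suggestion, based on the autoscaler module block from the reproduction above with only the helm_chart_version variable added (whether 9.34.1 is the right chart release for a given cluster should be confirmed against the cluster-autoscaler chart's compatibility notes):

module "eks-cluster-autoscaler" {
  source  = "lablabs/eks-cluster-autoscaler/aws"
  version = "2.1.1"

  enabled   = true
  namespace = "kube-system"

  # Pin the underlying cluster-autoscaler Helm chart to a release that
  # targets EKS 1.28, as suggested above, which should avoid the
  # deprecated v1beta1 API calls reported in this issue.
  helm_chart_version = "9.34.1"

  cluster_identity_oidc_issuer     = module.eks.cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = module.eks.oidc_provider_arn
  cluster_name                     = var.eks_cluster_name
}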
Hello @dudeitssm, did you manage to get it working?
@jaygridley
Sorry, I did not get the chance to try this yet.
I had forgotten about opening this report a while ago. I'll try to test it out at work this week.
Excellent @jaygridley. That worked! Thank you for the solution :hugs:
Closing issue.