terraform-aws-eks-cluster-autoscaler

bug: deprecated API errors in autoscaler logs when module is used with AWS EKS v1.28

Open · dudeitssm opened this issue 1 year ago

Summary

API deprecation errors are shown in the autoscaler logs because the autoscaler still uses the old beta APIs.

They include v1beta1.PodDisruptionBudget and v1beta1.CSIStorageCapacity.

Issue Type

Bug Report

Terraform Version

$ terraform --version
Terraform v1.6.3
on linux_amd64
+ provider registry.terraform.io/cloudposse/utils v1.14.0
+ provider registry.terraform.io/hashicorp/aws v4.47.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.2
+ provider registry.terraform.io/hashicorp/external v2.3.1
+ provider registry.terraform.io/hashicorp/helm v2.11.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.23.0
+ provider registry.terraform.io/hashicorp/local v2.4.0
+ provider registry.terraform.io/hashicorp/null v3.2.1
+ provider registry.terraform.io/hashicorp/random v3.5.1
+ provider registry.terraform.io/hashicorp/tls v4.0.4

Steps to Reproduce

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.5.1"

  cluster_name    = var.eks_cluster_name
  cluster_version = "1.28"
...
...
}

module "eks-cluster-autoscaler" {
  source  = "lablabs/eks-cluster-autoscaler/aws"
  version = "2.1.1"

  enabled                          = true
  namespace                        = "kube-system"
  helm_description                 = "TF AWS Autoscaler Module Helm (https://registry.terraform.io/modules/lablabs/eks-cluster-autoscaler/aws/latest)"
  cluster_identity_oidc_issuer     = module.eks.cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = module.eks.oidc_provider_arn
  cluster_name                     = var.eks_cluster_name
}

To test that the autoscaler works, I launched a 300-replica nginx deployment using the following YAML:

cat > nginx-example-autoscale.yml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 300
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
EOF

# DEPLOY 
kubectl apply -f nginx-example-autoscale.yml

# CHECK IF SCALING UP WORKS; IT DOES
watch -n1 kubectl top node

# REMOVE DEPLOY AND WAIT A COUPLE MINUTES
kubectl delete -f nginx-example-autoscale.yml

# CHECK IF SCALING DOWN WORKS; IT DOES
watch -n1 kubectl top node

On an EKS cluster running Kubernetes 1.28, tailing the logs of the autoscaler pod shows the errors listed in the Actual Results section of this report.
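Something along these lines will surface them (a sketch; the deployment name depends on the Helm release name, so check what kubectl -n kube-system get deploy reports):

# FIND THE AUTOSCALER DEPLOYMENT NAME (varies with the Helm release)
kubectl -n kube-system get deploy | grep -i autoscaler

# TAIL THE LOGS AND FILTER FOR THE v1beta1 ERRORS
kubectl -n kube-system logs deploy/<autoscaler-deployment> --tail=200 | grep v1beta1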

There is an open PR on the Kubernetes repo with a workaround.

Expected Results

According to Kubernetes, the policy/v1beta1 API was deprecated and has been removed since 1.25.

Instead, the policy/v1 API should be used, which the Helm chart handles with an if-else block in its template, as sketched below.
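A rough sketch of such a guard in a chart template (illustrative only, not the chart's exact code; the metadata and selector values are placeholders):

{{- /* Prefer policy/v1 when the API server advertises it, fall back to the beta API */ -}}
{{- if .Capabilities.APIVersions.Has "policy/v1/PodDisruptionBudget" }}
apiVersion: policy/v1
{{- else }}
apiVersion: policy/v1beta1
{{- end }}
kind: PodDisruptionBudget
metadata:
  name: cluster-autoscaler
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: cluster-autoscaler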

There should be no API deprecation error messages.

Actual Results

# API deprecation errors are shown for PodDisruptionBudget and CSIStorageCapacity:

1 reflector.go:138] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: Failed to watch *v1beta1.PodDisruptionBudget: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource

1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource

# More verbose logs below

I1115 15:28:21.807496       1 static_autoscaler.go:230] Starting main loop
I1115 15:28:21.808073       1 filter_out_schedulable.go:65] Filtering out schedulables
I1115 15:28:21.808088       1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
I1115 15:28:21.808093       1 filter_out_schedulable.go:170] 0 pods were kept as unschedulable based on caching
I1115 15:28:21.808096       1 filter_out_schedulable.go:171] 0 pods marked as unschedulable can be scheduled.
I1115 15:28:21.808101       1 filter_out_schedulable.go:82] No schedulable pods
I1115 15:28:21.808110       1 static_autoscaler.go:419] No unschedulable pods
I1115 15:28:21.808122       1 static_autoscaler.go:466] Calculating unneeded nodes
I1115 15:28:21.808133       1 pre_filtering_processor.go:66] Skipping ip-10-10-3-226.ec2.internal - node group min size reached
I1115 15:28:21.808148       1 scale_down.go:509] Scale-down calculation: ignoring 2 nodes unremovable in the last 5m0s
I1115 15:28:21.808176       1 static_autoscaler.go:520] Scale down status: unneededOnly=false lastScaleUpTime=2023-11-14 19:43:36.461349334 +0000 UTC m=+404.222749662 lastScaleDownDeleteTime=2023-11-14 19:50:18.631108429 +0000 UTC m=+806.392508757 lastScaleDownFailTime=2023-11-14 18:37:14.739858175 +0000 UTC m=-3577.498741491 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I1115 15:28:21.808204       1 static_autoscaler.go:533] Starting scale down
I1115 15:28:21.808238       1 scale_down.go:918] No candidates for scale down
I1115 15:28:26.825479       1 reflector.go:255] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309
W1115 15:28:26.843868       1 reflector.go:324] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource
E1115 15:28:26.843889       1 reflector.go:138] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: Failed to watch *v1beta1.PodDisruptionBudget: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource
I1115 15:28:31.822565       1 static_autoscaler.go:230] Starting main loop

dudeitssm · Nov 15 '23

Hello @dudeitssm, did you try using an underlying Helm chart version compatible with EKS 1.28? You can supply it through the helm_chart_version variable by setting it to 9.34.1. Please let me know if this works for you.

jaygridley · Mar 05 '24
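For reference, applying that suggestion to the module block from the report would look roughly like this (only the helm_chart_version line is new; the other arguments are unchanged from the report):

module "eks-cluster-autoscaler" {
  source  = "lablabs/eks-cluster-autoscaler/aws"
  version = "2.1.1"

  enabled                          = true
  namespace                        = "kube-system"
  cluster_identity_oidc_issuer     = module.eks.cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = module.eks.oidc_provider_arn
  cluster_name                     = var.eks_cluster_name

  # Pin the underlying cluster-autoscaler Helm chart to a release that
  # targets Kubernetes 1.28 APIs (policy/v1 instead of policy/v1beta1)
  helm_chart_version               = "9.34.1"
}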

Hello @dudeitssm, did you manage to get it working?

jaygridley · May 27 '24

@jaygridley

Sorry, I did not get the chance to try this yet.

I had forgotten about opening this report a while ago. I'll try to test it out at work this week.

dudeitssm · May 27 '24

Excellent @jaygridley. That worked! Thank you for the solution :hugs:

Closing issue.

dudeitssm · May 28 '24