terraform-aws-eks-cluster-autoscaler

bug: deprecated API errors in autoscaler logs when module is used with AWS EKS v1.28

Open · dudeitssm opened this issue 1 year ago

Summary

API deprecation errors are shown in the autoscaler logs because the autoscaler still uses the old beta APIs.

They include v1beta1.PodDisruptionBudget and v1beta1.CSIStorageCapacity.

Issue Type

Bug Report

Terraform Version

$ terraform --version
Terraform v1.6.3
on linux_amd64
+ provider registry.terraform.io/cloudposse/utils v1.14.0
+ provider registry.terraform.io/hashicorp/aws v4.47.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.2
+ provider registry.terraform.io/hashicorp/external v2.3.1
+ provider registry.terraform.io/hashicorp/helm v2.11.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.23.0
+ provider registry.terraform.io/hashicorp/local v2.4.0
+ provider registry.terraform.io/hashicorp/null v3.2.1
+ provider registry.terraform.io/hashicorp/random v3.5.1
+ provider registry.terraform.io/hashicorp/tls v4.0.4

Steps to Reproduce

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.5.1"

  cluster_name    = var.eks_cluster_name
  cluster_version = "1.28"
...
...
}

module "eks-cluster-autoscaler" {
  source  = "lablabs/eks-cluster-autoscaler/aws"
  version = "2.1.1"

  enabled                          = true
  namespace                        = "kube-system"
  helm_description                 = "TF AWS Autoscaler Module Helm (https://registry.terraform.io/modules/lablabs/eks-cluster-autoscaler/aws/latest)"
  cluster_identity_oidc_issuer     = module.eks.cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = module.eks.oidc_provider_arn
  cluster_name                     = var.eks_cluster_name
}

To test that the autoscaler works, I launched a 300-replica nginx deployment using the following YAML:

cat > nginx-example-autoscale.yml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 300
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
EOF

# DEPLOY 
kubectl apply -f nginx-example-autoscale.yml

# CHECK IF SCALING UP WORKS; IT DOES
watch -n1 kubectl top node

# REMOVE DEPLOY AND WAIT A COUPLE MINUTES
kubectl delete -f nginx-example-autoscale.yml

# CHECK IF SCALING DOWN WORKS; IT DOES
watch -n1 kubectl top node

On an EKS cluster running Kubernetes 1.28, tailing the logs of the autoscaler pod shows the errors listed in the Actual Results section of this report.
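Something along these lines will surface them (a sketch; the deployment name depends on the Helm release name, so check what kubectl -n kube-system get deploy reports):

# FIND THE AUTOSCALER DEPLOYMENT NAME (varies with the Helm release)
kubectl -n kube-system get deploy | grep -i autoscaler

# TAIL THE LOGS AND FILTER FOR THE v1beta1 ERRORS
kubectl -n kube-system logs deploy/<autoscaler-deployment> --tail=200 | grep v1beta1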

There is an open PR on the Kubernetes repo with a workaround.

Expected Results

According to Kubernetes, the policy/v1beta1 API was deprecated and has been removed since 1.25.

Instead, the policy/v1 API should be used, which the Helm chart handles with an if-else block in its template, as sketched below.
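A rough sketch of such a guard in a chart template (illustrative only, not the chart's exact code; the metadata and selector values are placeholders):

{{- /* Prefer policy/v1 when the API server advertises it, fall back to the beta API */ -}}
{{- if .Capabilities.APIVersions.Has "policy/v1/PodDisruptionBudget" }}
apiVersion: policy/v1
{{- else }}
apiVersion: policy/v1beta1
{{- end }}
kind: PodDisruptionBudget
metadata:
  name: cluster-autoscaler
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: cluster-autoscaler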

There should be no API deprecation error messages.

Actual Results

# API deprecation errors are shown for PodDisruptionBudget and CSIStorageCapacity:

1 reflector.go:138] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: Failed to watch *v1beta1.PodDisruptionBudget: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource

1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource

# More verbose logs below

I1115 15:28:21.807496       1 static_autoscaler.go:230] Starting main loop
I1115 15:28:21.808073       1 filter_out_schedulable.go:65] Filtering out schedulables
I1115 15:28:21.808088       1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
I1115 15:28:21.808093       1 filter_out_schedulable.go:170] 0 pods were kept as unschedulable based on caching
I1115 15:28:21.808096       1 filter_out_schedulable.go:171] 0 pods marked as unschedulable can be scheduled.
I1115 15:28:21.808101       1 filter_out_schedulable.go:82] No schedulable pods
I1115 15:28:21.808110       1 static_autoscaler.go:419] No unschedulable pods
I1115 15:28:21.808122       1 static_autoscaler.go:466] Calculating unneeded nodes
I1115 15:28:21.808133       1 pre_filtering_processor.go:66] Skipping ip-10-10-3-226.ec2.internal - node group min size reached
I1115 15:28:21.808148       1 scale_down.go:509] Scale-down calculation: ignoring 2 nodes unremovable in the last 5m0s
I1115 15:28:21.808176       1 static_autoscaler.go:520] Scale down status: unneededOnly=false lastScaleUpTime=2023-11-14 19:43:36.461349334 +0000 UTC m=+404.222749662 lastScaleDownDeleteTime=2023-11-14 19:50:18.631108429 +0000 UTC m=+806.392508757 lastScaleDownFailTime=2023-11-14 18:37:14.739858175 +0000 UTC m=-3577.498741491 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I1115 15:28:21.808204       1 static_autoscaler.go:533] Starting scale down
I1115 15:28:21.808238       1 scale_down.go:918] No candidates for scale down
I1115 15:28:26.825479       1 reflector.go:255] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309
W1115 15:28:26.843868       1 reflector.go:324] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource
E1115 15:28:26.843889       1 reflector.go:138] k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: Failed to watch *v1beta1.PodDisruptionBudget: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource
I1115 15:28:31.822565       1 static_autoscaler.go:230] Starting main loop

dudeitssm · Nov 15 '23

Hello @dudeitssm, did you try using an underlying Helm chart version compatible with EKS 1.28? You can supply it through the helm_chart_version variable by setting it to 9.34.1. Please let me know if this works for you.

jaygridley · Mar 05 '24
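For reference, applying that suggestion to the module block from the report would look roughly like this (only the helm_chart_version line is new; the other arguments are unchanged from the report):

module "eks-cluster-autoscaler" {
  source  = "lablabs/eks-cluster-autoscaler/aws"
  version = "2.1.1"

  enabled                          = true
  namespace                        = "kube-system"
  cluster_identity_oidc_issuer     = module.eks.cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = module.eks.oidc_provider_arn
  cluster_name                     = var.eks_cluster_name

  # Pin the underlying cluster-autoscaler Helm chart to a release that
  # targets Kubernetes 1.28 APIs (policy/v1 instead of policy/v1beta1)
  helm_chart_version               = "9.34.1"
}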

Hello @dudeitssm, did you manage to get it working?

jaygridley · May 27 '24

@jaygridley

Sorry, I did not get the chance to try this yet.

I had forgotten about opening this report a while ago. I'll try to test it out at work this week.

dudeitssm · May 27 '24

Excellent @jaygridley. That worked! Thank you for the solution :hugs:

Closing issue.

dudeitssm · May 28 '24