Helm install on EKS ends in CrashLoopBackOff without clear error message
Which component are you using?: cluster-autoscaler installed with helm chart
What version of the component are you using?:
- cluster-autoscaler 1.23.0
- helm chart version 9.19.3 (also tried with 9.18.1)
Containers:
  aws-cluster-autoscaler:
    Container ID:  docker://3c28b997f44070f5a01ff85a0f566f30fd7fcd23c4873bad3b5059be6be44faf
    Image:         k8s.gcr.io/autoscaling/cluster-autoscaler:v1.23.0
    Image ID:      docker-pullable://k8s.gcr.io/autoscaling/cluster-autoscaler@sha256:f46687231c2c1bfa139f2b18275b123222c8cf6a288bb9c8145932bd14ac3deb
What k8s version are you using (kubectl version)?:
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.7-eks-4721010", GitCommit:"b77d9473a02fbfa834afa67d677fd12d690b195f", GitTreeState:"clean", BuildDate:"2022-06-27T22:19:07Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
What environment is this in?: AWS EKS
What did you expect to happen?: I expected cluster-autoscaler to start successfully and discover the autoscaling group for my cluster
What happened instead?: The pod is stuck in a CrashLoopBackOff cycle, always failing on startup.
The following output is logged on each start of the pod (a goroutine dump, with no explicit error or panic message before the stream closes):
I0816 12:00:36.501922 1 reflector.go:255] Listing and watching *v1.ReplicaSet from k8s.io/client-go/informers/factory.go:134
I0816 12:00:36.456793 1 reflector.go:219] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0816 12:00:36.502308 1 reflector.go:255] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:134
I0816 12:00:36.456936 1 reflector.go:219] Starting reflector *v1beta1.CSIStorageCapacity (0s) from k8s.io/client-go/informers/factory.go:134
I0816 12:00:36.502560 1 reflector.go:255] Listing and watching *v1beta1.CSIStorageCapacity from k8s.io/client-go/informers/factory.go:134
I0816 12:00:36.457096 1 reflector.go:219] Starting reflector *v1.StorageClass (0s) from k8s.io/client-go/informers/factory.go:134
I0816 12:00:36.502958 1 reflector.go:255] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:134
I0816 12:00:36.457257 1 reflector.go:219] Starting reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I0816 12:00:36.503790 1 reflector.go:255] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:134
W0816 12:00:36.513521 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
I0816 12:00:36.625402 1 request.go:597] Waited for 123.619964ms due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0
I0816 12:00:36.825444 1 request.go:597] Waited for 322.959031ms due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/api/v1/nodes?limit=500&resourceVersion=0
I0816 12:00:37.025525 1 request.go:597] Waited for 522.241959ms due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/api/v1/pods?limit=500&resourceVersion=0
I0816 12:00:37.225218 1 request.go:597] Waited for 721.258ms due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/api/v1/services?limit=500&resourceVersion=0
goroutine 341 [select]:
net/http.(*persistConn).writeLoop(0xc000b8c5a0)
        /usr/local/go/src/net/http/transport.go:2386 +0xfb
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1748 +0x1e65

goroutine 371 [IO wait]:
internal/poll.runtime_pollWait(0x7f7299fa50a8, 0x72)
        /usr/local/go/src/runtime/netpoll.go:234 +0x89
internal/poll.(*pollDesc).wait(0xc00012da00, 0xc00079f300, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00012da00, {0xc00079f300, 0x191e, 0x191e})
        /usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00012da00, {0xc00079f300, 0xc00079f30d, 0xb4})
        /usr/local/go/src/net/fd_posix.go:56 +0x29
net.(*conn).Read(0xc00084db40, {0xc00079f300, 0x1911, 0xc0009517f8})
        /usr/local/go/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc001dee540, {0xc00079f300, 0x0, 0x409b6d})
        /usr/local/go/src/crypto/tls/conn.go:777 +0x3d
bytes.(*Buffer).ReadFrom(0xc000db8278, {0x41a8100, 0xc001dee540})
        /usr/local/go/src/bytes/buffer.go:204 +0x98
crypto/tls.(*Conn).readFromUntil(0xc000db8000, {0x41ad760, 0xc00084db40}, 0x191e)
        /usr/local/go/src/crypto/tls/conn.go:799 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc000db8000, 0x0)
        /usr/local/go/src/crypto/tls/conn.go:606 +0x112
crypto/tls.(*Conn).readRecord(...)
        /usr/local/go/src/crypto/tls/conn.go:574
crypto/tls.(*Conn).Read(0xc000db8000, {0xc0001ac000, 0x1000, 0x1})
        /usr/local/go/src/crypto/tls/conn.go:1277 +0x16f
net/http.(*persistConn).Read(0xc0015e6a20, {0xc0001ac000, 0xc000da1200, 0xc000951d30})
        /usr/local/go/src/net/http/transport.go:1926 +0x4e
bufio.(*Reader).fill(0xc000dbad80)
        /usr/local/go/src/bufio/bufio.go:101 +0x103
bufio.(*Reader).Peek(0xc000dbad80, 0x1)
        /usr/local/go/src/bufio/bufio.go:139 +0x5d
net/http.(*persistConn).readLoop(0xc0015e6a20)
        /usr/local/go/src/net/http/transport.go:2087 +0x1ac
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1747 +0x1e05

goroutine 335 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc000870300, 0x0)
        /usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0xc0007da4e0)
        /usr/local/go/src/sync/cond.go:56 +0x8c
golang.org/x/net/http2.(*pipe).Read(0xc0008702e8, {0xc00037c800, 0x200, 0x200})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/pipe.go:65 +0xeb
golang.org/x/net/http2.transportResponseBody.Read({0x40cdde}, {0xc00037c800, 0xd0, 0xc0000ed4b0})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/transport.go:2104 +0x77
encoding/json.(*Decoder).refill(0xc0002a4640)
        /usr/local/go/src/encoding/json/stream.go:165 +0x17f
encoding/json.(*Decoder).readValue(0xc0002a4640)
        /usr/local/go/src/encoding/json/stream.go:140 +0xbb
encoding/json.(*Decoder).Decode(0xc0002a4640, {0x350a880, 0xc000b5ce70})
        /usr/local/go/src/encoding/json/stream.go:63 +0x78
k8s.io/apimachinery/pkg/util/framer.(*jsonFrameReader).Read(0xc0007f16e0, {0xc000686c00, 0x400, 0x400})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/util/framer/framer.go:152 +0x19c
k8s.io/apimachinery/pkg/runtime/serializer/streaming.(*decoder).Decode(0xc000dbc8c0, 0x0, {0x420a720, 0xc0008614c0})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/runtime/serializer/streaming/streaming.go:77 +0xa7
k8s.io/client-go/rest/watch.(*Decoder).Decode(0xc00087f0e0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/rest/watch/decoder.go:49 +0x4f
k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc000861480)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:105 +0x11c
created by k8s.io/apimachinery/pkg/watch.NewStreamWatcher
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76 +0x135

goroutine 355 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc0015c3e80, 0x0)
        /usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0x10)
        /usr/local/go/src/sync/cond.go:56 +0x8c
golang.org/x/net/http2.(*pipe).Read(0xc0015c3e68, {0xc00086e800, 0x200, 0x200})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/pipe.go:65 +0xeb
golang.org/x/net/http2.transportResponseBody.Read({0x1}, {0xc00086e800, 0x0, 0xc0008decb0})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/golang.org/x/net/http2/transport.go:2104 +0x77
encoding/json.(*Decoder).refill(0xc001591180)
        /usr/local/go/src/encoding/json/stream.go:165 +0x17f
encoding/json.(*Decoder).readValue(0xc001591180)
        /usr/local/go/src/encoding/json/stream.go:140 +0xbb
encoding/json.(*Decoder).Decode(0xc001591180, {0x350a880, 0xc0017defc0})
        /usr/local/go/src/encoding/json/stream.go:63 +0x78
k8s.io/apimachinery/pkg/util/framer.(*jsonFrameReader).Read(0xc0017e8060, {0xc00056a800, 0x400, 0x400})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/util/framer/framer.go:152 +0x19c
k8s.io/apimachinery/pkg/runtime/serializer/streaming.(*decoder).Decode(0xc0015b4500, 0x3, {0x420a720, 0xc0017e6c80})
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/runtime/serializer/streaming/streaming.go:77 +0xa7
k8s.io/client-go/rest/watch.(*Decoder).Decode(0xc0012ed4a0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/rest/watch/decoder.go:49 +0x4f
k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc0017e6c40)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:105 +0x11c
created by k8s.io/apimachinery/pkg/watch.NewStreamWatcher
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76 +0x135

goroutine 372 [select]:
net/http.(*persistConn).writeLoop(0xc0015e6a20)
        /usr/local/go/src/net/http/transport.go:2386 +0xfb
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1748 +0x1e65

Stream closed EOF for kube-system/cluster-autoscaler-aws-cluster-autoscaler-d55c89b9f-s9dtd (aws-cluster-autoscaler)
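Since the dump above ends without an explicit error line, the container's last termination state (e.g. OOMKilled vs. a liveness-probe kill vs. a plain error exit) may narrow things down. A sketch of extracting that from pod status follows; against a live cluster the JSON would come from `kubectl -n kube-system get pod <pod-name> -o json`, but a hypothetical sample status is inlined here so the snippet is self-contained (the OOMKilled/137 values are illustrative only, not a claim about the actual cause):

```shell
# Sample pod status standing in for `kubectl get pod ... -o json` output.
# The reason/exitCode values below are made up for illustration.
cat > /tmp/pod-status.json <<'EOF'
{
  "status": {
    "containerStatuses": [
      {
        "name": "aws-cluster-autoscaler",
        "lastState": {
          "terminated": {
            "exitCode": 137,
            "reason": "OOMKilled"
          }
        }
      }
    ]
  }
}
EOF
# Pull the last termination reason and exit code with basic text tools.
reason=$(grep -o '"reason": "[^"]*"' /tmp/pod-status.json | head -n1 | cut -d'"' -f4)
exit_code=$(grep -o '"exitCode": [0-9][0-9]*' /tmp/pod-status.json | grep -o '[0-9][0-9]*')
echo "last termination: reason=$reason exitCode=$exit_code"
```

With a real pod, `jq '.status.containerStatuses[].lastState'` would be the cleaner tool if available; `kubectl logs --previous` is also worth capturing, since the crashing container's final lines may differ from the streamed output above.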
How to reproduce it (as minimally and precisely as possible):
- set up an EKS cluster with the latest Kubernetes version
- create the policy, role and trust relationship as described in the documentation: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md
- install the cluster-autoscaler using Helm:
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  -n kube-system \
  --version v9.19.3 \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=eu-central-1 \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::xxxxxxx:role/ClusterAutoscalerIAMRole \
  --set rbac.serviceAccount.name=aws-cluster-autoscaler
- wait for the pod to start
Anything else we need to know?: I'm using Terraform for the whole setup of the roles etc. The created roles/policies look as follows:
{
"Statement": [
{
"Action": [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeScalingActivities",
"ec2:DescribeInstanceTypes",
"ec2:DescribeLaunchTemplateVersions"
],
"Effect": "Allow",
"Resource": [
"*"
]
},
{
"Action": [
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"ec2:DescribeImages",
"ec2:DescribeInstanceTypes",
"ec2:GetInstanceTypesFromInstanceRequirements",
"eks:DescribeNodegroup"
],
"Effect": "Allow",
"Resource": [
"*"
]
}
],
"Version": "2012-10-17"
}
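One way to sanity-check the Terraform-rendered policy is to grep it for each action the posted policy is expected to contain. A sketch follows; the heredoc stands in for the real policy document (condensed to a single statement here for brevity), and the action list is taken from the policy shown above, not from any additional source:

```shell
# Hypothetical stand-in for the Terraform-generated policy document;
# point the checks at your real rendered JSON instead.
cat > /tmp/ca-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}
EOF
# Report any action from the list above that the rendered policy lacks.
missing=0
for action in \
  autoscaling:DescribeAutoScalingGroups \
  autoscaling:DescribeAutoScalingInstances \
  autoscaling:DescribeLaunchConfigurations \
  autoscaling:DescribeScalingActivities \
  autoscaling:SetDesiredCapacity \
  autoscaling:TerminateInstanceInAutoScalingGroup \
  ec2:DescribeLaunchTemplateVersions
do
  grep -q "\"$action\"" /tmp/ca-policy.json || { echo "missing: $action"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all listed actions present"
```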
ClusterAutoscalerIAMRole trust relationship (with the policy ClusterAutoscalerIAMPolicy attached to it):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::xxxxxxxxx:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/xxxxxxxxx"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.eu-central-1.amazonaws.com/id/xxxxxxxxx:sub": "system:serviceaccount:kube-system:aws-cluster-autoscaler"
}
}
}
]
}
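A common IRSA failure mode is a mismatch between the trust policy's `:sub` condition and the actual `namespace:serviceaccount` pair, so it may be worth mechanically comparing the two. A sketch, with the trust document inlined (placeholders kept as posted) and the expected value built from the `-n kube-system` and `rbac.serviceAccount.name=aws-cluster-autoscaler` values used in the helm install:

```shell
# Stand-in for the deployed trust relationship document.
cat > /tmp/trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxxxxxxxx:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/xxxxxxxxx"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-central-1.amazonaws.com/id/xxxxxxxxx:sub": "system:serviceaccount:kube-system:aws-cluster-autoscaler"
        }
      }
    }
  ]
}
EOF
# Compare the :sub condition against the namespace/name the chart creates.
expected="system:serviceaccount:kube-system:aws-cluster-autoscaler"
actual=$(grep -o '"system:serviceaccount:[^"]*"' /tmp/trust.json | tr -d '"')
if [ "$actual" = "$expected" ]; then
  echo "trust policy subject matches: $actual"
else
  echo "mismatch: trust policy has '$actual', expected '$expected'"
fi
```

In this case the posted trust policy does match, which suggests the OIDC provider ARN/ID (redacted above) and the provider's registration with the cluster would be the remaining IRSA pieces to verify.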
The service account created by the Helm install has the correct role ARN annotation:
Name: aws-cluster-autoscaler
Namespace: kube-system
Labels: app.kubernetes.io/instance=cluster-autoscaler
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=aws-cluster-autoscaler
helm.sh/chart=cluster-autoscaler-9.19.3
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::xxxxxxx:role/ClusterAutoscalerIAMRole
meta.helm.sh/release-name: cluster-autoscaler
meta.helm.sh/release-namespace: kube-system
Image pull secrets:
Mountable secrets: aws-cluster-autoscaler-token-bhnhv
Tokens: aws-cluster-autoscaler-token-bhnhv
Events:
If I'm missing some required configuration, or you need any more details, please let me know.