arlon
CAPI cluster autoscaler (for MachineDeployment) possibly broken on k8s 1.21+
Reported by @ShaunakJoshi1407
In the test environment, the CallHomeConfig custom resource is stuck in the retrying
state. See the error message in the status field:
$ kubectl -n capi-old-argo get callhomeconfig cluster-autoscaler -oyaml
apiVersion: core.arlon.io/v1
kind: CallHomeConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"core.arlon.io/v1","kind":"CallHomeConfig","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"capi-old-argo"},"name":"cluster-autoscaler","namespace":"capi-old-argo"},"spec":{"kubeconfigSecretKeyName":"value","kubeconfigSecretName":"capi-old-argo-kubeconfig","managementClusterUrl":"https://127.0.0.1:42943","serviceAccountName":"cluster-autoscaler","targetNamespace":"kube-system","targetSecretKeyName":"kubeconfig","targetSecretName":"cluster-autoscaler-management-kubeconfig"}}
  creationTimestamp: "2022-09-12T17:12:10Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: capi-old-argo
  name: cluster-autoscaler
  namespace: capi-old-argo
  resourceVersion: "15918"
  uid: fc0b3e12-f6a7-47f3-bb61-e50c621aa3bd
spec:
  kubeconfigSecretKeyName: value
  kubeconfigSecretName: capi-old-argo-kubeconfig
  managementClusterUrl: https://127.0.0.1:42943
  serviceAccountName: cluster-autoscaler
  targetNamespace: kube-system
  targetSecretKeyName: kubeconfig
  targetSecretName: cluster-autoscaler-management-kubeconfig
status:
  message: serviceaccount cluster-autoscaler does not have a token, retrying in 10
    seconds
  state: retrying
Inspecting the serviceaccount resource shows that it indeed has no Secrets:
$ kubectl -n capi-old-argo get sa cluster-autoscaler -oyaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"capi-old-argo"},"name":"cluster-autoscaler","namespace":"capi-old-argo"}}
  creationTimestamp: "2022-09-12T17:12:10Z"
  labels:
    app.kubernetes.io/instance: capi-old-argo
  name: cluster-autoscaler
  namespace: capi-old-argo
  resourceVersion: "12943"
  uid: 2a5e72c2-363e-4397-babf-74be87a342fd
According to the latest Kubernetes documentation, the association between serviceaccounts and token secrets appears to have changed in v1.21+:
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume
https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-bound-service-account-tokens
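One possible workaround, sketched here and not tested against arlon: Kubernetes documents a way to request a long-lived token by manually creating a Secret of type `kubernetes.io/service-account-token` that names the serviceaccount in an annotation. The Secret name `cluster-autoscaler-token` below is hypothetical; the namespace and serviceaccount name are taken from the output above.

```yaml
# Sketch only: manually request a token Secret for the existing serviceaccount.
# The Secret name is a made-up example; nothing in arlon is known to expect it.
apiVersion: v1
kind: Secret
metadata:
  name: cluster-autoscaler-token   # hypothetical name
  namespace: capi-old-argo
  annotations:
    # Tells the token controller which serviceaccount to issue the token for.
    kubernetes.io/service-account.name: cluster-autoscaler
type: kubernetes.io/service-account-token
```

As far as I know, the control plane fills in the Secret's `token` field but does not add the Secret to the serviceaccount's `secrets` list. If the CallHomeConfig controller only inspects that list, it would probably also need to look the Secret up by name or by annotation. That part is an assumption and needs checking against the controller code.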
Aha! Link: https://pf9.aha.io/features/ARLON-294
@bcle are you already looking into this or should I have someone else take this up?
I have not investigated beyond a cursory read of the above documents. I think it's a potentially tricky problem to solve. If someone in your team would like to own it, go for it. Otherwise I'll gladly take it.