CAPI cluster autoscaler (for MachineDeployment) possibly broken on k8s 1.21+

Open bcle opened this issue 2 years ago • 2 comments

Reported by @ShaunakJoshi1407. On a test environment, the CallHomeConfig custom resource is stuck in the retrying state. See the error message in the status field:

$ kubectl -n capi-old-argo get callhomeconfig cluster-autoscaler -oyaml
apiVersion: core.arlon.io/v1
kind: CallHomeConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"core.arlon.io/v1","kind":"CallHomeConfig","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"capi-old-argo"},"name":"cluster-autoscaler","namespace":"capi-old-argo"},"spec":{"kubeconfigSecretKeyName":"value","kubeconfigSecretName":"capi-old-argo-kubeconfig","managementClusterUrl":"https://127.0.0.1:42943","serviceAccountName":"cluster-autoscaler","targetNamespace":"kube-system","targetSecretKeyName":"kubeconfig","targetSecretName":"cluster-autoscaler-management-kubeconfig"}}
  creationTimestamp: "2022-09-12T17:12:10Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: capi-old-argo
  name: cluster-autoscaler
  namespace: capi-old-argo
  resourceVersion: "15918"
  uid: fc0b3e12-f6a7-47f3-bb61-e50c621aa3bd
spec:
  kubeconfigSecretKeyName: value
  kubeconfigSecretName: capi-old-argo-kubeconfig
  managementClusterUrl: https://127.0.0.1:42943
  serviceAccountName: cluster-autoscaler
  targetNamespace: kube-system
  targetSecretKeyName: kubeconfig
  targetSecretName: cluster-autoscaler-management-kubeconfig
status:
  message: serviceaccount cluster-autoscaler does not have a token, retrying in 10
    seconds
  state: retrying

Inspecting the serviceaccount resource shows that it indeed has no Secrets:

$ kubectl -n capi-old-argo get sa cluster-autoscaler -oyaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"capi-old-argo"},"name":"cluster-autoscaler","namespace":"capi-old-argo"}}
  creationTimestamp: "2022-09-12T17:12:10Z"
  labels:
    app.kubernetes.io/instance: capi-old-argo
  name: cluster-autoscaler
  namespace: capi-old-argo
  resourceVersion: "12943"
  uid: 2a5e72c2-363e-4397-babf-74be87a342fd

According to the latest Kubernetes documentation, the association between serviceaccounts and token secrets may have changed in v1.21+:
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume
https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-bound-service-account-tokens
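
For reference, the Kubernetes documentation describes a way to request a long-lived token Secret by hand when a serviceaccount has none: create a Secret of type kubernetes.io/service-account-token annotated with the serviceaccount's name, and the control plane fills in the token. The manifest below is only a sketch of that mechanism using the names from the output above; the Secret name cluster-autoscaler-token is hypothetical. Whether the CallHomeConfig controller would find such a Secret is an assumption and has not been verified here, since the Secret may not appear in the serviceaccount's secrets list.

```yaml
# Sketch only: manually requested token Secret for the existing serviceaccount.
apiVersion: v1
kind: Secret
metadata:
  # Hypothetical name, chosen for illustration; not taken from the issue.
  name: cluster-autoscaler-token
  namespace: capi-old-argo
  annotations:
    # Must match the name of an existing ServiceAccount in the same namespace.
    kubernetes.io/service-account.name: cluster-autoscaler
# This type asks the control plane to populate the token for the annotated serviceaccount.
type: kubernetes.io/service-account-token
```

If the token is populated, it should show up under the Secret's data when inspected with kubectl get secret -oyaml, in the same way as the resources above.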

Aha! Link: https://pf9.aha.io/features/ARLON-294

bcle, Sep 13 '22 02:09

@bcle are you already looking into this or should I have someone else take this up?

cruizen, Sep 14 '22 06:09

I have not investigated beyond a cursory read of the above documents. I think it's a potentially tricky problem to solve. If someone in your team would like to own it, go for it. Otherwise I'll gladly take it.

bcle, Sep 14 '22 16:09