aerospike-kubernetes-operator

Karpenter scaling with k8sNodeBlockList throws errors

Open mateusmuller opened this issue 7 months ago • 5 comments

Folks,

I updated a static AerospikeCluster manifest to add a number of EKS nodes to k8sNodeBlockList. This triggered an update, as expected:

NAME            READY   STATUS    RESTARTS   AGE
aerospike-1-0   2/2     Running   0          7h17m
aerospike-1-1   2/2     Running   0          7h16m
aerospike-1-2   0/2     Pending   0          82s
aerospike-2-0   2/2     Running   0          4h17m
aerospike-2-1   2/2     Running   0          4h17m
aerospike-2-2   2/2     Running   0          4h17m
aerospike-3-0   2/2     Running   0          27h
aerospike-3-1   2/2     Running   0          28h
aerospike-3-2   2/2     Running   0          27h

However, pod aerospike-1-2 stays in Pending forever. This is the log message from Karpenter:

2024-07-24T20:46:27.479Z	DEBUG	controller.provisioner	ignoring pod, label kubernetes.io/hostname is restricted; specify a well known label: [karpenter.k8s.aws/instance-category karpenter.k8s.aws/instance-cpu karpenter.k8s.aws/instance-encryption-in-transit-supported karpenter.k8s.aws/instance-family karpenter.k8s.aws/instance-generation karpenter.k8s.aws/instance-gpu-count karpenter.k8s.aws/instance-gpu-manufacturer karpenter.k8s.aws/instance-gpu-memory karpenter.k8s.aws/instance-gpu-name karpenter.k8s.aws/instance-hypervisor karpenter.k8s.aws/instance-local-nvme karpenter.k8s.aws/instance-memory karpenter.k8s.aws/instance-network-bandwidth karpenter.k8s.aws/instance-pods karpenter.k8s.aws/instance-size karpenter.sh/capacity-type karpenter.sh/provisioner-name kubernetes.io/arch kubernetes.io/os node.kubernetes.io/instance-type topology.kubernetes.io/region topology.kubernetes.io/zone], or a custom label that does not use a restricted domain: [k8s.io karpenter.k8s.aws karpenter.sh kubernetes.io]	{"commit": "dc3af1a", "pod": "datastore-shared/aerospike-1-2"}

Basically, Karpenter does not allow kubernetes.io/hostname in a node affinity. This is the affinity the pod ends up with when k8sNodeBlockList is set:

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-east-1a
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - <list of nodes>
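
For context, here is a minimal sketch of the kind of manifest change involved. It assumes k8sNodeBlockList sits directly under spec and that the cluster is named aerospike (inferred from the pod names above). The node names are hypothetical placeholders, and the rest of the spec is omitted:

```yaml
apiVersion: asdb.aerospike.com/v1
kind: AerospikeCluster
metadata:
  name: aerospike               # assumed from the pod names (aerospike-<rack>-<ordinal>)
  namespace: datastore-shared   # namespace as it appears in the Karpenter log line
spec:
  # Hypothetical node names. Entries listed here are what show up as the
  # values of the kubernetes.io/hostname NotIn expression shown above.
  k8sNodeBlockList:
    - ip-10-0-1-23.ec2.internal
    - ip-10-0-2-45.ec2.internal
```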

I found this issue in the Karpenter repo reporting the same problem, where they say this usage is wrong.

Could you please share your thoughts on whether this can be improved somehow? Thanks.

mateusmuller, Jul 24 '24 21:07