Anti-affinity missing on synced pod if namespace selector matches all namespaces
What happened?
We are installing multiple charts in different namespaces, and some pods need pod anti-affinity across all namespaces to ensure that no two pods with the same role are scheduled onto the same node:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: sfcn.cisco.com/enforcement-point
              operator: Exists
        topologyKey: kubernetes.io/hostname
        namespaceSelector: {}
The "sfcn.cisco.com/enforcement-point" label is used to indicate the pod is an enforcer role, and no two enforcer pods should be scheduled to the same node, regardless of namespace. However, when this pod is synced to the host cluster, the "sfcn.cisco.com/enforcement-point" label is missing. We are seeing the following anti affinity rule, which matches all the pods installed in the corresponding vcluster.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            vcluster.loft.sh/managed-by: <vcluster-name>
        topologyKey: kubernetes.io/hostname
What did you expect to happen?
I expect the matchExpressions to be preserved in the synced pod:
matchExpressions:
  - key: sfcn.cisco.com/enforcement-point
    operator: Exists
How can we reproduce it (as minimally and precisely as possible)?
Create a pod in a vcluster using the following spec (example commands follow the spec):
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: aws-cli
      image: amazon/aws-cli:latest
      command:
        - /bin/bash
        - '-c'
        - sleep infinity
      imagePullPolicy: Always
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: sfcn.cisco.com/enforcement-point
                operator: Exists
          topologyKey: kubernetes.io/hostname
          namespaceSelector: {}
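For reference, one way to create it, assuming the spec above is saved as test-pod.yaml and that vcluster connect points kubectl at the virtual cluster:

$ vcluster connect <vcluster-name> -n <vcluster-namespace>
$ kubectl apply -f test-pod.yaml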
Go to the host cluster and read the pod spec (an example command follows the snippet); you'll find the following affinity config:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            vcluster.loft.sh/managed-by: <vcluster-name>
        topologyKey: kubernetes.io/hostname
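One way to read it from the host cluster, assuming vcluster's default name translation (<pod>-x-<virtual-namespace>-x-<vcluster-name>, synced into the vcluster's host namespace):

$ kubectl get pod test-pod-x-default-x-<vcluster-name> -n <vcluster-namespace> -o yaml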
Anything else we need to know?
No response
Host cluster Kubernetes version
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.16", GitCommit:"c5f43560a4f98f2af3743a59299fb79f07924373", GitTreeState:"clean", BuildDate:"2023-11-15T22:39:12Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25+", GitVersion:"v1.25.16-eks-8cb36c9", GitCommit:"3a3ea80e673d7867f47bdfbccd4ece7cb5f4a83a", GitTreeState:"clean", BuildDate:"2023-11-22T21:53:22Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
Host cluster Kubernetes distribution
EKS
vcluster version
$ vcluster --version
vcluster version 0.18.1
Vcluster Kubernetes distribution (k3s (default), k8s, k0s)
vcluster-eks
OS and Arch
OS: N/A
Arch: N/A
@ytizhang thanks for creating this issue. Hmm, I think that should work; can you send the rewritten pod YAML from the host cluster?
The relevant implementation is here: https://github.com/loft-sh/vcluster/blob/be19d090ebc5d6bc7275cca68d062bbd608ca097/pkg/controllers/resources/pods/translate/translator.go#L772
Here is the rewritten pod YAML:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-x-default-x-yitzhang-veks
  namespace: yitzhang
  uid: 5fa326da-f0b0-4d8e-82bd-9182698e085c
  resourceVersion: '948445374'
  creationTimestamp: '2024-02-02T22:31:02Z'
  labels:
    vcluster.loft.sh/managed-by: yitzhang-veks
    vcluster.loft.sh/namespace: default
    vcluster.loft.sh/ns-label-yitzhang-veks-x-cf1227b7b2: default
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: 'false'
    cni.projectcalico.org/containerID: 36404db2362f9160e61f414984be7592d9830baba782fe545bceec457fc27398
    cni.projectcalico.org/podIP: 192.168.59.146/32
    cni.projectcalico.org/podIPs: 192.168.59.146/32
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"test-pod","namespace":"default"},"spec":{"affinity":{"podAntiAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchExpressions":[{"key":"sfcn.cisco.com/enforcement-point","operator":"Exists"}]},"namespaceSelector":{},"topologyKey":"kubernetes.io/hostname"}]}},"containers":[{"command":["/bin/bash","-c","sleep
      infinity"],"image":"amazon/aws-cli:latest","imagePullPolicy":"Always","name":"aws-cli"}]}}
    vcluster.loft.sh/labels: ''
    vcluster.loft.sh/managed-annotations: kubectl.kubernetes.io/last-applied-configuration
    vcluster.loft.sh/name: test-pod
    vcluster.loft.sh/namespace: default
    vcluster.loft.sh/object-name: test-pod
    vcluster.loft.sh/object-namespace: default
    vcluster.loft.sh/object-uid: bd2375e4-c239-4bdc-a4c8-740a3a301378
    vcluster.loft.sh/service-account-name: default
    vcluster.loft.sh/uid: bd2375e4-c239-4bdc-a4c8-740a3a301378
spec:
  volumes:
    - name: kube-api-access-w4lw5
      projected:
        sources:
          - downwardAPI:
              items:
                - path: token
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.annotations['vcluster.loft.sh/token-kfpixaor']
                  mode: 420
          - configMap:
              name: kube-root-ca.crt-x-default-x-yitzhang-veks
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.annotations['vcluster.loft.sh/namespace']
        defaultMode: 420
  containers:
    - name: aws-cli
      image: amazon/aws-cli:latest
      command:
        - /bin/bash
        - '-c'
        - sleep infinity
      env:
        - name: KUBERNETES_PORT
          value: tcp://172.20.144.56:443
        - name: KUBERNETES_PORT_443_TCP
          value: tcp://172.20.144.56:443
        - name: KUBERNETES_PORT_443_TCP_ADDR
          value: 172.20.144.56
        - name: KUBERNETES_PORT_443_TCP_PORT
          value: '443'
        - name: KUBERNETES_PORT_443_TCP_PROTO
          value: tcp
        - name: KUBERNETES_SERVICE_HOST
          value: 172.20.144.56
        - name: KUBERNETES_SERVICE_PORT
          value: '443'
        - name: KUBERNETES_SERVICE_PORT_HTTPS
          value: '443'
      resources: {}
      volumeMounts:
        - name: kube-api-access-w4lw5
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: Always
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: None
  serviceAccountName: default-x-default-x-yitzhang-veks
  serviceAccount: default-x-default-x-yitzhang-veks
  automountServiceAccountToken: false
  nodeName: ip-10-216-64-90.us-west-2.compute.internal
  securityContext: {}
  hostname: test-pod
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              vcluster.loft.sh/managed-by: yitzhang-veks
          topologyKey: kubernetes.io/hostname
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  hostAliases:
    - ip: 172.20.144.56
      hostnames:
        - kubernetes
        - kubernetes.default
        - kubernetes.default.svc
  priority: 0
  dnsConfig:
    nameservers:
      - 172.20.241.77
    searches:
      - default.svc.cluster.local
      - svc.cluster.local
      - cluster.local
    options:
      - name: ndots
        value: '5'
  enableServiceLinks: false
  preemptionPolicy: PreemptLowerPriority
@FabianKramm Upon reading the code at https://github.com/loft-sh/vcluster/blob/be19d090ebc5d6bc7275cca68d062bbd608ca097/pkg/controllers/resources/pods/translate/translator.go#L772: when the vPod has a NamespaceSelector, the translated namespace selector overrides the label selector that was set in https://github.com/loft-sh/vcluster/blob/be19d090ebc5d6bc7275cca68d062bbd608ca097/pkg/controllers/resources/pods/translate/translator.go#L735, so the original matchExpressions are dropped.
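To make the override concrete, here is a minimal, self-contained Go sketch of the suspected flow; the helper names (translateSelector, translateNamespaceSelector) are simplified stand-ins for illustration, not vcluster's actual functions:

// A simplified sketch of the suspected translation flow, not the verbatim
// vcluster code.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const managedByLabel = "vcluster.loft.sh/managed-by"

// translateSelector mimics the step around translator.go#L735: the pod's
// labelSelector is translated and restricted to pods managed by this vcluster.
func translateSelector(vclusterName string, s *metav1.LabelSelector) *metav1.LabelSelector {
	if s == nil {
		s = &metav1.LabelSelector{}
	}
	out := s.DeepCopy() // keeps the original matchExpressions
	if out.MatchLabels == nil {
		out.MatchLabels = map[string]string{}
	}
	out.MatchLabels[managedByLabel] = vclusterName
	return out
}

// translateNamespaceSelector mimics the branch around translator.go#L772: it
// builds a fresh selector from the namespaceSelector only. For
// namespaceSelector: {} that is just the managed-by label.
func translateNamespaceSelector(vclusterName string, _ *metav1.LabelSelector) *metav1.LabelSelector {
	return &metav1.LabelSelector{
		MatchLabels: map[string]string{managedByLabel: vclusterName},
	}
}

func translateAffinityTerm(vclusterName string, term corev1.PodAffinityTerm) corev1.PodAffinityTerm {
	newTerm := corev1.PodAffinityTerm{
		LabelSelector: translateSelector(vclusterName, term.LabelSelector),
		TopologyKey:   term.TopologyKey,
	}
	if term.NamespaceSelector != nil {
		// The assignment below REPLACES the selector built above, which is
		// where the enforcement-point expression gets lost.
		newTerm.LabelSelector = translateNamespaceSelector(vclusterName, term.NamespaceSelector)
	}
	return newTerm
}

func main() {
	term := corev1.PodAffinityTerm{
		LabelSelector: &metav1.LabelSelector{
			MatchExpressions: []metav1.LabelSelectorRequirement{{
				Key:      "sfcn.cisco.com/enforcement-point",
				Operator: metav1.LabelSelectorOpExists,
			}},
		},
		NamespaceSelector: &metav1.LabelSelector{}, // match all namespaces
		TopologyKey:       "kubernetes.io/hostname",
	}
	// Prints a selector containing only the managed-by matchLabel; the
	// enforcement-point matchExpression has been dropped.
	fmt.Printf("%+v\n", translateAffinityTerm("yitzhang-veks", term))
}

If that reading is correct, the fix would be to merge the translated namespace selector into the selector built earlier rather than assigning over it.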