linkerd-cni pod unable to start if linkerd-viz installed in the same namespace

Open ValeriiVozniuk opened this issue 5 months ago • 2 comments

What is the issue?

When all Linkerd components are deployed in the same namespace with CNI enabled, a deleted linkerd-cni pod cannot be recreated if linkerd-viz is present.

How can it be reproduced?

  1. Deploy 3 node k3s cluster
  2. Deploy linkerd via helm to it
helm repo add linkerd https://helm.linkerd.io/stable
helm upgrade --install linkerd-crds linkerd/linkerd-crds -n linkerd --create-namespace

helm upgrade --install linkerd-cni -n linkerd linkerd/linkerd2-cni --set destCNINetDir=/var/lib/rancher/k3s/agent/etc/cni/net.d --set destCNIBinDir=/var/lib/rancher/k3s/data/current/bin

# To prevent race condition with cni start
sleep 10

helm upgrade --install linkerd-control-plane -n linkerd linkerd/linkerd-control-plane --set-file identityTrustAnchorsPEM=ca.crt --set-file identity.issuer.tls.crtPEM=issuer.crt --set-file identity.issuer.tls.keyPEM=issuer-private.pem --set cniEnabled=true

helm upgrade --install linkerd-viz -n linkerd linkerd/linkerd-viz
  3. See all pods up and running
linkerd-cni-47k87                         1/1     Running   0          4m46s
linkerd-cni-rt9kw                         1/1     Running   0          34s
linkerd-cni-x8jw9                         1/1     Running   0          4m47s
linkerd-destination-647b59dbf9-5mngt      4/4     Running   0          3m34s
linkerd-identity-74db5c88bc-bhm58         2/2     Running   0          3m34s
linkerd-proxy-injector-7cf7775846-wphnq   2/2     Running   0          3m34s
metrics-api-655c9446db-wmmjn              2/2     Running   0          86s
prometheus-7b5b5d8548-7j67g               2/2     Running   0          86s
tap-76bc7d4b8-t44n2                       2/2     Running   0          86s
tap-injector-6d548c4796-zkc7n             2/2     Running   0          86s
web-b4df69565-fsz5n                       2/2     Running   0          86s
  4. Delete one cni pod.
  5. New cni pod is unable to start
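
The fixed `sleep 10` in step 2 is a guess at how long the CNI DaemonSet needs. A minimal sketch of a more explicit wait, assuming the race is about the linkerd-cni DaemonSet pods becoming ready; `wait_for_cni` and the `KUBECTL` override are hypothetical names introduced here, while `kubectl rollout status` and its `--timeout` flag are standard kubectl:

```shell
# Block until the linkerd-cni DaemonSet rollout has finished, or fail once
# the timeout expires. KUBECTL may be overridden (assumption: handy for
# dry runs or tests); it defaults to the real kubectl binary.
wait_for_cni() {
  namespace=${1:-linkerd}
  timeout=${2:-120s}
  "${KUBECTL:-kubectl}" rollout status daemonset/linkerd-cni \
    -n "$namespace" --timeout="$timeout"
}
```

Calling `wait_for_cni` between the linkerd-cni and linkerd-control-plane installs would take the place of `sleep 10`. It does not affect the failure described in this issue.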

Logs, error output, etc

kubectl describe daemonset
Name:           linkerd-cni
Selector:       k8s-app=linkerd-cni
Node-Selector:  kubernetes.io/os=linux
Labels:         app.kubernetes.io/managed-by=Helm
                k8s-app=linkerd-cni
                linkerd.io/cni-resource=true
Annotations:    deprecated.daemonset.template.generation: 1
                linkerd.io/created-by: linkerd/helm v1.3.0
                meta.helm.sh/release-name: linkerd-cni
                meta.helm.sh/release-namespace: linkerd
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 2
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 2
Number of Nodes Misscheduled: 0
Pods Status:  2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=linkerd-cni
                    linkerd.io/cni-resource=true
  Annotations:      linkerd.io/created-by: linkerd/helm v1.3.0
                    linkerd.io/inject: disabled
  Service Account:  linkerd-cni
  Containers:
   install-cni:
    Image:      cr.l5d.io/linkerd/cni-plugin:v1.3.0
    Port:       <none>
    Host Port:  <none>
    Environment:
      DEST_CNI_NET_DIR:    <set to the key 'dest_cni_net_dir' of config map 'linkerd-cni-config'>    Optional: false
      DEST_CNI_BIN_DIR:    <set to the key 'dest_cni_bin_dir' of config map 'linkerd-cni-config'>    Optional: false
      CNI_NETWORK_CONFIG:  <set to the key 'cni_network_config' of config map 'linkerd-cni-config'>  Optional: false
      SLEEP:               true
    Mounts:
      /host/var/lib/rancher/k3s/agent/etc/cni/net.d from cni-net-dir (rw)
      /host/var/lib/rancher/k3s/data/current/bin from cni-bin-dir (rw)
      /tmp from linkerd-tmp-dir (rw)
  Volumes:
   cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rancher/k3s/data/current/bin
    HostPathType:
   cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rancher/k3s/agent/etc/cni/net.d
    HostPathType:
   linkerd-tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Events:
  Type     Reason            Age               From                  Message
  ----     ------            ----              ----                  -------
  Normal   SuccessfulCreate  6m44s             daemonset-controller  Created pod: linkerd-cni-xqns8
  Normal   SuccessfulCreate  6m43s             daemonset-controller  Created pod: linkerd-cni-x8jw9
  Normal   SuccessfulCreate  6m43s             daemonset-controller  Created pod: linkerd-cni-47k87
  Normal   SuccessfulCreate  2m31s             daemonset-controller  Created pod: linkerd-cni-rt9kw
  Warning  FailedCreate      13s               daemonset-controller  Error creating: pods "linkerd-cni-kf8n8" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
  Warning  FailedCreate      13s               daemonset-controller  Error creating: pods "linkerd-cni-tjw6n" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
  Warning  FailedCreate      13s               daemonset-controller  Error creating: pods "linkerd-cni-z2xpn" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
  Warning  FailedCreate      13s               daemonset-controller  Error creating: pods "linkerd-cni-pw7tz" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
  Warning  FailedCreate      13s               daemonset-controller  Error creating: pods "linkerd-cni-wvdhs" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
  Warning  FailedCreate      13s               daemonset-controller  Error creating: pods "linkerd-cni-nds6z" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
  Warning  FailedCreate      13s               daemonset-controller  Error creating: pods "linkerd-cni-b7xhx" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
  Warning  FailedCreate      13s               daemonset-controller  Error creating: pods "linkerd-cni-4jwnw" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
  Warning  FailedCreate      13s               daemonset-controller  Error creating: pods "linkerd-cni-xkwsj" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
  Warning  FailedCreate      2s (x4 over 13s)  daemonset-controller  (combined from similar events): Error creating: pods "linkerd-cni-prbrt" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "install-cni" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "install-cni" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "cni-bin-dir", "cni-net-dir" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "install-cni" must set securityContext.runAsNonRoot=true)
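
All ten FailedCreate events above carry the same PodSecurity message. A small sketch that reduces such a message to the names of the violated controls, assuming the message format shown in these events; `podsecurity_violations` is a hypothetical helper name:

```shell
# Read text on stdin, keep only lines containing a PodSecurity violation,
# and print one violated control per line (the text before each " (").
podsecurity_violations() {
  sed -n 's/.*violates PodSecurity "[^"]*": //p' |
    awk '{
      # Violations are separated by "), " in the admission message.
      n = split($0, parts, /\), /)
      for (i = 1; i <= n; i++) {
        sub(/ \(.*/, "", parts[i])   # drop the parenthesised detail
        print parts[i]
      }
    }'
}

# Example (requires cluster access):
#   kubectl describe daemonset linkerd-cni -n linkerd | podsecurity_violations | sort -u
```

For the events in this report it yields four controls: allowPrivilegeEscalation, capabilities, hostPath volumes, and runAsNonRoot.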

output of linkerd check -o short

I ran this from a local console and the output was garbled, so I'm not sure whether it is of any help. With the overwritten fragments removed, it reads:

linkerd-viz
-----------
‼ linkerd-viz pods are injected
    could not find proxy container for linkerd-cni-47k87 pod
    see https://linkerd.io/2.14/checks/#l5d-viz-pods-injection for hints

Status check results are ×

Environment

  • Kubernetes: v1.27.10+k3s2
  • Platform: k3s
  • Host OS: Ubuntu 22.04 LTS
  • Linkerd: stable-2.14.10

Possible solution

No response

Additional context

When linkerd-viz is deployed to a separate namespace, the issue doesn't appear, but it is not clear why it happens when all components are deployed in the same namespace.
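
The FailedCreate events show Pod Security admission rejecting the pod at the "restricted" level, so one way to narrow this down might be to check which level the namespace is labelled with before and after installing linkerd-viz. A minimal sketch, assuming the standard `pod-security.kubernetes.io/enforce` namespace label is what drives the enforcement; `psa_enforce_level` is a hypothetical helper name:

```shell
# Read a comma-separated label list (key=value,key=value) on stdin and print
# the Pod Security "enforce" level, or "(not set)" if the label is absent.
psa_enforce_level() {
  tr ',' '\n' |
    awk -F= '
      $1 == "pod-security.kubernetes.io/enforce" { print $2; found = 1 }
      END { if (!found) print "(not set)" }
    '
}

# Example (requires cluster access); the labels are the last column:
#   kubectl get namespace linkerd --no-headers --show-labels \
#     | awk '{ print $NF }' | psa_enforce_level
```

Comparing the result for the shared namespace against a dedicated linkerd-viz namespace would show whether the two setups differ in their enforced level.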

Would you like to work on fixing this bug?

None

Feb 23 '24 16:02 ValeriiVozniuk

@ValeriiVozniuk thanks for raising this. We generally do not recommend running extensions in the same namespace as the control plane.

The control plane itself will come pre-configured with the sidecar container; we also provide some overrides for some specific static configuration values. To avoid race conditions (and undefined behaviour during start-up) the control plane should not be injected (it can't be injected if the server managing injections is part of the control plane itself).

The linkerd namespace has the mutating webhook disabled, see these configuration snippets for more details:

  • https://github.com/linkerd/linkerd2/blob/main/charts/linkerd-control-plane/templates/namespace.yaml
  • https://github.com/linkerd/linkerd2/blob/main/charts/linkerd-control-plane/values.yaml#L408-L413

I think the behaviour you're seeing is expected: pods cannot be injected in the namespace, and CNI fails because the pods don't have the proxy container (since they weren't injected). I'd recommend installing viz in a separate namespace to fix this.

Feb 25 '24 14:02 mateiidavid

Thank you, I'll discuss this with our architect, as we prefer to keep related components in the same namespace to simplify a future NetworkPolicy rollout :). I think the docs should then highlight that using a single namespace is not recommended and can cause issues :)

Feb 25 '24 21:02 ValeriiVozniuk