wireguard-operator does not work with the baseline pod security standard

Describe the bug

❯ k describe rs
Events:
  Type     Reason        Age                 From                   Message
  ----     ------        ----                ----                   -------
  Warning  FailedCreate  109s                replicaset-controller  Error creating: pods "media-dep-878876c8d-vxz94" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)
  Warning  FailedCreate  109s                replicaset-controller  Error creating: pods "media-dep-878876c8d-xz8fh" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)
  Warning  FailedCreate  109s                replicaset-controller  Error creating: pods "media-dep-878876c8d-85956" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)
  Warning  FailedCreate  109s                replicaset-controller  Error creating: pods "media-dep-878876c8d-bh8p7" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)
  Warning  FailedCreate  109s                replicaset-controller  Error creating: pods "media-dep-878876c8d-ln28h" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)
  Warning  FailedCreate  109s                replicaset-controller  Error creating: pods "media-dep-878876c8d-wjsrs" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)
  Warning  FailedCreate  109s                replicaset-controller  Error creating: pods "media-dep-878876c8d-psmgq" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)
  Warning  FailedCreate  109s                replicaset-controller  Error creating: pods "media-dep-878876c8d-ctlb4" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)
  Warning  FailedCreate  108s                replicaset-controller  Error creating: pods "media-dep-878876c8d-qwstr" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)
  Warning  FailedCreate  27s (x6 over 107s)  replicaset-controller  (combined from similar events): Error creating: pods "media-dep-878876c8d-fvh5h" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (containers "metrics", "agent" must not include "NET_ADMIN" in securityContext.capabilities.add)

To Reproduce

Run a Kubernetes cluster that enforces the baseline pod security standard (e.g. Talos).

https://kubernetes.io/docs/concepts/security/pod-security-admission/

Expected behavior

Optionally use the userspace wireguard implementation.

Screenshots

N/A

Additional context

May 10 '24 21:05 uhthomas

Maybe the operator could remove the privileged security context if the user space implementation is being used?

May 12 '24 23:05 uhthomas

Did you have any success running it on top of Talos?

I've added the pod-security.kubernetes.io/enforce: privileged label to the namespace. Do you think that's safe, and is it enough?

Jun 24 '24 00:06 matrix-root

Did you have any success running it on top of Talos?

I've added the pod-security.kubernetes.io/enforce: privileged label to the namespace. Do you think that's safe, and is it enough?

I use Talos, and it works but it does need that label. A lot of projects need it unfortunately.

Jun 24 '24 17:06 uhthomas

I ended up using this magic incantation to fix wireguard on Talos:

apiVersion: v1
kind: Namespace
metadata:
  name: wireguard
  labels:
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/audit-version: latest
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: privileged
    pod-security.kubernetes.io/warn-version: latest

Jun 27 '24 09:06 Twi

@Twi The only label which should be necessary is pod-security.kubernetes.io/enforce: privileged. The logs may complain without some of those other labels, but it will work.

Jun 27 '24 09:06 uhthomas

Can this change be added to the project? I've never used Talos, so I cannot test it :(. I'd really appreciate it if you could add it!

Jun 27 '24 13:06 jodevsa

Optionally use the userspace wireguard implementation.

I'm wondering if there is a way to detect that we are running on Talos and cannot run the kernel-mode WireGuard?

Jun 27 '24 13:06 jodevsa

It would be a nice feature to have, though it is important to note that this is not specific to Talos: it applies to any Kubernetes cluster which enforces the baseline pod security standard. There is already a fallback mechanism in place when creating the tunnel itself, but I believe the operator will also need to change the pods it creates.

Jun 27 '24 13:06 uhthomas

If we don't succeed with the userspace implementation, we can at least add a notice about PodSecurity to the README :)

Otherwise it could take a while for others to discover the cause of the issue.

Jun 27 '24 13:06 matrix-root

I wonder what the right way to do this is? I guess the first step is to add some configuration option to force user space (and remove NET_ADMIN from the security capabilities). A feature could then be built on top of that which automatically detects the current pod security standard? Not sure what the right default is. User space is likely to be less efficient, but more compatible.

Jun 27 '24 14:06 uhthomas

I like the multiple phases approach ^^

I wonder what the right way to do this is? I guess the first step is to add some configuration option to force user space (and remove NET_ADMIN from the security capabilities).

So there is currently a parameter in the Wireguard resource called useWgUserspaceImplementation:

              useWgUserspaceImplementation:
                description: A boolean field that specifies whether to use the userspace

https://github.com/jodevsa/wireguard-operator/blob/main/config/crd/bases/vpn.wireguard-operator.io_wireguards.yaml#L72

This parameter gets populated in the agent, which is the bootstrapping software that actually runs WireGuard. What is currently missing is that we need to stop populating the security capabilities if useWgUserspaceImplementation is true.

so around here: https://github.com/jodevsa/wireguard-operator/blob/main/pkg/controllers/wireguard_controller.go#L741

we need something like

if !m.spec.useWgUserspaceImplementation {
    // inject the NET_ADMIN security capability
}
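
A slightly fuller sketch of that check, in case it helps. The two types below are simplified stand-ins for the Kubernetes core/v1 SecurityContext and Capabilities types so the snippet runs on its own, and buildSecurityContext is a hypothetical helper name, not the operator's actual code. It also assumes, as this thread does, that the userspace implementation can run without NET_ADMIN, which would need to be verified.

```go
package main

import "fmt"

// Capabilities and SecurityContext are simplified stand-ins for the
// Kubernetes core/v1 types of the same name (assumption for this sketch).
type Capabilities struct {
	Add []string
}

type SecurityContext struct {
	Capabilities *Capabilities
}

// buildSecurityContext is a hypothetical helper: it only requests NET_ADMIN
// when the kernel WireGuard implementation is going to be used.
func buildSecurityContext(useUserspace bool) *SecurityContext {
	if useUserspace {
		// Request no extra capabilities, so the pod is not rejected
		// for adding NET_ADMIN under the baseline standard.
		return &SecurityContext{}
	}
	return &SecurityContext{
		Capabilities: &Capabilities{Add: []string{"NET_ADMIN"}},
	}
}

func main() {
	for _, userspace := range []bool{false, true} {
		sc := buildSecurityContext(userspace)
		fmt.Printf("userspace=%v requestsCapabilities=%v\n",
			userspace, sc.Capabilities != nil)
	}
}
```

The same condition would have to be applied to every container that currently adds the capability (the events above mention both "metrics" and "agent").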

Jun 30 '24 17:06 jodevsa

which automatically detects the current pod security standard

Any ideas on how we can detect that? Is there a Kubernetes ConfigMap that can be read to learn the allowed capabilities? I think that might be more straightforward than trying to run a pod with that capability and waiting to see whether it fails.
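
One possible starting point, sketched under assumptions: Pod Security Admission reads the level for a namespace from the pod-security.kubernetes.io/enforce label (the same label mentioned earlier in this thread), so the operator could inspect the labels of the target namespace. When the label is absent, the cluster-wide default applies, and as far as I know that default comes from the API server's admission configuration rather than from a ConfigMap, so the sketch reports "unknown" in that case. netAdminAllowed is a hypothetical helper, and fetching the namespace labels is left out.

```go
package main

import "fmt"

// enforceLabel is the namespace label read by Pod Security Admission.
const enforceLabel = "pod-security.kubernetes.io/enforce"

// netAdminAllowed is a hypothetical helper. Given the labels of a namespace,
// it reports whether adding NET_ADMIN is allowed, and whether that could be
// determined from the labels at all.
func netAdminAllowed(nsLabels map[string]string) (allowed, known bool) {
	level, ok := nsLabels[enforceLabel]
	if !ok {
		// No explicit level: the cluster default applies, which
		// cannot be derived from the namespace labels.
		return false, false
	}
	switch level {
	case "privileged":
		return true, true
	case "baseline", "restricted":
		// Both levels reject NET_ADMIN in capabilities.add.
		return false, true
	default:
		// Unrecognized value: treat as undetermined.
		return false, false
	}
}

func main() {
	labels := map[string]string{enforceLabel: "privileged"}
	allowed, known := netAdminAllowed(labels)
	fmt.Println(allowed, known)
}
```

If the result is unknown, falling back to userspace would be the more compatible choice and staying with the kernel implementation the more efficient one. Which should be the default is the open question raised above.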

Jun 30 '24 17:06 jodevsa

so, going back to what @uhthomas suggested, we have 2 phases to get this complete:

Phase 1: Do not add the NET_ADMIN capability if wireguard.spec.useWgUserspaceImplementation is true.

Phase 2: Detect the pod security standard and fall back to the userspace implementation if we are not allowed to have the NET_ADMIN capability.

Jun 30 '24 17:06 jodevsa

Example of using the flag:


apiVersion: vpn.wireguard-operator.io/v1alpha1
kind: Wireguard
metadata:
  name: vpn
spec:
  useWgUserspaceImplementation: true

Jun 30 '24 18:06 jodevsa