
Kind cluster fails to provision PV when a USB device was removed from the machine

Open adelton opened this issue 2 years ago • 26 comments

What happened:

I'm running Kind (with export KIND_EXPERIMENTAL_PROVIDER=podman) on my laptop. When I start the cluster while a mouse is connected to the machine, I'm able to create a pod with a local volume. Once I remove that mouse, this starts to fail.

The same issue happens when I close the lid to have the laptop go to sleep, and then wake it up again.

What you expected to happen:

Setup of PVCs and PVs continues to work.

How to reproduce it (as minimally and precisely as possible):

  1. export KIND_EXPERIMENTAL_PROVIDER=podman
  2. lsusb returns something like
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 003: ID 13d3:5405 IMC Networks Integrated Camera
Bus 003 Device 044: ID 06cb:00f9 Synaptics, Inc. 
Bus 003 Device 046: ID 0458:0007 KYE Systems Corp. (Mouse Systems) Trackbar Emotion
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  3. kind create cluster
  4. Have a YAML file duplicating the standard storageclass under the name local-path, something like cat storageclass-local-path.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
  namespace: kube-system
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
  5. kubectl apply -f storageclass-local-path.yaml
  6. kubectl apply -k 'https://github.com/rancher/local-path-provisioner/examples/pod-with-local-volume'
  7. After a short while, kubectl get pods -A shows volume-test in the default namespace as Running.
  8. kubectl delete -k 'https://github.com/rancher/local-path-provisioner/examples/pod-with-local-volume'
  9. Disconnect that USB mouse.
  10. Check with lsusb that the device 003/046 (or whatever ids it had) is no longer there.
  11. kubectl apply -k 'https://github.com/rancher/local-path-provisioner/examples/pod-with-local-volume'
  12. kubectl get pods -A shows
NAMESPACE            NAME                                                         READY   STATUS       RESTARTS   AGE
default              volume-test                                                  0/1     Pending      0          9s
[...]
local-path-storage   helper-pod-create-pvc-1e7e0729-1ec4-4b0e-91ef-3c41e0495783   0/1     StartError   0          9s
  13. kubectl events -n local-path-storage deployment/local-path-provisioner shows
42s         Warning   Failed              Pod/helper-pod-create-pvc-1e7e0729-1ec4-4b0e-91ef-3c41e0495783   Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error creating device nodes: mount /dev/bus/usb/003/046:/run/containerd/io.containerd.runtime.v2.task/k8s.io/helper-pod/rootfs/dev/bus/usb/003/046 (via /proc/self/fd/6), flags: 0x1000: no such file or directory: unknown

Anything else we need to know?:

I actually first encountered this when I suspended the laptop, woke it up, and wanted to continue using the Kind cluster. The Bus 003 Device 044: ID 06cb:00f9 Synaptics, Inc. device gets a different device id upon wakeup.

Environment:

  • kind version: (use kind version): kind v0.20.0 go1.20.4 linux/amd64
  • Runtime info: (use docker info or podman info):
host:
  arch: amd64
  buildahVersion: 1.32.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 70.31
    systemPercent: 6.54
    userPercent: 23.15
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: xfce
    version: "38"
  eventLogger: journald
  freeLocks: 2038
  hostname: machine.example.com
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 2000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 2000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.5.6-200.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 8981233664
  memTotal: 33331113984
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.8.0-1.fc38.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.8.0
    package: netavark-1.8.0-2.fc38.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.8.0
  ociRuntime:
    name: crun
    package: crun-1.9.2-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.9.2
      commit: 35274d346d2e9ffeacb22cc11590b0266a23d634
      rundir: /run/user/2000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231004.gf851084-1.fc38.x86_64
    version: |
      pasta 0^20231004.gf851084-1.fc38.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/2000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.1-1.fc38.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8589877248
  swapTotal: 8589930496
  uptime: 202h 32m 16.00s (Approximately 8.42 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/kind/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/kind/.local/share/containers/storage
  graphRootAllocated: 26241896448
  graphRootUsed: 11933265920
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 94
  runRoot: /tmp/containers-user-2000/containers
  transientStore: false
  volumePath: /home/kind/.local/share/containers/storage/volumes
version:
  APIVersion: 4.7.0
  Built: 1695839078
  BuiltTime: Wed Sep 27 20:24:38 2023
  GitCommit: ""
  GoVersion: go1.20.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.7.0
  • OS (e.g. from /etc/os-release): CPE_NAME="cpe:/o:fedoraproject:fedora:38"
  • Kubernetes version: (use kubectl version):
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.9", GitCommit:"d1483fdf7a0578c83523bc1e2212a606a44fd71d", GitTreeState:"archive", BuildDate:"2023-09-16T00:00:00Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-15T00:36:28Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
  • Any proxies or other special environment settings?: KIND_EXPERIMENTAL_PROVIDER=podman

adelton avatar Oct 23 '23 07:10 adelton

It's not quite clear to me from the description ... is this an error from the local-path-provisioner, or is it any pod in kind that does not work?

aojea avatar Oct 23 '23 07:10 aojea

The error comes from containerd attempting to start the helper-pod-create-pvc-1e7e0729-1ec4-4b0e-91ef-3c41e0495783 that gets initiated by the local-path-provisioner-6bc4bddd6b-rnsqd to fulfill the PVC request that comes from https://github.com/rancher/local-path-provisioner/blob/master/examples/pvc-with-local-volume/pvc.yaml.
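
For reference, a minimal way to watch the provisioner spawn the helper pod and to inspect the failure (the pod name below is from this particular run and will differ for other PVCs):

$ kubectl get pods -n local-path-storage --watch
$ kubectl describe pod -n local-path-storage helper-pod-create-pvc-1e7e0729-1ec4-4b0e-91ef-3c41e0495783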

adelton avatar Oct 23 '23 07:10 adelton

Is it a https://github.com/rancher/local-path-provisioner bug then?

aojea avatar Oct 23 '23 08:10 aojea

I don't think the code in local-path-provisioner does much with setting up the root fs and the mount points for the pod.

This seems to be related to how the "nodes" are created and represented by Kind / init / containerd / something and what they assume and inherit.

adelton avatar Oct 23 '23 08:10 adelton

It's not quite clear to me from the description ... is this an error from the local-path-provisioner, or is it any pod in kind that does not work?

That is why I asked: is this with any pod, or only with this specific pod?

aojea avatar Oct 23 '23 08:10 aojea

Ah, you meant if there is something wrong about that specific example? Not really, when I turn it into a trivial busybox container with

apiVersion: v1
kind: Pod
metadata:
  name: volume-test-2
spec:
  containers:
  - name: volume-test-2
    image: busybox
    imagePullPolicy: IfNotPresent
    command:
    - mount
    volumeMounts:
    - name: volv2
      mountPath: /data2
  volumes:
  - name: volv2
    persistentVolumeClaim:
      claimName: local-volume-pvc-2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-volume-pvc-2
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi

I get the very same error message once the list of USB devices changes.

adelton avatar Oct 23 '23 08:10 adelton

What I'm trying to understand is whether it is a general problem or only happens because of the PersistentVolumes.

aojea avatar Oct 23 '23 12:10 aojea

I only saw it with that helper pod. When I apply a pod without any volumes

apiVersion: v1
kind: Pod
metadata:
  name: no-volume
spec:
  containers:
  - name: no-volume
    image: busybox
    imagePullPolicy: IfNotPresent
    command:
    - mount

the pod and container get created and run fine. The mount output shows a very limited set of things mounted under /dev/ in that case:

$ kubectl logs pod/no-volume | grep ' on /dev/'
devpts on /dev/pts type devpts (rw,seclabel,nosuid,noexec,relatime,gid=524292,mode=620,ptmxmode=666)
mqueue on /dev/mqueue type mqueue (rw,seclabel,nosuid,nodev,noexec,relatime)
/dev/mapper/vg_machine-lv_containers on /dev/termination-log type ext4 (rw,seclabel,relatime)
shm on /dev/shm type tmpfs (rw,seclabel,nosuid,nodev,noexec,relatime,size=65536k,uid=2000,gid=2000,inode64)
devtmpfs on /dev/null type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/random type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/full type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/tty type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/zero type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/urandom type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)

adelton avatar Oct 23 '23 14:10 adelton

To debug, when I

kubectl edit -n local-path-storage cm local-path-config

and change the image to busybox and add a mount and a sleep to the setup script with

    apiVersion: v1
    kind: Pod
    metadata:
      name: helper-pod
    spec:
      containers:
      - name: helper-pod
        image: busybox
        imagePullPolicy: IfNotPresent
  setup: |-
    #!/bin/sh
    set -eu
    mount
    sleep 30
    mkdir -m 0777 -p "$VOL_DIR"

and

kubectl rollout restart deployment local-path-provisioner -n local-path-storage

provisioning the pod with a PVC shows a huge number of bind (?) mounts:

$ kubectl logs -n local-path-storage helper-pod-create-pvc-59b95912-a254-454b-b26b-889c10b217c6 | grep ' on /dev/'
devpts on /dev/pts type devpts (rw,seclabel,nosuid,noexec,relatime,gid=524292,mode=620,ptmxmode=666)
mqueue on /dev/mqueue type mqueue (rw,seclabel,nosuid,nodev,noexec,relatime)
/dev/mapper/vg_machine-lv_containers on /dev/termination-log type ext4 (rw,seclabel,relatime)
shm on /dev/shm type tmpfs (rw,seclabel,nosuid,nodev,noexec,relatime,size=65536k,uid=2000,gid=2000,inode64)
devtmpfs on /dev/acpi_thermal_rel type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/autofs type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/btrfs-control type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/bus/usb/001/001 type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/bus/usb/002/001 type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/bus/usb/003/001 type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/bus/usb/003/003 type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/bus/usb/003/050 type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/bus/usb/004/001 type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/cpu/0/cpuid type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/cpu/0/msr type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/cpu/1/cpuid type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/cpu/1/msr type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/cpu/2/cpuid type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/cpu/2/msr type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/cpu/3/cpuid type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/cpu/3/msr type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/cpu/4/cpuid type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
[...]
devtmpfs on /dev/watchdog type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/watchdog0 type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/zero type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)
devtmpfs on /dev/zram0 type devtmpfs (rw,seclabel,nosuid,noexec,size=4096k,nr_inodes=4062748,mode=755,inode64)

So something is different between the "normal" pods/containers and the pod/container created as the helper for the local-path provisioner.

adelton avatar Oct 23 '23 14:10 adelton

We don't control the device mounts being propagated from the host to the "node", that's podman.

The helper pod is privileged which is why it is also seeing all the mounts, unlike your simple test pod. https://github.com/rancher/local-path-provisioner/blob/4d42c70e748fed13cd66f86656e909184a5b08d2/provisioner.go#L553

BenTheElder avatar Oct 23 '23 16:10 BenTheElder

Thanks for that pointer -- I confirm that when I add

    securityContext:
      privileged: true

to my regular container, I get the same issues as with the local-path helper.
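
For completeness, a minimal sketch of that reproducer, combining the no-volume busybox pod from earlier with the privileged securityContext (the pod name here is arbitrary):

# A privileged pod with no volumes; on a node with stale /dev/bus/usb mounts
# it fails to start with the same runc "no such file or directory" error.
apiVersion: v1
kind: Pod
metadata:
  name: privileged-no-volume
spec:
  containers:
  - name: privileged-no-volume
    image: busybox
    imagePullPolicy: IfNotPresent
    command:
    - mount
    securityContext:
      privileged: true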

What I'd like to figure out though: you say "we don't control the device mounts being propagated from the host to the 'node'". But in this case it cannot be propagation of the device mounts from the host, because on the host the /dev/bus/usb/*/* device is no longer there. So it is being propagated from something else, possibly some parent (?) pod (?) that has a list of devices it once saw?

adelton avatar Oct 23 '23 18:10 adelton

IIRC docker/podman will sync all the /dev entries when creating the container, but there is no mount propagation to reflect updated entries. Then the nested containerd/runc will try to create these for the "inner" pod containers.

I don't think there are great solutions here ... maybe we can find a way to detect these "dangling" mounts and remove them from the node or hook the inner runc.
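
A rough sketch of what detecting those dangling mounts could look like from the host (assuming the podman provider and the default node name kind-control-plane; everything beyond the commands already shown in this thread is illustrative):

# List /dev/bus/usb mounts inside the node whose backing path no longer exists on the host.
for m in $(podman exec kind-control-plane mount | grep ' on /dev/bus/usb/' | awk '{print $3}'); do
  [ -e "$m" ] || echo "dangling mount in node: $m"
done
# Removing one from the node would then be roughly: podman exec kind-control-plane umount "$m"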

FWIW kind clusters are meant to be disposable and quick to create so maybe recreate after changing devices :/

BenTheElder avatar Oct 23 '23 22:10 BenTheElder

The opposite is a known issue with docker: "privileged containers do not reflect newly added host devices" has been a longstanding issue as I recall. We should look at what workarounds people are using for this since it's more or less the same root issue: https://github.com/moby/moby/issues/16160

BenTheElder avatar Oct 23 '23 22:10 BenTheElder

Well, realistically I'd be OK with just disabling any propagation of /dev/bus/usb to the containers, either at the first layer (podman) or at the next layer (containerd?). Is the search for the devices somehow configurable in either of those cases?

adelton avatar Oct 24 '23 07:10 adelton

Well, realistically I'd be OK with just disabling any propagation of /dev/bus/usb to the containers, either at the first layer (podman) or at the next layer (containerd?). Is the search for the devices somehow configurable in either of those cases?

No, we're not even telling podman/docker to pass through these to the node, it's implicit with --privileged which we need to run Kubernetes/containerd.

Ditto with the privileged pods. Everything under /dev gets passed through, IIRC*

* a TTY for the container may be set up specially.
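
A quick way to see that implicit passthrough with podman alone (a sketch; any small image works, the fully qualified busybox name is just to avoid short-name prompts):

# Unprivileged: /dev/bus/usb is typically absent in the container
$ podman run --rm docker.io/library/busybox ls /dev/bus/usb
# Privileged: the host's USB device nodes (as of container creation) show up
$ podman run --rm --privileged docker.io/library/busybox ls -R /dev/bus/usb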

BenTheElder avatar Oct 24 '23 17:10 BenTheElder

So with some experimentation, I got the setup working with

--- a/images/base/files/etc/containerd/config.toml
+++ b/images/base/files/etc/containerd/config.toml
@@ -19,6 +19,9 @@ version = 2
   runtime_type = "io.containerd.runc.v2"
   # Generated by "ctr oci spec" and modified at base container to mount poduct_uuid
   base_runtime_spec = "/etc/containerd/cri-base.json"
+
+  privileged_without_host_devices = true
+
   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
     # use systemd cgroup by default
     SystemdCgroup = true

and rebuilding the base and node images.

I tested it with rootless podman, and both pods with PVs and privileged pods work, for both the USB-unplug use case and suspending the laptop and waking it up. I did not try any additional tests to see what this might break. If I file this as a pull request, will you allow the tests to run to see what they discover in the general Kind testing / CI?

Now the question is whether / how to make this available in Kind in general, what the default should be, and what mechanism to provide for people to override it.

Given that not having those devices in the privileged containers seems like a safer default, and that with https://github.com/moby/moby/issues/16160 unaddressed hotplugging of devices does not work with docker anyway, I'd lean towards having true (no host devices) as the default.

But what should people use to override it?

Mounting the config.toml via extraMounts does not work because it gets manipulated at least in https://github.com/kubernetes-sigs/kind/blob/main/images/base/files/usr/local/bin/entrypoint.

We could add another KIND_EXPERIMENTAL_CONTAINERD_ variable and amend that sed -i logic to use it.

We could also use

imports = ["/etc/containerd/config.d/*.toml"]

and document extraMounts-ing any overrides into that directory. In fact, the configure_containerd in https://github.com/kubernetes-sigs/kind/blob/main/images/base/files/usr/local/bin/entrypoint could use that mechanism instead of that sed -i approach as well.
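
For illustration, a sketch of how that could look, assuming the imports line above were added to the base image's config.toml (the drop-in file name and local path are hypothetical):

# no-host-devices.toml -- drop-in overriding the runc runtime defaults
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  privileged_without_host_devices = true

mounted into the node with something like:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: ./no-host-devices.toml
    containerPath: /etc/containerd/config.d/no-host-devices.toml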

I don't want to make a change like moving from that sed -i to drop-in snippets just for this device-mounting issue ... but I'd be happy to provide a PR to switch to the drop-in snippets approach if it is viewed as a useful approach in general.

adelton avatar Oct 26 '23 12:10 adelton

Now the question is whether / how to make this available in Kind in general, what the default should be, and what mechanism to provide for people to override it.

I suspect this would break a LOT of users doing interesting driver development.

Given that not having those devices in the privileged containers seems like a safer default, and that with https://github.com/moby/moby/issues/16160 unaddressed hotplugging of devices does not work with docker anyway, I'd lean towards having true (no host devices) as the default.

I'm fairly certain this would break standard kubernetes tests.

You can configure this for your clusters today, though, with the poorly documented containerdConfigPatches: https://kind.sigs.k8s.io/docs/user/private-registries/#use-a-certificate

BenTheElder avatar Oct 26 '23 15:10 BenTheElder

Ah, great.

I confirm that with

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane

[...]

containerdConfigPatches:
  - |-
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      privileged_without_host_devices = true

things work just fine.

I'm closing this issue as I have a way to address the problem I've been hitting. If you think that exposing this in some way (possibly in documentation?) might be helpful to others, let me know.

adelton avatar Oct 26 '23 16:10 adelton

I'd like to reopen this if you don't mind because I know other users are going to hit this and requiring the workaround config is still unfortunate.

We should probably add a "known issues" page entry with a pointer to this configuration to start with, and continue to track this while we consider options to automatically mitigate it.

I think it will be pretty involved to implement but ideally we'd just trim missing entries.

BenTheElder avatar Oct 26 '23 18:10 BenTheElder

Actually, in the docker issue there's a suggestion to just bind mount /dev explicitly to avoid this behavior? 👀

https://github.com/moby/moby/issues/16160#issuecomment-551388571

BenTheElder avatar Oct 26 '23 18:10 BenTheElder

We can test this with extraMounts hostPath: /dev containerPath: /dev

BenTheElder avatar Oct 26 '23 18:10 BenTheElder

I confirm that with

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /dev
    containerPath: /dev

the problem is gone as well.

After the removal of the USB mouse, the device node gets removed from the host's /dev/bus/usb/003/ and is no longer shown in

podman exec kind-control-plane mount | grep ' on /dev'

and creating a pod with a privileged container passes as well.

With this approach, I would just be concerned about the implications for /dev/tty and similar non-global, per-process devices.

adelton avatar Oct 29 '23 16:10 adelton

With this approach, I would just be concerned about the implications for /dev/tty and similar non-global, per-process devices.

/dev/tty, at least, I'm pretty sure gets set up specially in runc regardless, but I share that concern. I'd want to investigate carefully before doing this by default, but it seems like this might be sufficient.

BenTheElder avatar Oct 31 '23 20:10 BenTheElder