CDI-injected bind-mount inaccessible for pods created with `hostUsers: false`
https://github.com/kubernetes/kubernetes/issues/134604, reported against Kubernetes proper, is actually a problem with CDI. See the original issue for details and discussion.
As a short summary: when a pod runs with `hostUsers: false`, all mounts should have UID and GID mappings (and kubelet-requested, runtime-injected ones will). Since CDI injects mounts without any UID or GID mappings, containers trying to access CDI-injected bind-mounted directories will fail with EPERM unless the mounted directory has at least 0666 permissions set.
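To make the missing piece concrete, here is a minimal sketch of what "giving CDI-injected bind mounts ID mappings" could look like on an OCI-style config. It is not the code from the PR. It assumes the OCI runtime-spec JSON field names (`linux.uidMappings`, `linux.gidMappings`, and per-mount `uidMappings`/`gidMappings`), and it assumes that reusing the container's user namespace mappings for the mount is acceptable. The function name `add_idmaps_to_bind_mounts`, the paths, and the ID ranges are made up for illustration.

```python
import copy


def add_idmaps_to_bind_mounts(oci_spec):
    """Return a copy of oci_spec in which bind mounts that carry no ID
    mappings inherit the container's user namespace mappings.

    Hypothetical helper: it only illustrates the idea discussed above.
    """
    spec = copy.deepcopy(oci_spec)
    linux = spec.get("linux", {})
    uid_maps = linux.get("uidMappings", [])
    gid_maps = linux.get("gidMappings", [])
    if not uid_maps or not gid_maps:
        # No user namespace mappings (e.g. hostUsers: true): nothing to do.
        return spec

    for mount in spec.get("mounts", []):
        options = mount.get("options", [])
        is_bind = (
            mount.get("type") == "bind"
            or "bind" in options
            or "rbind" in options
        )
        if not is_bind:
            continue
        if mount.get("uidMappings") or mount.get("gidMappings"):
            # Leave mounts alone if someone already set up their mappings.
            continue
        mount["uidMappings"] = copy.deepcopy(uid_maps)
        mount["gidMappings"] = copy.deepcopy(gid_maps)
    return spec


# Example input: a user-namespaced container with one non-bind mount and one
# bind mount of the kind CDI would inject (no mappings). Values are made up.
spec = {
    "linux": {
        "uidMappings": [{"containerID": 0, "hostID": 65536, "size": 65536}],
        "gidMappings": [{"containerID": 0, "hostID": 65536, "size": 65536}],
    },
    "mounts": [
        {"destination": "/proc", "type": "proc", "source": "proc"},
        {
            "destination": "/opt/vendor",
            "type": "bind",
            "source": "/opt/vendor",
            "options": ["rbind", "ro"],
        },
    ],
}

patched = add_idmaps_to_bind_mounts(spec)
```

In this sketch only the bind mount gains mappings; the `proc` mount and the input `spec` are left untouched.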
@klihub the fix in the PR looks reasonable. Does this also mean that we need to extend the CDI spec to allow these fields to be specified in mounts assuming that the mappings are known at the point of spec generation?
@elezar Well, that's one of the things I wanted to ask your (and others') opinion about. And I added some related comments/questions to the PR. Eventually I think we probably should add it, so that the one requesting injection could also set up the mapping.
For a first step I was thinking that we could roll a quick fix without any public API changes, effectively what we have in the current PR, tag it as a patch release, then update both CRI-O and containerd to use it.
After that we could think about extending this more by allowing the one requesting mount/device injection to also set up the UID/GID mapping. But this is where I had the biggest gotcha/question I also mentioned in the PR.
How should we do it? More specifically, is it good enough to go down the straightforward path and simply add the mappings to CDI's notion of mounts? AFAICT, the ID mappings tend to be awfully container-specific, while the rest is not so much. This is not a problem if you always dynamically generate a dedicated Spec for every CDI injection request you perform. However, if your model is to discover devices and generate a Spec with all of them during bootup, and maybe update it in response to some events (for instance, udev hotplug), then this does not really work well for you.
What would work instead is the ability to pass additional parameters with a CDI injection request which, in addition to the 'static' device-specific bits, would specify the more 'dynamic' container-specific bits for the injection, in this case the container-specific ID mappings. IOW, it starts to feel a bit like we could use the ability to further parametrize some aspects of an injected device somehow. But we have nothing of the kind currently in CDI, so it would require some head-scratching 1) to decide if that is really a good idea to have, and 2) provided we consider it necessary and useful, to figure out how we should go about it...
So this is why I went with the minimal implementation PR approach.
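For discussion only, the static-versus-dynamic split described above could be sketched roughly as follows. Nothing like this exists in CDI today. The `inject` function, the `request_params` argument, the device name, and the per-mount `uidMappings`/`gidMappings` keys are all hypothetical; only `hostPath`, `containerPath`, and `options` mirror existing CDI mount fields.

```python
import copy

# Static, device-specific edits, as a spec generator might produce them once
# at bootup. The device name and paths are made up for illustration.
STATIC_DEVICE_EDITS = {
    "vendor.com/device=dev0": {
        "mounts": [
            {
                "hostPath": "/opt/vendor/lib",
                "containerPath": "/opt/vendor/lib",
                "options": ["rbind", "ro"],
            }
        ]
    }
}


def inject(device, request_params, registry=STATIC_DEVICE_EDITS):
    """Combine the static edits of a device with per-request parameters.

    Hypothetical API: request_params carries the container-specific ID
    mappings, so the static entry never has to contain them.
    """
    edits = copy.deepcopy(registry[device])
    uid_maps = request_params.get("uidMappings")
    gid_maps = request_params.get("gidMappings")
    for mount in edits.get("mounts", []):
        if uid_maps:
            mount["uidMappings"] = copy.deepcopy(uid_maps)
        if gid_maps:
            mount["gidMappings"] = copy.deepcopy(gid_maps)
    return edits


# Two containers with different (made-up) ID ranges share one static entry.
params_a = {
    "uidMappings": [{"containerID": 0, "hostID": 65536, "size": 65536}],
    "gidMappings": [{"containerID": 0, "hostID": 65536, "size": 65536}],
}
params_b = {
    "uidMappings": [{"containerID": 0, "hostID": 131072, "size": 65536}],
    "gidMappings": [{"containerID": 0, "hostID": 131072, "size": 65536}],
}

edits_a = inject("vendor.com/device=dev0", params_a)
edits_b = inject("vendor.com/device=dev0", params_b)
```

The point of the sketch is that the boot-time entry stays container-agnostic while each injection request supplies its own mappings. Whether CDI should grow such a mechanism at all is exactly the open question.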
/cc @kad @bart0sh @haircommander