container-device-interface icon indicating copy to clipboard operation
container-device-interface copied to clipboard

[Spec extension] Allow for IPC, NET, and CAP modifications to be specified in the CDI spec

Open elezar opened this issue 3 years ago • 14 comments

We have a use case that would require the following (docker) arguments to work out the box:

--ipc host
--net host
--cap-add=IPC_LOCK

Does it make sense to extend the CDI spec to allow for IPC, NET, and CAP modifications to the OCI spec?

elezar avatar Mar 16 '22 11:03 elezar

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar May 29 '24 04:05 github-actions[bot]

This issue was automatically closed due to inactivity.

github-actions[bot] avatar Jun 29 '24 04:06 github-actions[bot]

Does it make sense to extend the CDI spec to allow for IPC, NET, and CAP modifications to the OCI spec?

we would have a use-case with --cap-add=IPC_LOCK too so +1

mythi avatar Oct 28 '24 08:10 mythi

@mythi let's reopen this then. Would you have time to expand on this / create a PR with the additions?

Note that one concern is that these changes would happen without the user knowing about them, so it may be good to limit the scope to use cases that don't provide too many additional privileges.

elezar avatar Oct 28 '24 13:10 elezar

@elezar @mythi Just for understanding, CDI would be able to override what the Pod spec is configuring? What would happen in this case when run with docker,podman --net=host and CDI says something else?

zvonkok avatar Oct 28 '24 13:10 zvonkok

I did not think of the details yet. My use case is with QAT devices that use VFIO framework. Pods requesting resources (DRA/device plugins) would also need to know to add IPC_LOCK to get the workload to work. The idea is that users would not need to know those details but the HW dependent settings would come through CDI.

mythi avatar Oct 28 '24 13:10 mythi

I will throw in rlimit settings, since k8s is not yet able to set them properly, where one would need CAP_SYS_RESOURCE as well so it kinda fits in here as well. https://github.com/kubernetes/kubernetes/issues/3595

zvonkok avatar Oct 28 '24 13:10 zvonkok

@elezar @mythi Just for understanding, CDI would be able to override what the Pod spec is configuring? What would happen in this case when run with docker,podman --net=host and CDI says something else?

That is what we would need to discuss / define for this issue. We would need to define the modifications that are to be made to the OCI runtime spec as well as determine whether there could be contradicting configurations.

elezar avatar Oct 28 '24 14:10 elezar

Maybe splitting this into adding CAP_ADD support and the "host"-specifics would be useful. One option woudl be to add a REQUIRES_CAP check that would alert the user if the required caps are not specified?

elezar avatar Oct 28 '24 14:10 elezar

Hmm... changing the IPC and network namespace seems a bit extreme. A failure to inject with an appropriate error message in a case where a device requires host IPC or network namespace but runs in a different one sounds more appropriate... as a last resort. Now I have not followed the latest directions DRA has taken, but when a CDI device reference is added by a DRA driver, it would sound more appropriate to already fail as early as possible, in the DRA driver.

klihub avatar Oct 29 '24 07:10 klihub

I will throw in rlimit settings, since k8s is not yet able to set them properly, where one would need CAP_SYS_RESOURCE as well so it kinda fits in here as well. kubernetes/kubernetes#3595

@zvonkok for rlimit, CDI probably not the best way to do it. you can use NRI plugin to inject it based on the need. See https://github.com/containerd/nri/pull/48 for details.

kad avatar Oct 29 '24 08:10 kad

I have a different use case where a local resource I am exposing requires NET_RAW and NET_ADMIN capabilities to function correctly.

I think from a security point of view it might be better to just produce an error if the container doesn't have the capabilities required, as just adding them behind the scenes via cdi might be surprising to the user and expose them to security risks without realizing the container is running with added capabilies.

bachp avatar Dec 05 '24 21:12 bachp

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.

github-actions[bot] avatar Mar 06 '25 04:03 github-actions[bot]

This issue was automatically closed due to inactivity.

github-actions[bot] avatar Apr 06 '25 04:04 github-actions[bot]