
Remove `nvidia-container-cli`

Open gabrielmougard opened this issue 1 year ago • 1 comments

With the introduction of https://github.com/canonical/lxd/pull/13562 , we can pass an NVIDIA GPU through to a LXD container using CDI notation. This approach unifies dGPU and iGPU passthrough. However, nvidia-container-cli is still shipped with LXD for traditional dGPU passthrough (using either a DRM card ID or a GPU PCIe address), even though NVIDIA is deprecating it and will put no further development effort into it. nvidia-container-cli needs to be removed. Here are some considerations:

  • We need to introduce a replacement tool to list the GPU resources of a host: currently, this is done with nvidia-container-cli info --csv and the results are exposed at GET 1.0/resources under the .gpu.cards field. Could we introduce a tool like deviceQuery (see here) that also lists resources AND supports both dGPU and iGPU resource listing?
  • If we remove nvidia-container-cli, we no longer need to pass a PCIe address parameter when adding a GPU device, since the detection logic is handled by an NVIDIA library and not by LXD: what are the implications in terms of API-breaking changes for users? Shall we keep this device parameter and 'resolve' it to a CDI identifier? Shall we remove this parameter completely?
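
To make the "keep the parameter and resolve it" option more concrete, here is a rough sketch of what such a resolution step could look like. It is written in Python for brevity and is not LXD code: `HostGPU`, `normalize_pci` and `resolve_cdi_id` are hypothetical names, the GPU list is assumed to come from whatever replacement listing tool we end up with, and the index-based `nvidia.com/gpu=<index>` naming is an assumption about how the CDI spec on the host names its devices.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HostGPU:
    """Hypothetical record for one GPU as reported by a host listing tool."""
    index: int          # assumed to match the device index used in the CDI spec
    pci_address: str    # e.g. "0000:01:00.0"


def normalize_pci(addr: str) -> str:
    """Normalize a PCI address to the lowercase 'dddd:bb:dd.f' form."""
    parts = addr.strip().lower().split(":")
    if len(parts) == 2:
        # No domain given ("01:00.0"): assume domain 0000.
        parts = ["0000"] + parts
    elif len(parts) == 3:
        # Some tools print an 8-digit domain; keep the low 4 hex digits.
        parts[0] = parts[0][-4:].rjust(4, "0")
    return ":".join(parts)


def resolve_cdi_id(gpus: Iterable[HostGPU], pci_address: str) -> Optional[str]:
    """Map a legacy PCIe address to a CDI identifier, or None if unknown."""
    wanted = normalize_pci(pci_address)
    for gpu in gpus:
        if normalize_pci(gpu.pci_address) == wanted:
            # Assumption: devices are named by index under nvidia.com/gpu.
            return f"nvidia.com/gpu={gpu.index}"
    return None
```

With something along these lines, existing device configs that carry a PCIe address could keep working: the address would be translated once to a CDI identifier, and an unknown address would surface as an error instead of silently matching nothing. How iGPUs would be named in this scheme is left open here.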

gabrielmougard avatar Aug 29 '24 09:08 gabrielmougard

@mionaalex this would be a good potential roadmap item

tomponline avatar Aug 29 '24 09:08 tomponline

Once implemented, this should allow us to drop the nvidia-container part of the LXD snap, or at least make it easier to do so.

simondeziel avatar Apr 17 '25 14:04 simondeziel