Remove `nvidia-container-cli`
With the introduction of https://github.com/canonical/lxd/pull/13562, we can pass an NVIDIA GPU through to a LXD container using CDI notation. This approach unifies dGPU and iGPU passthrough. `nvidia-container-cli` is still shipped with LXD for traditional dGPU passthrough (using either a DRM card ID or a GPU PCIe address), but it is being deprecated by NVIDIA and will receive no further development. `nvidia-container-cli` needs to be removed. Here are some considerations:
- We need to introduce a replacement tool to list the GPU resources of a host: currently, this is done with `nvidia-container-cli info --csv` and the results are exposed at `GET 1.0/resources` under the `.gpu.cards` field. Could we introduce a tool like `deviceQuery` (see here) that lists resources as well AND supports both dGPU and iGPU resource listing?
- If we remove `nvidia-container-cli`, we no longer need to pass a PCIe address parameter when adding a GPU device, since the detection logic is handled by an NVIDIA library and not by LXD: what are the implications in terms of breaking API changes for users? Shall we keep this device parameter and 'resolve' it to a CDI identifier? Shall we remove this parameter completely?
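
To make the 'resolve' option in the second point concrete, here is a rough sketch of what mapping a legacy PCIe address to a CDI device name could look like. It is illustrative only: `resolve_to_cdi` and the `index_by_pci` inventory are hypothetical names, not LXD code, and it assumes the host's CDI spec exposes index-based names such as `nvidia.com/gpu=0` plus `nvidia.com/gpu=all`. How the address-to-index inventory would be built without `nvidia-container-cli` is exactly the open question from the first point.

```python
import re
from typing import Mapping, Optional

# PCI address in domain:bus:device.function form, e.g. "0000:01:00.0".
PCI_ADDRESS = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def resolve_to_cdi(pci: Optional[str], index_by_pci: Mapping[str, int]) -> str:
    """Map a legacy PCIe address to a CDI device name (hypothetical helper).

    `index_by_pci` is an assumed host inventory mapping lowercase PCI
    addresses to GPU indexes; it is not an existing LXD data structure.
    """
    if not pci:
        # No address given: fall back to every NVIDIA GPU on the host.
        return "nvidia.com/gpu=all"

    address = pci.lower()
    if not PCI_ADDRESS.match(address):
        raise ValueError(f"invalid PCI address: {pci!r}")
    if address not in index_by_pci:
        raise LookupError(f"no NVIDIA GPU found at {address}")

    return f"nvidia.com/gpu={index_by_pci[address]}"


if __name__ == "__main__":
    inventory = {"0000:01:00.0": 0, "0000:41:00.0": 1}  # made-up example data
    print(resolve_to_cdi("0000:41:00.0", inventory))
    print(resolve_to_cdi(None, inventory))
```

With something along these lines, existing device configurations that set a PCIe address would keep working while the CDI path is used internally. Removing the parameter outright would avoid the mapping step but would be a breaking change.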
@mionaalex this would be a good potential roadmap item
Once implemented, this should allow (or make it easier) to drop the nvidia-container part of the LXD snap.