conda-forge.github.io

RFC: A virtual package which detects CUDA compute capability

Open carterbox opened this issue 4 months ago • 3 comments

Today, I would like to announce the development of a conda virtual package which detects the minimum compute capability of CUDA devices on a system. This conda plugin is available here, and needs feedback!

The purpose of this new virtual package is to allow packages to express a minimum required compute capability. This is relevant for some optimized deep learning models which don't target older devices, libraries which drop support for compute capabilities mid-CTK-support-cycle, and for packages that may want to reduce binary sizes by splitting target CUDA architectures across variants.

carterbox, Oct 14 '25 18:10

Thanks a lot for the work on this! Ever since I saw https://github.com/wheelnext/nvidia-variant-provider and how it exposes the list of compute capabilities available on the system, I have been wondering how that information could be exposed in a way compatible with conda's virtual package mechanism, since it is hard to expose a list through virtual packages. Exposing just the minimum compute capability available on the system is simple and works great, and it nicely covers machines with a single type of GPU, which (I may be wrong) I guess are the vast majority of cases.

traversaro, Oct 16 '25 09:10

The presented solution relies on loading C shared libraries instead of using nvidia-smi, which I understood was how the __cuda virtual package worked. Is my knowledge outdated?

Is the C shared library expected to be stable over time?

hmaarrfk, Oct 17 '25 03:10

Loading the shared library is exactly how __cuda is implemented: https://github.com/conda/conda/blob/5ebfc3e7cf4511794fa352d183062e5147d808d8/conda/plugins/virtual_packages/cuda.py#L108

Calling nvidia-smi is slightly worse:

  • under the hood, it still loads libcuda and other DSOs
  • you need a subprocess which is slightly slower
  • you need to parse the output

libcuda is the user-mode driver, i.e. it's part of the CUDA driver.
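
To make the approach concrete, here is a minimal sketch of querying libcuda through ctypes and reducing the result to the minimum compute capability across devices. It is not the plugin's actual code: the helper names `load_driver` and `min_compute_capability` are hypothetical, and the library names tried are assumptions about a typical install. The driver entry points (`cuInit`, `cuDeviceGetCount`, `cuDeviceGet`, `cuDeviceGetAttribute`) and the two attribute values come from the CUDA driver API.

```python
import ctypes

# CUdevice_attribute values from cuda.h
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76
CUDA_SUCCESS = 0


def load_driver(names=("libcuda.so.1", "libcuda.so", "nvcuda.dll")):
    """Return a handle to the CUDA user-mode driver, or None if not found.

    The candidate names are an assumption for illustration.
    """
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def min_compute_capability(driver):
    """Return the lowest (major, minor) over all devices, or None."""
    if driver is None or driver.cuInit(0) != CUDA_SUCCESS:
        return None

    count = ctypes.c_int(0)
    if driver.cuDeviceGetCount(ctypes.byref(count)) != CUDA_SUCCESS:
        return None

    capabilities = []
    for ordinal in range(count.value):
        device = ctypes.c_int(0)  # CUdevice is a plain int
        if driver.cuDeviceGet(ctypes.byref(device), ordinal) != CUDA_SUCCESS:
            return None

        major, minor = ctypes.c_int(0), ctypes.c_int(0)
        for out, attribute in (
            (major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR),
            (minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR),
        ):
            status = driver.cuDeviceGetAttribute(
                ctypes.byref(out), attribute, device
            )
            if status != CUDA_SUCCESS:
                return None
        capabilities.append((major.value, minor.value))

    # Tuples compare element-wise, so min() picks the oldest architecture.
    return min(capabilities, default=None)


if __name__ == "__main__":
    print(min_compute_capability(load_driver()))
```

On a machine without a driver or without devices the function returns None, which a virtual package could treat as "do not expose the package". No subprocess or output parsing is involved, which matches the points in the list above.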

leofang, Oct 17 '25 03:10