RFC: A virtual package which detects CUDA compute capability
Today, I would like to announce the development of a conda virtual package which detects the minimum compute capability of CUDA devices on a system. This conda plugin is available here, and needs feedback!
The purpose of this new virtual package is to allow packages to express a minimum required compute capability. This is relevant for some optimized deep learning models which don't target older devices, libraries which drop support for compute capabilities mid-CTK-support-cycle, and for packages that may want to reduce binary sizes by splitting target CUDA architectures across variants.
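As an illustration, a package could constrain on the virtual package in its run requirements. Note that the virtual package name and version syntax below are assumptions for the sake of example, not necessarily what the plugin actually exposes:

```yaml
# Hypothetical recipe fragment -- the virtual package name
# "__cuda_min_compute_capability" is an assumed placeholder.
requirements:
  run:
    - __cuda_min_compute_capability >=8.0  # e.g. require Ampere or newer
```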
Thanks a lot for the work on this! Ever since I saw https://github.com/wheelnext/nvidia-variant-provider and how it exposes a list of the compute capabilities available on a system, I had been wondering how that information could also be surfaced in a way compatible with conda's virtual package mechanism, since virtual packages make it hard to expose a list. Exposing just the minimum compute capability available on the system is a simple solution that works great, and it covers nicely the use case of machines with a single type of GPU mounted on them, which (I may be wrong) I'd guess is the vast majority of cases.
The presented solution relies on loading C shared libraries instead of calling nvidia-smi, which is how I understood the __cuda virtual package worked. Is my knowledge outdated?
Is the C shared library interface expected to be stable over time?
This is exactly how __cuda is implemented:
https://github.com/conda/conda/blob/5ebfc3e7cf4511794fa352d183062e5147d808d8/conda/plugins/virtual_packages/cuda.py#L108
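The library-loading approach can be sketched with ctypes, modeled loosely on the linked __cuda implementation but querying compute capability instead of driver version. This is an assumption-heavy sketch, not the plugin's actual code; the attribute constants come from cuda.h, and the function returns None when no driver or device is present:

```python
import ctypes
import platform

def get_min_compute_capability():
    """Return the lowest (major, minor) compute capability across all
    CUDA devices, or None if no usable driver/device is found."""
    # Candidate names for the user-mode driver library per platform.
    if platform.system() == "Windows":
        lib_names = ("nvcuda.dll",)
    else:
        lib_names = ("libcuda.so", "libcuda.so.1")

    cuda = None
    for name in lib_names:
        try:
            cuda = ctypes.CDLL(name)
            break
        except OSError:
            continue
    if cuda is None:
        return None  # no CUDA driver installed

    if cuda.cuInit(0) != 0:
        return None  # driver present but not usable

    count = ctypes.c_int()
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != 0 or count.value == 0:
        return None  # no devices

    capabilities = []
    for i in range(count.value):
        device = ctypes.c_int()
        cuda.cuDeviceGet(ctypes.byref(device), i)
        major, minor = ctypes.c_int(), ctypes.c_int()
        # From cuda.h: CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
        #              CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76
        cuda.cuDeviceGetAttribute(ctypes.byref(major), 75, device)
        cuda.cuDeviceGetAttribute(ctypes.byref(minor), 76, device)
        capabilities.append((major.value, minor.value))
    return min(capabilities)
```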
Calling nvidia-smi is slightly worse:
- under the hood, it still loads libcuda and other DSOs
- you need a subprocess, which is slightly slower
- you need to parse the output
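For comparison, the nvidia-smi route might look like the sketch below, which shows both drawbacks in practice: spawning a subprocess and parsing its text output. The `compute_cap` query field requires a reasonably recent driver, so treat this as an illustrative sketch rather than a robust implementation:

```python
import shutil
import subprocess

def min_compute_capability_via_smi():
    """Return the lowest compute capability reported by nvidia-smi
    as a (major, minor) tuple, or None if nvidia-smi is unavailable
    or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        # Subprocess overhead: fork/exec plus nvidia-smi itself
        # loading libcuda and friends under the hood.
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None
    # Parsing overhead: one "major.minor" line per GPU.
    caps = []
    for line in out.splitlines():
        line = line.strip()
        if not line:
            continue
        major, _, minor = line.partition(".")
        caps.append((int(major), int(minor or 0)))
    return min(caps) if caps else None
```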
libcuda is the user-mode driver, i.e. it ships with the CUDA driver installation rather than the CUDA toolkit.