CCCL C library should support CUDA minor version compatibility
After #4845 is fixed we still have a problem in supporting CUDA minor version compatibility (MVC). This time the problem is in the usage of driver APIs.
The driver types (over which many of the runtime types are typedef'd) are stable over minor releases, as per @pciolkosz, but not the driver APIs. For how the public driver APIs are redirected to the underlying symbols that could change in minor releases, consult cuda.h & co. For this purpose, there are driver (cuGetProcAddress) and runtime (cudaGetDriverEntryPoint{ByVersion}) APIs for fetching the correct underlying driver symbol in an MVC-compliant manner.
This bug is more serious because it essentially forces users to have the same driver version at run time as the one used at build time, which most of the time is a very stringent limitation.
There are a few solutions:
- Statically link to cudart and use the runtime counterparts instead. Most CUDA libraries do this and it's the simplest solution. Starting with CUDA 12.0 we should have everything that we need (ex: CUkernel -> cudaKernel_t).
- Statically link to cudart and use cudaGetDriverEntryPoint{ByVersion} to fetch the needed driver function pointer. This is what cudax does today.
- Auto-generate a shim layer over all needed driver APIs that does cuGetProcAddress under the hood. This is what cuda.bindings.driver does today.
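For illustration, the third option could look roughly like the sketch below. This is not what cuda.bindings.driver or CCCL actually generates; it is a minimal Linux-only sketch that assumes the driver library is named libcuda.so.1 and mirrors the original four-argument cuGetProcAddress signature locally so that it builds without cuda.h. The names shim_resolve, shim_cuDriverGetVersion and SHIM_DRIVER_UNAVAILABLE are hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of driver signatures (assumption: CUresult is an int-sized
 * enum and cuuint64_t is a 64-bit unsigned integer). */
typedef int (*get_proc_address_fn)(const char *symbol, void **pfn,
                                   int cuda_version, uint64_t flags);
typedef int (*driver_get_version_fn)(int *driver_version);

/* Hypothetical shim-only status: the driver or the symbol was not found. */
#define SHIM_DRIVER_UNAVAILABLE (-1)

/* Resolve `symbol` as of `cuda_version` (encoded as 1000*major + 10*minor)
 * without creating a link-time dependency on libcuda. Not thread-safe. */
void *shim_resolve(const char *symbol, int cuda_version) {
    static void *libcuda = NULL;
    get_proc_address_fn get_proc;
    void *pfn = NULL;

    assert(symbol != NULL);
    if (libcuda == NULL) {
        libcuda = dlopen("libcuda.so.1", RTLD_NOW | RTLD_GLOBAL);
    }
    if (libcuda == NULL) {
        return NULL; /* no driver installed on this machine */
    }
    get_proc = (get_proc_address_fn)dlsym(libcuda, "cuGetProcAddress");
    if (get_proc == NULL) {
        return NULL; /* driver predates cuGetProcAddress */
    }
    /* flags = 0 is the default lookup mode; 0 is also CUDA_SUCCESS. */
    if (get_proc(symbol, &pfn, cuda_version, 0) != 0) {
        return NULL;
    }
    return pfn;
}

/* One generated wrapper; a real shim would emit one of these per API. */
int shim_cuDriverGetVersion(int *driver_version) {
    static driver_get_version_fn fn = NULL;
    if (fn == NULL) {
        fn = (driver_get_version_fn)shim_resolve("cuDriverGetVersion", 12000);
    }
    if (fn == NULL) {
        return SHIM_DRIVER_UNAVAILABLE;
    }
    return fn(driver_version);
}
```

Callers would use shim_cuDriverGetVersion(&v) in place of cuDriverGetVersion(&v) and treat SHIM_DRIVER_UNAVAILABLE as "no usable driver". On older glibc the program may need to be linked with -ldl.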
We should fix this asap, though I am not sure if this is a must-fix before releasing the first wheels.
@pciolkosz IIRC we discussed this and thought that the driver APIs are unlikely to break in minor releases, is that correct? If so we should just close this as a non-issue.
Yes, drivers are unlikely to change in minor releases, but there are other reasons why some of the ideas mentioned above may still be relevant. Before closing this, it'd be good to double-check our expectations against the problems that we observed in https://github.com/NVIDIA/cccl/issues/5970 and the corresponding plans to expand testing in https://github.com/NVIDIA/cccl/issues/5987. (Happy for this to be closed if there is no need to keep this open, or if one issue is sufficient.)
https://github.com/NVIDIA/cccl/issues/5970 was a CUDA runtime compatibility issue, because we used an API introduced in a minor release.
The driver cannot change its API or ABI in a minor release at all. This part of the description is not true:
For how the public driver APIs are redirected to the underlying symbols that could change in minor releases, consult cuda.h & co.
If a new version of an interface is introduced, it first appears as cuFoo_vX+1, and only in the next major release does cuFoo start pointing to it. See how cuCtxCreate was changed to cuCtxCreate_v4 in 13.0, even though cuCtxCreate_v4 had already existed since CUDA 12.5:
// CUDA 12.9 cuda.h
#define cuCtxCreate cuCtxCreate_v2
// CUDA 13.0 cuda.h
#define cuCtxCreate cuCtxCreate_v4
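To make the consequence of the header excerpts above concrete, here is a small mock in plain C. The mockCtxCreate names are hypothetical stand-ins, not driver APIs; the sketch only shows that the unversioned name is resolved by the preprocessor at build time, so a binary built against the older header keeps referencing the _v2 symbol regardless of which newer versions exist alongside it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for two ABI versions of one driver entry point.
 * Both symbols coexist, just as old and new versions do in the driver. */
const char *mockCtxCreate_v2(void) { return "v2"; }
const char *mockCtxCreate_v4(void) { return "v4"; }

/* Mimics the older header: the public name is pinned to _v2 at compile time. */
#define mockCtxCreate mockCtxCreate_v2

/* Expands to mockCtxCreate_v2(); the object file never mentions the
 * unversioned name, only the versioned symbol. */
const char *call_default(void) { return mockCtxCreate(); }

/* The newer version is reachable only by naming it explicitly. */
const char *call_explicit_v4(void) { return mockCtxCreate_v4(); }
```

Because the old versioned symbol stays exported, code compiled this way keeps working when the library underneath is upgraded, which is the property being described for the driver.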
cuGetProcAddress exists because, if you want to conditionally access cuCtxCreate_v4 or some newly introduced API and link with -lcuda, the runtime linker would fail with a missing symbol on any libcuda.so older than 12.5. So instead of using it directly, you can ask for cuCtxCreate, passing 12.5 as the version, and get the function pointer back without involving the runtime linker.
See for example https://docs.nvidia.com/cuda/cuda-c-programming-guide/#access-new-cuda-features
In CCCL we might sometimes use an API, or a version of an API, introduced in a minor release, in which case we need to properly handle the path where someone uses an older component without that new API. But it's more of a case-by-case thing; we don't need to do anything for APIs that existed in the first release of a CUDA major version.
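A rough sketch of what that case-by-case handling might look like is below. The names CUDA_VERSION_ENCODE, api_path and select_api_path are hypothetical; the only fact relied on is that CUDA encodes versions as 1000 * major + 10 * minor (12.5 becomes 12050). The caller is assumed to have obtained the installed version from something like cuDriverGetVersion.

```c
#include <assert.h>

/* CUDA version encoding: 1000 * major + 10 * minor, e.g. 12.5 -> 12050. */
#define CUDA_VERSION_ENCODE(major, minor) ((major) * 1000 + (minor) * 10)

/* Hypothetical tag for which code path a call site should take. */
typedef enum { PATH_BASELINE = 0, PATH_NEW_API = 1 } api_path;

/* Use the newer API only when the installed component is at least as new as
 * the release that introduced it; otherwise stay on the API that has existed
 * since the first release of the major version. */
api_path select_api_path(int installed_version, int introduced_in) {
    assert(installed_version > 0 && introduced_in > 0);
    return installed_version >= introduced_in ? PATH_NEW_API : PATH_BASELINE;
}
```

For example, select_api_path(driver_version, CUDA_VERSION_ENCODE(12, 5)) would gate a call that needs something added in 12.5, while APIs present since 12.0 need no gate at all.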