cuda-api-wrappers
Remove more CUDA runtime API uses when using the driver API mostly
While the driver wrappers branch has come a long way, it still uses some CUDA runtime API constants, types, and API calls. Some of these might be unavoidable, but many can still be replaced with no difficulty, and others may be replaceable somehow (e.g. `cudaSetDevice()`?).
So, I've been doing a bunch of work on this, but so far it's all in the form of amending existing commits. Right now, the remaining pieces of runtime code are the following, and I will check them off if/when they are removed:
API function calls:
- [x] Peer-to-peer: `cudaDeviceGetP2PAttribute()`
- [ ] Device management: `cudaSetValidDevices()` - currently no driver equivalent
- [x] Kernel launching: `cudaLaunchCooperativeKernel()`
- [x] Version checks: `cudaDriverGetVersion()`
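For the version-check item, the swap is mostly mechanical: `cudaDriverGetVersion()` and the driver API's `cuDriverGetVersion()` both report the version as a single integer encoded as 1000 * major + 10 * minor. A minimal sketch of decoding that value without touching the runtime API follows. The names `version_pair` and `decode_version` are hypothetical, made up for illustration rather than taken from the wrappers, and the driver call itself appears only in a comment so that the snippet compiles without the CUDA headers.

```cpp
// Hypothetical illustration type; not the library's actual version class.
struct version_pair {
    int major_version;
    int minor_version;
};

// Both cudaDriverGetVersion() and cuDriverGetVersion() encode the version
// as 1000 * major + 10 * minor, e.g. 11040 for CUDA 11.4.
constexpr version_pair decode_version(int combined)
{
    return version_pair{ combined / 1000, (combined % 1000) / 10 };
}

// With the driver API, the combined value would be obtained via:
//   int combined;
//   CUresult status = cuDriverGetVersion(&combined);
// Here an assumed sample value stands in for the result of that call.
static_assert(decode_version(11040).major_version == 11, "major part");
static_assert(decode_version(11040).minor_version == 4, "minor part");
```

Since the encoding is the same on both sides, code that only consumes the decoded version should not need to change when the call underneath is switched.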
Types and data structures:
- [x] Error handling: Use of `cudaError_t` as `cuda::status_t`
- [x] Error handling: Use of Runtime API error constants (not critical; could be allowed, seeing as those should be compatible with `CUresult` and Driver API constants)
- [ ] Device management: `cudaDeviceProp`
Can't be removed:
- Functions/kernels and their properties: `cudaGetFuncBySymbol()` is necessary for getting the `CUfunction` of a `__global__` function.
- Type definition & data structure uses: `cudaGetSymbolAddress()`, `cudaGetSymbolSize()` - can't be replaced with `cuModuleGetGlobal()`, since we can't access whatever module is used under the hood by NVCC for baked-in kernels and globals.
- Runtime-based error handling is necessary for other runtime-only API function calls; but we've limited this to a runtime-related namespace and functions which aren't used by default.
- We allow conversion between the `dimensions_t` classes and `cudaExtent` for convenience.
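The convenience conversion in the last item could look roughly like the sketch below. It is only an illustration under stated assumptions: `extent_standin` is a stand-in that mirrors the three `size_t` fields of the runtime's `cudaExtent` (`width`, `height`, `depth`) so the snippet compiles without CUDA headers, and `dimensions3` is a hypothetical class, not the wrappers' actual `dimensions_t` definition.

```cpp
#include <cstddef>

// Stand-in mirroring the width/height/depth fields of cudaExtent;
// an assumption for illustration, so no CUDA headers are needed here.
struct extent_standin {
    std::size_t width;
    std::size_t height;
    std::size_t depth;
};

// Hypothetical 3D dimensions class; the library's real dimensions_t differs.
struct dimensions3 {
    std::size_t width, height, depth;

    constexpr dimensions3(std::size_t w, std::size_t h, std::size_t d)
        : width(w), height(h), depth(d) {}

    // Implicit construction from the runtime-style extent...
    constexpr dimensions3(const extent_standin& e)
        : width(e.width), height(e.height), depth(e.depth) {}

    // ...and implicit conversion back, so either type can be passed
    // where the other is expected.
    constexpr operator extent_standin() const
    {
        return extent_standin{ width, height, depth };
    }

    // Number of elements covered by these dimensions.
    constexpr std::size_t volume() const { return width * height * depth; }
};
```

Keeping the conversion implicit in both directions is what makes it a convenience: the runtime type only shows up at the boundary where a caller chooses to use it.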