cuda-api-wrappers

Remove more CUDA runtime API uses when using the driver API mostly

Open eyalroz opened this issue 3 years ago • 1 comments

While the driver wrappers branch has come a long way, it still uses some CUDA runtime API constants, types, and API calls. Some of these might be unavoidable, but many can be replaced without difficulty, and others may be replaceable somehow (e.g. cudaSetDevice()?).

eyalroz, Nov 30 '21 07:11

So, I've been doing a bunch of work on this, but so far it's all in the form of amending existing commits. Right now, the remaining pieces of runtime code are the following; I will check them off if/when they are removed:

API function calls:

  • [x] Peer-to-peer: cudaDeviceGetP2PAttribute()
  • [ ] Device management: cudaSetValidDevices() - currently no driver equivalent
  • [x] Kernel launching: cudaLaunchCooperativeKernel()
  • [x] Version checks: cudaDriverGetVersion()
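
On the version check in particular: cuDriverGetVersion() reports the version in the same integer encoding as cudaDriverGetVersion(), namely 1000 × major + 10 × minor, so any decoding code can stay as it is after the switch. A minimal sketch of such decoding follows; `version_t`, `decode_version` and `to_string` are hypothetical names used for illustration, not necessarily what the wrappers define.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type; not taken from cuda-api-wrappers.
struct version_t {
    int major_version;
    int minor_version;
};

// Decodes the combined integer that cuDriverGetVersion() (and
// cudaDriverGetVersion()) write to their output parameter:
// 1000 * major + 10 * minor.
inline version_t decode_version(int combined)
{
    assert(combined >= 0);
    return version_t{ combined / 1000, (combined % 1000) / 10 };
}

// Formats a decoded version as "major.minor".
inline std::string to_string(version_t v)
{
    return std::to_string(v.major_version) + "." + std::to_string(v.minor_version);
}
```

With this, a combined value of 11040 decodes to major 11, minor 4.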

Types and data structures:

  • [x] Error handling: Use of cudaError_t as cuda::status_t
  • [x] Error handling: Use of Runtime API error constants (not critical; these could be allowed, seeing as they should be compatible with CUresult and the Driver API constants)
  • [ ] Device management: cudaDeviceProp
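
To illustrate the error-handling items: the idea is for the status type to be based on the driver's CUresult, with runtime error values converted by a plain cast where they still turn up. The sketch below assumes this design and uses stand-in enums, so that it compiles without the CUDA toolkit; the stand-ins mirror only a handful of values from the real headers, and `status_t`, `as_status` and `is_success` here are illustrative names rather than the library's definitions.

```cpp
#include <cassert>

// Stand-ins mirroring a few enumerators of CUresult (<cuda.h>) and
// cudaError_t (<cuda_runtime_api.h>). Real code would include those
// headers instead of defining these.
namespace standin {
enum driver_result {
    CUDA_SUCCESS = 0,
    CUDA_ERROR_INVALID_VALUE = 1,
    CUDA_ERROR_OUT_OF_MEMORY = 2
};
enum runtime_error {
    cudaSuccess = 0,
    cudaErrorInvalidValue = 1,
    cudaErrorMemoryAllocation = 2
};
} // namespace standin

// Assumption: the status type is the driver's result type.
using status_t = standin::driver_result;

// Converts a runtime error value by numeric value; this relies on the
// two sets of constants agreeing for the errors in question.
inline status_t as_status(standin::runtime_error e)
{
    return static_cast<status_t>(e);
}

inline bool is_success(status_t s)
{
    return s == standin::CUDA_SUCCESS;
}
```

The cast is only as good as the agreement between the two sets of constants, which is why keeping the runtime constants around is described above as not critical rather than as a non-issue.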

Can't be removed:

  • Functions/kernels and their properties: cudaGetFuncBySymbol() is necessary for getting the CUfunction of a __global__ function.
Type definition & data structure uses:
  • cudaGetSymbolAddress(), cudaGetSymbolSize() - can't be replaced with cuModuleGetGlobal() since we can't access whatever module is used under the hood by NVCC for baked-in kernels and globals.
  • Runtime-based error handling is necessary for other runtime-only API function calls; but we've limited this to a runtime-related namespace and functions which aren't used by default.
  • We allow conversion between the dimensions_t classes and cudaExtent for convenience.
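
As a rough illustration of the last item, a two-way conversion between a dimensions class and an extent structure could look like the sketch below. It is written under assumptions: `extent_standin` stands in for cudaExtent (which has `width`, `height` and `depth` fields of type `size_t`) so that the code compiles without the CUDA toolkit, and the `dimensions_t` shown here is a hypothetical 3D-only class, not the library's actual definition.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in with the same three fields as the runtime API's cudaExtent.
struct extent_standin {
    std::size_t width, height, depth;
};

// Hypothetical 3D dimensions class; the library's own dimensions_t
// classes may be defined differently.
struct dimensions_t {
    std::size_t width, height, depth;

    dimensions_t(std::size_t w, std::size_t h, std::size_t d)
        : width(w), height(h), depth(d) {}

    // Implicit conversion from the extent type...
    dimensions_t(extent_standin e)
        : width(e.width), height(e.height), depth(e.depth) {}

    // ...and back to it.
    operator extent_standin() const
    {
        return extent_standin{ width, height, depth };
    }

    // Overall number of elements.
    std::size_t size() const { return width * height * depth; }
};

// Checks that dimensions -> extent -> dimensions preserves all fields.
inline bool round_trip_preserves(dimensions_t dims)
{
    extent_standin extent = dims;
    dimensions_t back = extent;
    return back.width == dims.width
        && back.height == dims.height
        && back.depth == dims.depth;
}
```

Keeping both conversions implicit is what makes this a convenience: a dimensions object can be passed directly wherever an extent is expected, at the cost of retaining one runtime API type in the interface.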

eyalroz, Dec 06 '21 20:12