cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
There's a new version of cuMemAdvise, named [cuMemAdvise_v2](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html#group__CUDA__UNIFIED_1g1ee1ef51ac4f44b4cf5d54711f227584), available since CUDA v12.2. Let's support it. Plus, it will supplant cuMemAdvise in CUDA 13, so let's prepare for that....
In recent CUDA versions, additional context creation flags have been introduced:
* CUDA 12.1: CU_CTX_COREDUMP_ENABLE - Trigger coredumps from exceptions in this context
* CUDA 12.1: CU_CTX_USER_COREDUMP_ENABLE - Enable user...
CUDA 12.3 introduced conditional graph nodes, and these were further developed in CUDA 12.4; let's support them on the branch with graph support.
CUDA 12.4 adds support for the following flags:
* `-fdevice-syntax-only`, which ends device compilation after front-end syntax checking. This option can provide rapid feedback (warnings and errors) of source code...
CUDA 12.4 introduces ["Green Contexts"](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__GREEN__CONTEXTS.html), lightweight contexts which also relate to GPU partitioning. Let's try to support these.
The appropriate condition to check when memcpy'ing typed data is whether or not the type is trivially copyable, not just trivially copy-constructible. But we seem to be checking for the...
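To illustrate the distinction, here is a minimal sketch using only standard type traits. The type `counted_assign` and the helper `copy_elements` are hypothetical names made up for this example, not part of the library, and `std::memcpy` stands in for a CUDA copy call so that the snippet builds without the toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical type: its copy constructor is defaulted (and trivial), but its
// copy assignment is user-provided, so the type is NOT trivially copyable.
struct counted_assign {
    int value;
    int assignments;
    counted_assign() = default;
    counted_assign(const counted_assign&) = default;
    counted_assign& operator=(const counted_assign& other) {
        value = other.value;
        ++assignments;
        return *this;
    }
};

// The weaker trait accepts the type; the stronger one rejects it.
static_assert(std::is_trivially_copy_constructible<counted_assign>::value,
    "the defaulted copy constructor is trivial");
static_assert(!std::is_trivially_copyable<counted_assign>::value,
    "the user-provided copy assignment makes the type non-trivially-copyable");

// Hypothetical typed-copy helper (an assumption for illustration): it guards
// the raw byte copy with the stronger trait.
template <typename T>
void copy_elements(T* destination, const T* source, std::size_t count)
{
    static_assert(std::is_trivially_copyable<T>::value,
        "raw byte copies are only well-defined for trivially copyable types");
    std::memcpy(destination, source, count * sizeof(T));
}

// Small self-check: copying plain ints through the helper preserves the values.
inline bool int_round_trip_ok()
{
    const int source[3] = {1, 2, 3};
    int destination[3] = {0, 0, 0};
    copy_elements(destination, source, 3);
    assert(destination[0] == 1);
    return destination[1] == 2 && destination[2] == 3;
}
```

With this guard, `copy_elements<counted_assign>(...)` fails to compile, whereas a guard based on `std::is_trivially_copy_constructible` would let it through.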
With recent CUDA versions, we have the CUDA_MEMCPY3D_PEER struct, which is quite flexible. We also have a large host of copy functions - 40 all told - almost all of...
The default value for `-pic` compilation is different with CUDA 12.4 when compiling in whole-program mode. Let's account for this (e.g. by making sure we only use an optional, so...
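One way to read the "use an optional" idea is sketched below, with loud caveats: `compilation_options_sketch` and `render_pic_argument` are hypothetical names, not the library's actual types or functions, and the `-pic=true` / `-pic=false` spelling is assumed purely for illustration. The point is only that an unset optional emits no argument, so whichever default the compiler uses applies.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical options holder (an assumption, not the library's struct).
struct compilation_options_sketch {
    // Empty means "say nothing and let the compiler's own default apply".
    std::optional<bool> position_independent_code;
};

// Hypothetical renderer: emits an argument only when the user chose a value.
// The exact flag spelling is an assumption made for this sketch.
inline std::vector<std::string> render_pic_argument(const compilation_options_sketch& options)
{
    std::vector<std::string> arguments;
    if (options.position_independent_code.has_value()) {
        arguments.push_back(*options.position_independent_code ? "-pic=true" : "-pic=false");
    }
    return arguments;
}

// Small self-check: an unset option produces no argument at all.
inline bool unset_option_renders_nothing()
{
    compilation_options_sketch options;
    assert(!options.position_independent_code.has_value());
    return render_pic_argument(options).empty();
}
```

Because nothing is rendered for an unset value, the wrapper does not need to know which default a given CUDA version or compilation mode applies.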
CUDA 12 discontinued support for all Kepler cards, and those were the last ones which offered configurable shared memory bank size. We may therefore want to stop exposing this functionality...
Beginning with CUDA 11.4, a new API call for creating contexts has become available: ``` CUresult cuCtxCreate_v3 ( CUcontext* pctx, CUexecAffinityParam* paramsArray, int numParams, unsigned int flags, CUdevice dev )...