cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
**[NVIDIA/gdrcopy](https://github.com/NVIDIA/gdrcopy)**: A low-latency GPU memory copy library based on NVIDIA GPUDirect RDMA technology. While GPUDirect RDMA is meant for direct access to GPU memory from third-party devices,...
As we all know (or should know), the C++ standard's smart pointer classes suck. Why? Because ownership should be of _regions_, not pointers, and it's inane to expect allocators or...
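To make the "own a region, not a pointer" idea concrete, here is a minimal sketch of a move-only owner that carries both the start address and the size. All names (`sketch::region_t`, `sketch::unique_region`, `sketch::make_unique_region`, `sketch::host_free`) are hypothetical and are not claimed to be part of this library; plain `std::malloc`/`std::free` stand in for whatever CUDA allocation calls a real wrapper would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

namespace sketch {

// Hypothetical region type: a start address together with its size in bytes.
struct region_t {
    void*       start;
    std::size_t size;
};

// Stand-in deleter; a real wrapper would release device/managed memory here.
inline void host_free(void* p) noexcept { std::free(p); }

// Move-only owner of a whole region (assumption: one deleter per region).
class unique_region {
public:
    using deleter_t = void (*)(void*);

    unique_region() noexcept : region_{nullptr, 0}, deleter_(nullptr) {}
    unique_region(region_t region, deleter_t deleter) noexcept
        : region_(region), deleter_(deleter)
    {
        // A non-empty region must come with a way to release it.
        assert(region.start == nullptr || deleter != nullptr);
    }

    unique_region(const unique_region&) = delete;
    unique_region& operator=(const unique_region&) = delete;

    unique_region(unique_region&& other) noexcept
        : region_(other.release()), deleter_(other.deleter_) {}

    unique_region& operator=(unique_region&& other) noexcept {
        if (this != &other) {
            reset();
            deleter_ = other.deleter_;
            region_  = other.release();
        }
        return *this;
    }

    ~unique_region() { reset(); }

    // Give up ownership without freeing; the caller now owns the region.
    region_t release() noexcept {
        region_t released = region_;
        region_ = region_t{nullptr, 0};
        return released;
    }

    // Free the owned region (if any) and become empty.
    void reset() noexcept {
        if (region_.start != nullptr && deleter_ != nullptr) {
            deleter_(region_.start);
        }
        region_ = region_t{nullptr, 0};
    }

    void*       data() const noexcept { return region_.start; }
    std::size_t size() const noexcept { return region_.size; }

private:
    region_t  region_;
    deleter_t deleter_;
};

// Hypothetical factory using host memory as a stand-in.
inline unique_region make_unique_region(std::size_t size) {
    void* p = std::malloc(size);
    return unique_region(region_t{p, p != nullptr ? size : 0}, &host_free);
}

} // namespace sketch
```

Unlike `std::unique_ptr<T[]>`, the owner here never loses track of how large the allocation is, so a copy or a kernel launch can take the region as a single argument.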
A lot of the wrapper code is located by now within `detail::` sub-namespaces, interspersed among the actual, intended-for-use, functions. Additionally, a lot of the implementations of non-`detail::` functions are already...
The Driver-API-based branch already obtains kernel properties individually. Let's have such methods in `kernel_t` for the runtime API, so that the user does not _have_ to know about what `kernel::attribute_t`...
I've been toying with the idea of unifying some functions `allocate()`, `free()` and maybe `make_unique()`, so that instead of spreading them across sub-namespaces, we would just pass the memory type...
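One way such a unification could look is sketched below: a single `allocate()`/`free()` pair that takes the memory type as an argument instead of living in a per-type sub-namespace. Everything here is an assumption made for illustration: the `sketch::memory_type` enum, the `sketch::region_t` struct and the function signatures are hypothetical, and `std::malloc`/`std::free` stand in for the actual CUDA allocation calls so the snippet compiles without the toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch {

// Hypothetical tag naming the kind of memory requested.
enum class memory_type { host, device, managed };

// Hypothetical result type: the region plus the kind of memory it lives in.
struct region_t {
    void*       start;
    std::size_t size;
    memory_type type;
};

// Single entry point; the switch marks where a real wrapper would dispatch
// to the type-specific CUDA call. Here every branch uses std::malloc.
inline region_t allocate(memory_type type, std::size_t size) {
    assert(size > 0);
    void* p = nullptr;
    switch (type) {
    case memory_type::host:    p = std::malloc(size); break; // stand-in for pinned host allocation
    case memory_type::device:  p = std::malloc(size); break; // stand-in for device allocation
    case memory_type::managed: p = std::malloc(size); break; // stand-in for managed allocation
    }
    return region_t{p, p != nullptr ? size : 0, type};
}

// Matching single entry point for release; the region remembers its type,
// so the caller does not have to pick the right sub-namespace.
inline void free(region_t& region) {
    std::free(region.start); // a real wrapper would dispatch on region.type
    region.start = nullptr;
    region.size  = 0;
}

} // namespace sketch
```

A compile-time variant (the memory type as a template parameter or tag type) would serve the same purpose; the runtime enum is used here only to keep the sketch short.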
The driver-wrappers branch has a problem: Using the Runtime API `cudaError_t` as `cuda::status_t` requires casts from `CUresult` - and vice versa, as they are both enums. We can't write...
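As an illustration of the casting problem, the sketch below wraps both enums in one small class with an implicit constructor from each, so that call sites need no explicit casts. This is not presented as the approach the branch takes. The two enums are stand-ins for `cudaError_t` and `CUresult` so that the snippet compiles without the CUDA headers, the class `sketch::status_t` is hypothetical, and the sketch assumes that the numeric codes of the two enums can be compared directly, which would have to be checked against the actual headers.

```cpp
#include <cassert>

// Stand-ins for the real enums (assumption: in real code these would be
// cudaError_t from the Runtime API and CUresult from the Driver API).
enum runtime_error_t { runtime_success = 0, runtime_error_invalid_value = 1 };
enum driver_result_t { driver_success  = 0, driver_error_invalid_value  = 1 };

namespace sketch {

// Hypothetical unified status: stores only the numeric code, and converts
// implicitly from either enum.
class status_t {
public:
    constexpr status_t(runtime_error_t e) noexcept : code_(static_cast<int>(e)) {}
    constexpr status_t(driver_result_t r) noexcept : code_(static_cast<int>(r)) {}

    constexpr int  code()       const noexcept { return code_; }
    constexpr bool is_success() const noexcept { return code_ == 0; }

private:
    int code_;
};

// Comparisons work across the two source enums via the implicit constructors.
constexpr bool operator==(status_t lhs, status_t rhs) noexcept {
    return lhs.code() == rhs.code();
}
constexpr bool operator!=(status_t lhs, status_t rhs) noexcept {
    return !(lhs == rhs);
}

// Example consumer: accepts a result from either API without a cast.
inline bool succeeded(status_t status) noexcept { return status.is_success(); }

} // namespace sketch
```

With this shape, `sketch::succeeded(runtime_success)` and `sketch::succeeded(driver_error_invalid_value)` both compile; the cost is that `status_t` is no longer itself an enum, so it cannot be used directly in a `switch` without going through `code()`.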
The CUDA driver's 2D and 3D copying API support offsets into the host and/or destination array. At the moment, our wrapper API does not support this. While I doubt this...
CUDA 11.5 [introduced](https://developer.nvidia.com/blog/revealing-new-features-in-the-cuda-11-5-toolkit/) block-compressed types for CUDA arrays:
```
cudaChannelFormatKindUnsignedBlockCompressed1
cudaChannelFormatKindUnsignedBlockCompressed1SRGB
```
We should support these with our array wrapper API.
In CUDA 11.5, new array formats [are introduced](https://developer.nvidia.com/blog/revealing-new-features-in-the-cuda-11-5-toolkit/):
```
cudaChannelFormatKindUnsignedNormalized8X{1|2|4}
cudaChannelFormatKindSignedNormalized8X{1|2|4}
cudaChannelFormatKindUnsignedNormalized16X{1|2|4}
cudaChannelFormatKindSignedNormalized16X{1|2|4}
```
We should make it possible to specify these through our array wrapper.
One of NVIDIA's libraries, which we currently ignore completely, is [NVML - The NVIDIA Management Library](https://developer.nvidia.com/nvidia-management-library-nvml). It allows access to a bunch of meta-data which we currently fully access - neither...