cuda-api-wrappers issues

Should we include support for GPUDirect/RDMA as in NVIDIA's gdrcopy?

**[NVIDIA/gdrcopy](https://github.com/NVIDIA/gdrcopy)**:

> **A low-latency GPU memory copy library based on NVIDIA GPUDirect RDMA technology.**
>
> **Introduction**
>
> While GPUDirect RDMA is meant for direct access to GPU memory from third-party devices,...

eyalroz

question

Add a `unique_region_t` class or some equivalent mechanism

1

As we all know (or should know), the C++ standard's smart pointer classes suck. Why? Because ownership should be of _regions_, not pointers, and it's inane to expect allocators or...

eyalroz

question

task
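
To make the idea of owning a _region_ rather than a pointer concrete, here is a minimal sketch of what such a class could look like. Everything in it is hypothetical: `region_t`, `unique_region`, `host_deleter` and `make_unique_host_region` are illustrative names, not the library's API, and plain `malloc`/`free` stand in for CUDA allocation so that the sketch compiles without the toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical: a non-owning (address, size) pair describing a memory region.
struct region_t {
    void*       start = nullptr;
    std::size_t size  = 0;
};

// Hypothetical: owns a region and releases it with a stateless Deleter,
// in the way std::unique_ptr owns a pointer. Move-only.
template <typename Deleter>
class unique_region {
public:
    unique_region() noexcept = default;
    explicit unique_region(region_t r) noexcept : region_(r) {}

    unique_region(const unique_region&) = delete;
    unique_region& operator=(const unique_region&) = delete;

    unique_region(unique_region&& other) noexcept : region_(other.release()) {}
    unique_region& operator=(unique_region&& other) noexcept {
        if (this != &other) { reset(other.release()); }
        return *this;
    }
    ~unique_region() { reset(); }

    region_t    get()  const noexcept { return region_; }
    void*       data() const noexcept { return region_.start; }
    std::size_t size() const noexcept { return region_.size; }
    explicit operator bool() const noexcept { return region_.start != nullptr; }

    // Give up ownership without freeing; the caller becomes responsible.
    region_t release() noexcept { return std::exchange(region_, region_t{}); }

    // Free the currently-owned region (if any), then take ownership of r.
    void reset(region_t r = region_t{}) noexcept {
        if (region_.start != nullptr) { Deleter{}(region_); }
        region_ = r;
    }

private:
    region_t region_{};
};

// Stand-in deleter using the C heap; a device deleter would call the
// appropriate CUDA free function instead.
struct host_deleter {
    void operator()(region_t r) const noexcept { std::free(r.start); }
};

inline unique_region<host_deleter> make_unique_host_region(std::size_t num_bytes) {
    assert(num_bytes > 0);  // malloc(0) is implementation-defined; avoid it
    void* p = std::malloc(num_bytes);
    return unique_region<host_deleter>{ region_t{ p, p != nullptr ? num_bytes : 0 } };
}
```

Because the size travels with the address, code that receives the owner does not need a separate length parameter, which is the main difference from `std::unique_ptr<T[]>`.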

Should we move `detail_::` code and/or implementations to separate files?

A lot of the wrapper code is located by now within `detail::` sub-namespaces, interspersed among the actual, intended-for-use, functions. Additionally, a lot of the implementations of non-`detail::` functions are already...

eyalroz

question

Add methods for getting individual kernel attributes - to the runtime-API-only branch

The Driver-API-based branch already obtains kernel properties individually. Let's have such methods in `kernel_t` for the runtime API, so that the user does not _have_ to know about what `kernel::attribute_t`...

eyalroz

task

Should we replace memory::XXXXX::allocate(...) with memory::allocate(cuda::memory::type_t, ...) ?

I've been toying with the idea of unifying some functions `allocate()`, `free()` and maybe `make_unique()`, so that instead of spreading them across sub-namespaces, we would just pass the memory type...

eyalroz

question
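
As a rough illustration of the proposal, the sketch below dispatches on a memory type passed as a run-time argument instead of selecting a sub-namespace. It is an assumption-laden mock-up: the `sketch::memory` namespace, the enumerators of `type_t` and the per-type back ends are all hypothetical, and each back end simply calls `malloc` so the example runs without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>

namespace sketch {
namespace memory {

// Hypothetical enumeration of memory kinds.
enum class type_t { host, device, managed };

// Stand-in back ends. In real code each would call the matching CUDA
// allocation function; here they all use the C heap.
namespace host    { inline void* allocate(std::size_t n) { return std::malloc(n); } }
namespace device  { inline void* allocate(std::size_t n) { return std::malloc(n); } }
namespace managed { inline void* allocate(std::size_t n) { return std::malloc(n); } }

// The unified entry point: the memory type is an argument, not a namespace.
inline void* allocate(type_t type, std::size_t num_bytes) {
    switch (type) {
    case type_t::host:    return host::allocate(num_bytes);
    case type_t::device:  return device::allocate(num_bytes);
    case type_t::managed: return managed::allocate(num_bytes);
    }
    throw std::invalid_argument("unknown memory type");
}

// The matching unified free; every stand-in back end frees the same way.
inline void free(type_t /*type*/, void* ptr) { std::free(ptr); }

} // namespace memory
} // namespace sketch
```

One trade-off visible even in this mock-up: the per-namespace functions can have different signatures and return types per memory kind, whereas the unified function forces a common one.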

One status_t to rule them all

The driver-wrappers branch has a problem: Using the Runtime API `cudaError_t` as `cuda::status_t` requires casts from `CUresult` - and vice versa, as they are both enums. We can't write...

eyalroz

task
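
One possible shape for a single status type is a small class that converts implicitly from either enum, so call sites need no casts. The sketch below is only an illustration under loud assumptions: `runtime_status_t` and `driver_status_t` are stand-ins for `cudaError_t` and `CUresult` (so it compiles without CUDA), their enumerators are made up, and it assumes that both enums use the same numeric code for the same condition.

```cpp
#include <cassert>

namespace sketch {

// Stand-ins for cudaError_t and CUresult; the enumerators are hypothetical.
enum runtime_status_t { runtime_success = 0, runtime_invalid_value = 1 };
enum driver_status_t  { driver_success  = 0, driver_invalid_value  = 1 };

// Hypothetical unified status: stores the numeric code and accepts either
// enum implicitly. Assumes the two enums agree on numeric codes.
class status_t {
public:
    constexpr status_t(runtime_status_t s) noexcept : code_(static_cast<int>(s)) {}
    constexpr status_t(driver_status_t s)  noexcept : code_(static_cast<int>(s)) {}

    constexpr int  code()       const noexcept { return code_; }
    constexpr bool is_success() const noexcept { return code_ == 0; }

    friend constexpr bool operator==(status_t a, status_t b) noexcept {
        return a.code_ == b.code_;
    }
    friend constexpr bool operator!=(status_t a, status_t b) noexcept {
        return !(a == b);
    }

private:
    int code_;
};

} // namespace sketch
```

With this in place, a `status_t` obtained from a runtime call can be compared directly against a driver enumerator, for example `status == sketch::driver_invalid_value`, without a cast at the call site.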

Support offsets in 2D, 3D copying

The CUDA driver's 2D and 3D copying APIs support offsets into the host and/or destination array. At the moment, our wrapper API does not support this. While I doubt this...

eyalroz

task

Support CUDA 11.5 block-compressed array types

CUDA 11.5 [introduced](https://developer.nvidia.com/blog/revealing-new-features-in-the-cuda-11-5-toolkit/) block-compressed types for CUDA arrays:

```
cudaChannelFormatKindUnsignedBlockCompressed1
cudaChannelFormatKindUnsignedBlockCompressed1SRGB
```

We should support these with our array wrapper API.

eyalroz

task

Support CUDA 11.5 new arrays formats

In CUDA 11.5, new array formats [are introduced](https://developer.nvidia.com/blog/revealing-new-features-in-the-cuda-11-5-toolkit/):

```
cudaChannelFormatKindUnsignedNormalized8X{1|2|4}
cudaChannelFormatKindSignedNormalized8X{1|2|4}
cudaChannelFormatKindUnsignedNormalized16X{1|2|4}
cudaChannelFormatKindSignedNormalized16X{1|2|4}
```

We should make sure these can be specified through our array wrapper.

eyalroz

task

Add support for NVML functionality

One of NVIDIA's libraries, which we currently ignore completely, is [NVML - The NVIDIA Management Library](https://developer.nvidia.com/nvidia-management-library-nvml). It allows access to a bunch of meta-data which we currently fully access - neither...

eyalroz

enhancement
