cuda-api-wrappers issues

std::is_trivially_copy_constructible requirement for kernel parameters still too strong

6

Even the most basic "fancy iterator" from Thrust, ~~`thrust::constant_iterator`~~ `thrust::counting_iterator`, doesn't fulfill the requirement, which makes algorithms that launch kernels through cuda-api-wrappers less flexible. constant_iterator constructor ```cuda __host__ __device__ constant_iterator(constant_iterator...
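To illustrate the kind of type this issue is about, here is a minimal host-only sketch. It assumes the iterator fails the check because it declares its own copy constructor. Both `counting_like_iterator` and `plain_iterator` are hypothetical stand-ins written for this example; they are not Thrust or cuda-api-wrappers types.

```cpp
#include <type_traits>

// Hypothetical stand-in for a "fancy iterator": it declares its own copy
// constructor, even though that constructor only copies a single int.
struct counting_like_iterator {
    int value;
    explicit counting_like_iterator(int v) : value(v) {}
    counting_like_iterator(const counting_like_iterator& other) : value(other.value) {}
};

// Hypothetical aggregate with the same data and an implicit copy constructor.
struct plain_iterator {
    int value;
};

// A user-provided copy constructor is never trivial, so the trait rejects the
// first type even though copying it is semantically just a copy of an int.
static_assert(!std::is_trivially_copy_constructible<counting_like_iterator>::value,
              "user-provided copy constructor is not trivial");
static_assert(std::is_copy_constructible<counting_like_iterator>::value,
              "the type is still copyable");
static_assert(std::is_trivially_copy_constructible<plain_iterator>::value,
              "implicit copy constructor is trivial");
```

Under that assumption, a kernel-launch check based on `std::is_trivially_copy_constructible` would reject the first type while accepting the second, although both carry the same data.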

pauleonix

question

Add support for nvJitLink linking

NVIDIA, in their infinite wisdom, have decided to kind-of-clone the `cuLink*` driver API functions into `nvJitLink*` functions, doing basically the same thing but with LTO support. Not sure why they...

eyalroz

Provide API hooking functionality like in libcuhook

The CUDA samples directory has a sample named cuHook, intended for dynamic loading via LD_PRELOAD, which lets you install pre-hooks and post-hooks for CUDA driver calls. Perhaps we should add...

eyalroz

question

Support empty nodes as typed_node's

At the moment, we don't properly support empty execution graph nodes on our graph_support branch. But those do exist and can be inserted.

eyalroz

task

Use CMake 3.27's fatbin generation mechanism

At the moment, we use a custom command to generate fatbins from compiled objects (e.g. [here](https://github.com/eyalroz/cuda-api-wrappers/blob/master/examples/CMakeLists.txt#L112)). CMake 3.27 has introduced a built-in mechanism for generating them, described [here](https://cmake.org/cmake/help/v3.27/prop_tgt/CUDA_FATBIN_COMPILATION.html#prop_tgt:CUDA_FATBIN_COMPILATION). Let's switch...

eyalroz

task

Getting an unknown CUDA error with cuMemPoolImportFromShareableHandle()

In the `streamOrderedAllocationIPC` example program, we sometimes (or always) get an error when importing a mempool exported for IPC (from a "shareable handle"). Unfortunately, it's an "unknown error", which the `cuMemPoolImportFromShareableHandle()` function...

eyalroz

bug

Error setting reserved virtual mem region permissions

1

When I run vectorAddMMAP on a machine with multiple Ampere GPUs with P2P access, I get: ``` terminate called after throwing an instance of 'cuda::runtime_error' what(): Failed setting the access...

eyalroz

bug

Support setting kernel block cluster dimensions

With the Hopper architecture, NVIDIA has introduced "clusters" of blocks which can use each other's shared memory. The clustering can be set either using a `__cluster_dims__(1,2,3)` qualifier in the kernel's...

eyalroz

task

Support execution graph capture via stream_t's

3

CUDA execution graph templates can be created in one of two ways: explicit construction, or capture of operations enqueued via a stream. I'm currently working on the explicit construction API...

eyalroz

task

Support CUDA 12.x texture, surface and tensor "objects"

See [this discussion](https://stackoverflow.com/questions/75498717/whats-the-replacement-for-cumodulegetsurfref-and-cumodulegettexref/75500158#75500158). We should make it possible to use texture/surface/tensor objects, and also avoid the old APIs with CUDA 12 and later.

eyalroz

task
