cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs

Results 151 cuda-api-wrappers issues

Hi, I'm back with a library incompatibility issue. In my branch https://github.com/ralwing/cuda-api-wrappers/tree/eigen-compat, [commit](https://github.com/ralwing/cuda-api-wrappers/commit/f2a31572292f783a8f4aed45984694ec4d2429a4) and https://github.com/ralwing/cuda-api-wrappers/commit/aed66cede34392df54633c12f68177be3bd9c938, which has some custom changes (e.g., by default it compiles as C++17), I test...

enhancement
resolved-on-development

We invoke NVCC with a custom command for compiling kernel fatbin files. However, we only use the architecture-related flag(s), rather than all of CMAKE_CUDA_FLAGS. Let's use all of them.

task
resolved-on-development
build

When running the `p2pBandwidthLatencyTest`, a modified CUDA sample program, we get: ``` ... snip ... Testing copy mechanism: Kernels... Unidirectional P2P=Disabled Bandwidth Matrix (GB/s) D\D 0 1 2 0 0.00...

Reproducible `example.cpp`: ```cpp #include // from cuda-api-wrappers #include ``` Using CUDA toolkit 12.5 compiled with MSVC on Windows. Error message: ``` [build] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include\cuda/std/__iterator/wrap_iter.h(202): warning C4099: 'cuda::span':...

For debugging purposes, and perhaps even for error reporting, it would sometimes make sense to be able to dump an entire copy parameters structure into a string.

enhancement
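
The request above (dumping a copy parameters structure into a string) could look roughly like the following. This is only a minimal sketch: `copy_params_sketch`, its fields, and `to_string` are hypothetical stand-ins invented for illustration, not the library's actual copy parameters type or API.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for a copy parameters structure.
// The real structure in the library has different (and more) fields.
struct copy_params_sketch {
    const void* source;       // where to copy from
    void*       destination;  // where to copy to
    std::size_t width_bytes;  // bytes per row
    std::size_t height;       // rows per slice
    std::size_t depth;        // number of slices
};

// Render every field as a "name=value" pair, so the whole structure
// can be logged or attached to an error message in one piece.
inline std::string to_string(const copy_params_sketch& p)
{
    std::ostringstream oss;
    oss << "copy_params{"
        << "source=" << p.source
        << ", destination=" << p.destination
        << ", width_bytes=" << p.width_bytes
        << ", height=" << p.height
        << ", depth=" << p.depth
        << '}';
    return oss.str();
}
```

The textual form of the pointer fields is implementation-defined, so the output is best treated as diagnostic text rather than something to parse.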

One of the reasons I sometimes fail to weed out bugs in the library is that some of its code is templated, and is not instantiated in any of the...

task

In bug https://developer.nvidia.com/bugs/4874669, we got some extra documentation for the nvFatbin function options. Let's put that knowledge to use.

task

Recently, NVIDIA introduced an additional device flag, [cudaDeviceSyncMemops](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#): "Device flag - Ensure synchronous memory operations on this context will synchronize" It sounds silly, but it is indeed meaningful. Let's support...

task

Our compilation options structure supports appending arbitrary extra options. However, those options could potentially conflict, or repeat, options we set through the fields of the compilation_options_t structure; and CUDA's NVRTC...

enhancement

It would be nice if compilation options could be set using a string, as we might see it in the render options, e.g. being able to say something like `options.set("--device-int128")`...

enhancement
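
A string-based setter of the kind requested above might be sketched as follows. `options_sketch`, its field names, and the return-value convention are assumptions made for illustration and do not reflect the library's `compilation_options_t`. Only `--device-int128` comes from the issue text; `--std=` is handled as a second example of a recognized option.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical options holder; field names are illustrative only.
struct options_sketch {
    bool device_int128 = false;
    std::string language_dialect;            // e.g. "c++17"; empty means unset
    std::vector<std::string> extra_options;  // anything not recognized

    // Apply one command-line-style option string.
    // Returns true if it was mapped to a field, and false if it was
    // stored verbatim as an extra option.
    bool set(const std::string& option)
    {
        static const std::string std_prefix = "--std=";
        if (option == "--device-int128") {
            device_int128 = true;
            return true;
        }
        if (option.compare(0, std_prefix.size(), std_prefix) == 0) {
            language_dialect = option.substr(std_prefix.size());
            return true;
        }
        extra_options.push_back(option);
        return false;
    }
};
```

With this sketch, `options.set("--device-int128")` flips the corresponding field, while an unrecognized string is kept as an extra option. Routing recognized strings to fields in this way is also one possible means of avoiding the repeated or conflicting extra options described in the previous issue.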