cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
Hi, I'm back with a library incompatibility issue. In my branch https://github.com/ralwing/cuda-api-wrappers/tree/eigen-compat ([commit](https://github.com/ralwing/cuda-api-wrappers/commit/f2a31572292f783a8f4aed45984694ec4d2429a4) and https://github.com/ralwing/cuda-api-wrappers/commit/aed66cede34392df54633c12f68177be3bd9c938), which has some custom changes (e.g. by default it compiles as C++17), I test...
We invoke NVCC with a custom command for compiling kernel fatbin files. However, we only pass the architecture-related flag(s) rather than all of CMAKE_CUDA_FLAGS. Let's use all of them.
When running the `p2pBandwidthLatencyTest`, a modified CUDA sample program, we get: ``` ... snip ... Testing copy mechanism: Kernels... Unidirectional P2P=Disabled Bandwidth Matrix (GB/s) D\D 0 1 2 0 0.00...
Reproducible `example.cpp`: ```cpp #include // from cuda-api-wrappers #include ``` Using CUDA toolkit 12.5 compiled with MSVC on Windows. Error message: ``` [build] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include\cuda/std/__iterator/wrap_iter.h(202): warning C4099: 'cuda::span':...
For debugging, and at times perhaps even for error reporting, it would be useful to be able to dump an entire copy parameters structure into a string.
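A minimal sketch of such a dump, assuming a hypothetical, simplified `copy_params_t` (loosely modeled on the driver API's `CUDA_MEMCPY3D`, but not the library's actual type): an `operator<<` that prints every field as `name=value`, plus a `to_string()` convenience built on `std::ostringstream`.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified copy parameters structure, loosely modeled on
// CUDA_MEMCPY3D; it is NOT the library's actual type.
struct copy_params_t {
    const void* src_ptr;
    void*       dst_ptr;
    std::size_t src_pitch;       // bytes per source row
    std::size_t dst_pitch;       // bytes per destination row
    std::size_t width_in_bytes;  // copy extent, innermost dimension
    std::size_t height;          // rows per slice
    std::size_t depth;           // number of slices
};

// Prints every field as name=value, so no field is silently omitted.
inline std::ostream& operator<<(std::ostream& os, const copy_params_t& p) {
    return os << "copy_params_t{"
              << "src_ptr=" << p.src_ptr
              << ", dst_ptr=" << p.dst_ptr
              << ", src_pitch=" << p.src_pitch
              << ", dst_pitch=" << p.dst_pitch
              << ", width_in_bytes=" << p.width_in_bytes
              << ", height=" << p.height
              << ", depth=" << p.depth
              << '}';
}

// Convenience wrapper for log lines and exception messages.
inline std::string to_string(const copy_params_t& p) {
    std::ostringstream oss;
    oss << p;
    return oss.str();
}
```

Going through `operator<<` lets the same code serve both logging and exception messages. Note that pointer formatting is implementation-defined, so addresses may print differently across standard libraries.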
One of the reasons I sometimes fail to weed out bugs in the library is that some of its code is templated and is not instantiated in any of the...
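One standard C++ remedy is explicit instantiation: a `template class X<T>;` definition makes the compiler instantiate all of the class's (non-template) member functions, whether or not anything calls them. The sketch below uses a hypothetical `strided_view` class as a stand-in for templated library code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical templated class standing in for templated library code.
template <typename T>
class strided_view {
public:
    strided_view(T* data, std::size_t size, std::size_t stride)
        : data_(data), size_(size), stride_(stride) {}

    std::size_t size() const { return size_; }

    // Members like these may never be called by regular code paths; without
    // an instantiation, a bug in their bodies could go unnoticed.
    T& operator[](std::size_t i) {
        assert(i < size_);
        return data_[i * stride_];
    }
    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i * stride_];
    }

private:
    T* data_;
    std::size_t size_;
    std::size_t stride_;
};

// Explicit instantiation definitions: these force every non-template member
// to be compiled for the listed types, so errors surface at build time.
template class strided_view<int>;
template class strided_view<const float>;
```

In practice, such instantiations could live in a dedicated translation unit built only as part of the test suite, listing the element types the library is expected to support.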
In bug https://developer.nvidia.com/bugs/4874669, we got some extra documentation for nvFatbin function options. Let's put that knowledge to use.
Recently, NVIDIA introduced an additional device flag, [cudaDeviceSyncMemops](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#): "Device flag - Ensure synchronous memory operations on this context will synchronize". It sounds silly, but it is indeed meaningful. Let's support...
Our compilation options structure supports appending arbitrary extra options. However, those options could potentially conflict with, or repeat, options we set through the fields of the compilation_options_t structure; and CUDA's NVRTC...
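A minimal sketch of a clash check, assuming a hypothetical, cut-down options structure (`compilation_options_sketch`, not the library's `compilation_options_t`): render the fields to option strings, then flag every extra option whose name (the part before any `=`) was already set by a field or by an earlier extra option.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, cut-down options structure; it is NOT the library's
// compilation_options_t. Each field renders to one NVRTC-style option.
struct compilation_options_sketch {
    std::string language_standard;           // rendered as --std=<value>
    bool use_fast_math = false;              // rendered as --use_fast_math
    bool generate_debug_info = false;        // rendered as --device-debug
    std::vector<std::string> extra_options;  // appended verbatim
};

// The option's name: everything before the first '=', or the whole string.
inline std::string option_name(const std::string& option) {
    return option.substr(0, option.find('='));
}

inline std::vector<std::string> render_fields(const compilation_options_sketch& o) {
    std::vector<std::string> out;
    if (!o.language_standard.empty()) { out.push_back("--std=" + o.language_standard); }
    if (o.use_fast_math)              { out.push_back("--use_fast_math"); }
    if (o.generate_debug_info)        { out.push_back("--device-debug"); }
    return out;
}

// Returns every extra option whose name was already set, either by a field
// or by an earlier extra option, in the order the extras were appended.
inline std::vector<std::string> find_clashes(const compilation_options_sketch& o) {
    std::set<std::string> seen;
    for (const auto& opt : render_fields(o)) { seen.insert(option_name(opt)); }
    std::vector<std::string> clashes;
    for (const auto& opt : o.extra_options) {
        // insert().second is false when the name was already present.
        if (!seen.insert(option_name(opt)).second) { clashes.push_back(opt); }
    }
    return clashes;
}
```

This only compares long-form names; it would miss, say, `-G` given as an extra while the debug field renders `--device-debug`, as well as options whose value follows after a space. A fuller check would need a table of NVRTC's option aliases.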
It would be nice if compilation options could be set using a string, as it would appear in the rendered options, e.g. being able to say something like `options.set("--device-int128")`...
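A minimal sketch of such a setter, assuming a hypothetical `options_sketch` structure with a handful of illustrative fields: `set()` splits the option at `=`, maps recognized names onto fields, and keeps anything unrecognized verbatim as an extra option; `render()` produces the reverse mapping, so a set-then-render round trip can be checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical options structure; field names are illustrative only and do
// not correspond to the library's compilation_options_t.
struct options_sketch {
    bool device_int128 = false;
    bool use_fast_math = false;
    std::string language_standard;           // e.g. "c++17"
    std::vector<std::string> extra_options;  // unrecognized options, verbatim

    // Sets a field from one option string as it would appear in the rendered
    // command line; options that are not recognized are kept as extras.
    void set(const std::string& option) {
        const auto eq = option.find('=');
        const bool has_value = (eq != std::string::npos);
        const std::string name = option.substr(0, eq);
        const std::string value = has_value ? option.substr(eq + 1) : std::string{};

        if (name == "--device-int128" && !has_value)      { device_int128 = true; }
        else if (name == "--use_fast_math" && !has_value) { use_fast_math = true; }
        else if (name == "--std" && !value.empty())       { language_standard = value; }
        else                                              { extra_options.push_back(option); }
    }

    // Renders the fields back into option strings, extras last.
    std::vector<std::string> render() const {
        std::vector<std::string> out;
        if (!language_standard.empty()) { out.push_back("--std=" + language_standard); }
        if (device_int128)              { out.push_back("--device-int128"); }
        if (use_fast_math)              { out.push_back("--use_fast_math"); }
        out.insert(out.end(), extra_options.begin(), extra_options.end());
        return out;
    }
};
```

Falling back to extras keeps unknown options usable; a stricter design might reject them instead, so that typos in option names are caught early.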