Eyal Rozenberg
Some of our CUDA entity wrapper classes, like `stream_t`, "swallow" exceptions on destruction, and do not throw on failure of their underlying destroy functions (e.g. `cuStreamDestroy()`; but - this is...
We have recently made sure that none of our destructors throw (unless a preprocessor definition is set): #728. However, that is actually a bit too strict of a policy,...
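To illustrate the kind of policy being discussed, here is a minimal sketch of a destructor that swallows a failed destroy call by default and throws only when a preprocessor definition is set. Everything in it is hypothetical and not taken from the library: `fake_destroy()` stands in for a driver call such as `cuStreamDestroy()`, and `HYPOTHETICAL_THROW_IN_DESTRUCTORS` stands in for whatever definition the library actually uses.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a driver destroy call such as cuStreamDestroy();
// returns 0 on success and non-zero on failure.
inline int fake_destroy(int handle) { return handle < 0 ? 1 : 0; }

// Counts the failures which were swallowed, so that the behavior is observable.
inline int& swallowed_failures() { static int count = 0; return count; }

class resource_sketch_t {
public:
    explicit resource_sketch_t(int handle) : handle_(handle) {}
    resource_sketch_t(const resource_sketch_t&) = delete;
    resource_sketch_t& operator=(const resource_sketch_t&) = delete;

#ifdef HYPOTHETICAL_THROW_IN_DESTRUCTORS
    // Opt-in behavior: report the failure by throwing. Note that throwing
    // during stack unwinding terminates the program.
    ~resource_sketch_t() noexcept(false)
    {
        if (fake_destroy(handle_) != 0) {
            throw std::runtime_error("destroying the resource failed");
        }
    }
#else
    // Default behavior: never throw; the failure is only counted here.
    ~resource_sketch_t() noexcept
    {
        if (fake_destroy(handle_) != 0) { ++swallowed_failures(); }
    }
#endif

private:
    int handle_;
};
```

With the definition unset, destroying a `resource_sketch_t` holding a negative handle increments the counter instead of propagating an exception.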
Let's move preprocessor-focused and compiler-compatibility code out of `src/cuda/api/detail/type_traits.hpp` into a separate header.
The destructor code for `memory::pool::ipc::imported_ptr_t` has a bug: If the object is not owning, the destructor will still try to free the pointer (albeit not using a stream, just with...
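As a rough sketch of the intended behavior (not the library's actual code), a destructor can guard the release on an ownership flag. The class name `imported_ptr_sketch`, the `fake_free()` function and the call counter are all hypothetical, introduced only for illustration.

```cpp
#include <utility>

// Hypothetical stand-in for the actual deallocation call; it only counts
// how many times a free was attempted.
inline int& free_calls() { static int count = 0; return count; }
inline void fake_free(void*) { ++free_calls(); }

class imported_ptr_sketch {
public:
    imported_ptr_sketch(void* ptr, bool owning) : ptr_(ptr), owning_(owning) {}

    // Moving transfers ownership, so the moved-from object must not free.
    imported_ptr_sketch(imported_ptr_sketch&& other) noexcept
        : ptr_(other.ptr_), owning_(other.owning_)
    {
        other.owning_ = false;
    }

    imported_ptr_sketch(const imported_ptr_sketch&) = delete;
    imported_ptr_sketch& operator=(const imported_ptr_sketch&) = delete;

    ~imported_ptr_sketch() noexcept
    {
        // Only an owning object releases the pointer.
        if (owning_) { fake_free(ptr_); }
    }

private:
    void* ptr_;
    bool  owning_;
};
```

In this sketch a non-owning object never reaches `fake_free()`, and a moved-from object gives up ownership so the pointer is freed exactly once.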
We invoke NVCC with a custom command for compiling kernel fatbin files. However, we only use the architecture-related flag(s), rather than all of CMAKE_CUDA_FLAGS. Let's use all of them.
When running the `p2pBandwidthLatencyTest`, a modified CUDA sample program, we get: ``` ... snip ... Testing copy mechanism: Kernels... Unidirectional P2P=Disabled Bandwidth Matrix (GB/s) D\D 0 1 2 0 0.00...
For debugging purposes, and perhaps even for error reporting, at times, it would make sense to be able to dump the entirety of a copy parameters structure into a string.
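One possible shape for such a facility is a free function that streams each field into a string. The sketch below uses a simplified, hypothetical structure, `copy_params_sketch`, rather than the actual CUDA copy parameters structure; the field names and the output format are assumptions made for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-in for a copy parameters structure.
struct copy_params_sketch {
    const void* source;
    void*       destination;
    std::size_t width_in_bytes;
    std::size_t height;
    std::size_t depth;
};

// Dumps every field of the structure into a single human-readable string.
// The textual form of the pointers is implementation-defined.
inline std::string to_string(const copy_params_sketch& params)
{
    std::ostringstream oss;
    oss << "{ source: " << params.source
        << ", destination: " << params.destination
        << ", width_in_bytes: " << params.width_in_bytes
        << ", height: " << params.height
        << ", depth: " << params.depth
        << " }";
    return oss.str();
}
```

A string of this kind could be written to a debug log or appended to the message of an exception thrown when a copy fails.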
One of the reasons I sometimes fail to weed out bugs in the library is that some of its code is templated, and is not instantiated in any of the...
In bug https://developer.nvidia.com/bugs/4874669, we got some extra documentation for nvFatbin function options. Let's put that knowledge to use.
Recently, NVIDIA introduced an additional device flag, [cudaDeviceSyncMemops](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#): "Device flag - Ensure synchronous memory operations on this context will synchronize" It sounds silly, but it is indeed meaningful. Let's support...