Daniel Jünger
Build warnings appear when `CMAKE_CUDA_ARCHITECTURES=75`:
```bash
[ 98%] Linking CXX executable DYNAMIC_MAP_TEST
[ 98%] Built target DYNAMIC_MAP_TEST
/home/yunsongw/miniconda3/include/rapids/libcudacxx/cuda/std/barrier: In function ‘void cuda::__4::init(cuda::__4::barrier*, ptrdiff_t, cuda::std::__4::__empty_completion)’:
/home/yunsongw/miniconda3/include/rapids/libcudacxx/cuda/std/barrier:158:155: warning: unused parameter ‘__completion’ [-Wunused-parameter]
...
```
Currently, `__CUDA_MINIMUM_ARCH__` expands to the same value as `__CUDA_ARCH__` on nvcc: https://github.com/NVIDIA/libcudacxx/blob/d553734e66a999727e7b9e6bb19ce7b38024a19f/include/nv/detail/__target_macros#L103
On nvc++, however, the same macro expands to the minimum target architecture provided by the compilation flags: https://github.com/NVIDIA/libcudacxx/blob/d553734e66a999727e7b9e6bb19ce7b38024a19f/include/nv/detail/__target_macros#L76...
nvbench currently does not support benchmarking CPU-only code natively. Although adding `nvbench::exec_tag::sync` gives plausible measurements for cold runs, there is no mechanism for batch measurements. We could enable this feature...
I stumbled across [this Slack thread](https://nvidia.slack.com/archives/C01Q5NC7NT0/p1658214334637459) recently when I was trying to measure small kernels with nvbench and got fluctuating results. As @senior-zero notes in the thread, the variance of...
Tracking issue for branch feature/static_multiset
### Background
We currently require payload types to be bitwise comparable, which is required when we use the "packed CAS" insertion strategy.

### Discussion from #426:
> we could consider...
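
To illustrate why "packed CAS" depends on bitwise comparability, here is a minimal host-side sketch. It is not the cuco implementation: the 32-bit key and payload, the sentinel values, and the names `pack` and `packed_cas_insert` are all assumptions made for illustration. The key and payload are packed into one 64-bit word, and a single compare-and-swap claims the slot, so "is the slot empty?" is decided purely by comparing bits.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sentinels marking an empty slot (assumptions, not cuco's values).
constexpr std::uint32_t empty_key   = 0xFFFFFFFFu;
constexpr std::uint32_t empty_value = 0u;

// Pack a 32-bit key and a 32-bit payload into a single 64-bit word.
inline std::uint64_t pack(std::uint32_t key, std::uint32_t value)
{
  return (static_cast<std::uint64_t>(key) << 32) | value;
}

// Try to claim an empty slot by swapping in the whole pair with one CAS.
// The CAS compares raw bits, so the payload must be bitwise comparable:
// two payloads that are "equal" but differ in representation would not match.
inline bool packed_cas_insert(std::atomic<std::uint64_t>& slot,
                              std::uint32_t key,
                              std::uint32_t value)
{
  std::uint64_t expected = pack(empty_key, empty_value);
  return slot.compare_exchange_strong(expected, pack(key, value));
}
```

A slot initialized with `pack(empty_key, empty_value)` accepts the first insertion and rejects later ones. A payload type with padding bits or several representations of the same value would break the comparison against the empty sentinel.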
`cuco::stream_ref` is a workaround (WAR) and should be replaced with `cuda::stream_ref`. `cuda::stream_ref` will be available in libcu++ v2.1.0, and we should get that version soon: rapidsai/rapids-cmake#399. _Originally posted by @sleeepyjack in...
This PR adds a function `static_map::insert_or_apply`, which either inserts a new key-value pair into the map, or, in case the key already exists, applies a reduction function over the associated...
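
The insert-or-apply semantics described above can be sketched on the host with `std::unordered_map`. This is only an analogue, not the PR's device-side API: the free function `insert_or_apply`, its signature, and its `bool` return value are assumptions for illustration, as is the choice to have the reduction combine the stored payload with the incoming one.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Hypothetical host-side analogue of insert-or-apply (requires C++17).
// Inserts {key, value} if the key is absent. Otherwise replaces the stored
// payload with op(stored, value). Returns true if a new pair was inserted.
template <typename Map, typename Key, typename Value, typename Op>
bool insert_or_apply(Map& map, Key const& key, Value const& value, Op op)
{
  auto [it, inserted] = map.try_emplace(key, value);
  if (!inserted) { it->second = op(it->second, value); }
  return inserted;
}
```

For example, with `std::plus<int>{}` as the reduction, inserting `(1, 10)` and then `(1, 5)` leaves the map holding `15` under key `1`.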
This PR fixes some issues in the helper function for converting a hash value into a valid size. Hash values are always unsigned, but the size type is set by...
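
As a rough sketch of the kind of conversion at issue, the helper below maps an unsigned hash value into `[0, extent)` when the size type may be signed. The name `hash_to_index` and its signature are hypothetical and do not reflect the PR's actual code. The idea is to do the modulo in an unsigned type wide enough for both operands and to narrow only the result, which is already smaller than `extent`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: reduce an unsigned hash to an index in [0, extent).
// Precondition: extent > 0.
template <typename SizeType, typename HashType>
constexpr SizeType hash_to_index(HashType hash, SizeType extent)
{
  static_assert(std::is_unsigned_v<HashType>, "hash values are expected to be unsigned");
  static_assert(std::is_integral_v<SizeType>, "size type must be an integral type");
  using unsigned_size = std::make_unsigned_t<SizeType>;
  using common        = std::common_type_t<HashType, unsigned_size>;
  // Modulo in an unsigned type avoids the negative result a naive
  // static_cast<SizeType>(hash) % extent can produce for large hashes.
  return static_cast<SizeType>(static_cast<common>(hash) % static_cast<common>(extent));
}
```

For instance, casting the 32-bit hash `4294967295u` straight to `std::int32_t` typically yields `-1`, and `-1 % 7` is `-1`, which is not a valid index. The helper above returns `3` instead.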