Georgii Evtushenko
I've used `cuda::std` in recent additions to Thrust. In some cases this name conflicts with Thrust's own `cuda` namespace. This PR replaces `cuda::std` with the fully qualified `::cuda::std`.
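A minimal sketch of how this kind of clash arises, using stand-in namespaces rather than the real libcu++ and Thrust headers. The functions `answer`, `backend_id`, and `fixed` are hypothetical and exist only for illustration. The sketch assumes the conflict comes from ordinary C++ name lookup: inside `thrust`, the unqualified name `cuda` resolves to the nested `thrust::cuda` before the global `::cuda` is considered.

```cpp
// Stand-in namespaces that only mimic the layout described above;
// they are NOT the real libcu++ or Thrust headers.
namespace cuda {
namespace std {
// Hypothetical function standing in for any cuda::std facility.
constexpr int answer() { return 42; }
} // namespace std
} // namespace cuda

namespace thrust {
namespace cuda {
// Stand-in for Thrust's nested `cuda` namespace; it has no `std` member.
constexpr int backend_id() { return 1; }
} // namespace cuda

// Inside `thrust`, unqualified lookup of `cuda` finds `thrust::cuda` first,
// so the following line would fail to compile (no `std` in `thrust::cuda`):
//   constexpr int broken() { return cuda::std::answer(); }

// Anchoring the name at the global namespace avoids the clash.
constexpr int fixed() { return ::cuda::std::answer(); }
} // namespace thrust

// Compile-time check that the fully qualified name reaches the global `cuda`.
static_assert(thrust::fixed() == 42, "::cuda::std must resolve to the global namespace");
```

The leading `::` costs nothing at call sites outside `thrust`, which is why it can be applied uniformly.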
CUB's current testing facilities are far from perfect. This PR adds Catch2 integration and a few convenience wrappers. Main advantages of Catch2: - Readable way of specifying cartesian products...
`cub::DeviceSpmv` has been unmaintained for a while and probably contains [bugs](https://github.com/NVIDIA/cub/pull/352#discussion_r680580812). Moreover, there are better implementations in specialized libraries such as cuSPARSE. I suggest we deprecate it.
This PR briefly explains the current CUB design. The document is intended to help contributors. Upcoming PTX dispatch changes will require updates to this document. Having a diff of...
Currently, `cub::DeviceSegmentedSort` has a fallback kernel that [applies](https://github.com/NVIDIA/cub/blob/0430cc0bfcb7c2496b42da754c215c9b5df8856b/cub/device/dispatch/dispatch_segmented_sort.cuh#L169) different algorithms to segments of different sizes. In particular, medium-size segments are sorted by merge sort. If a segment doesn't fit into registers, it's...