Yunsong Wang
## Description This PR cleans up the join benchmark implementations. It uses nvbench helpers to simplify the code and reduces the number of test cases. ## Checklist - [x] I...
## Description This PR updates the distinct join to use `static_set::retrieve` instead of the custom device code. ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). - [x]...
### Is your feature request related to a problem? Please describe. cuco hash tables always place the slot key on the left-hand side for key equality checks: https://github.com/NVIDIA/cuCollections/blob/6cb6dbfe13b10109f74f3b5bedbe38f8c0eed687/include/cuco/static_map.cuh#L64-L66 This was...
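The argument-order convention mentioned in this request can be illustrated with a small host-side sketch. It is not cuco code: `slot_key`, `probe_key`, `slot_probe_equal`, and `toy_contains` are hypothetical names invented for this example, and a linear scan stands in for real probing. The only point it shows is that when the container always passes the stored slot key as the left-hand operand, a heterogeneous equality callable needs just one overload, with the slot type first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only; neither is part of cuco.
struct slot_key {
  std::int32_t value;  // key type as stored in the table
};
struct probe_key {
  std::int64_t value;  // key type supplied by the caller in a query
};

// Equality callable with a fixed argument order: (slot key, probe key).
// Because the slot key is always on the left, no (probe, slot) overload is needed.
struct slot_probe_equal {
  constexpr bool operator()(slot_key const& lhs, probe_key const& rhs) const noexcept
  {
    return static_cast<std::int64_t>(lhs.value) == rhs.value;
  }
};

// Toy lookup (assumption: a linear scan instead of hashing and probing) that
// mimics the call pattern: the stored key is passed first, the query key second.
template <typename KeyEqual>
bool toy_contains(slot_key const* slots, int n, probe_key const& probe, KeyEqual key_eq)
{
  for (int i = 0; i < n; ++i) {
    if (key_eq(slots[i], probe)) { return true; }
  }
  return false;
}
```

Swapping the operands at the call site (`key_eq(probe, slots[i])`) would fail to compile with this callable, which is why a fixed, documented order matters for user-defined heterogeneous comparators.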
### Is your feature request related to a problem? Please describe. Add multiset host-bulk retrieve APIs ### Describe the solution you'd like The basic API to add: ```cuda /** *...
This PR updates the new open addressing implementations to use `cg::invoke_one` where possible. It does not change legacy implementations such as the multimap or dynamic map.
Closes #463 This PR adds multiset `contains` and its variants. Host-bulk conditional `contains` is also supported.
Closes #464 This PR adds multiset host-bulk and device-singular `find` APIs.
### Is your feature request related to a problem? Please describe. The current cuco implementations use custom atomic functions, e.g. https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh#L904-L936 due to a performance regression with `cuda::atomic_ref` (https://github.com/NVIDIA/cccl/issues/1008). With...
### Is this a duplicate? - [X] I confirmed there appear to be no duplicate issues for this bug (https://github.com/NVIDIA/cuCollections/issues) ### Type of Bug Performance ### Describe the bug When...