Yunsong Wang issues

Results 44 issues of Yunsong Wang

Clean up join benchmarks

## Description This PR cleans up the join benchmark implementations. It uses nvbench helpers to simplify the code and reduces the number of test cases. ## Checklist - [x] I...

3 - Ready for Review

libcudf

improvement

non-breaking

Improve distinct join with set `retrieve`

## Description This PR updates the distinct join to use `static_set::retrieve` instead of the custom device code. ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). - [x]...

3 - Ready for Review

libcudf

improvement

non-breaking

[ENHANCEMENT]: Place the existing key to the right-hand side during equality checks

### Is your feature request related to a problem? Please describe. cuco hash tables always place the slot key on the left-hand side for key equality checks: https://github.com/NVIDIA/cuCollections/blob/6cb6dbfe13b10109f74f3b5bedbe38f8c0eed687/include/cuco/static_map.cuh#L64-L66 This was...

good first issue

P1: Should have

type: improvement
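
To illustrate why argument order in key equality checks matters, here is a minimal host-side C++ sketch. It is not cuco code: `stored_key`, `probe_key`, `probe_then_slot_equal`, and `contains_slot_on_rhs` are hypothetical names invented for this example. It only shows that a heterogeneous comparator is tied to the order in which the table passes the slot key and the query key.

```cpp
#include <vector>

// Hypothetical key types: what the table stores vs. what a query supplies.
struct stored_key { int id; };
struct probe_key  { int id; };

// Hypothetical heterogeneous comparator. It is callable only as
// (probe, slot), so it works only if the table passes the key that
// already exists in a slot as the right-hand argument.
struct probe_then_slot_equal {
  bool operator()(probe_key const& lhs, stored_key const& rhs) const {
    return lhs.id == rhs.id;
  }
};

// Hypothetical stand-in for a table lookup (a plain linear scan, not an
// actual hash table): the existing slot key goes on the right-hand side.
template <typename KeyEqual>
bool contains_slot_on_rhs(std::vector<stored_key> const& slots,
                          probe_key const& probe,
                          KeyEqual key_equal) {
  for (auto const& slot : slots) {
    if (key_equal(probe, slot)) { return true; }  // (query key, slot key)
  }
  return false;
}
```

A table that instead called `key_equal(slot, probe)` would fail to compile with this comparator, which is why the calling convention is part of the library's contract with user-provided equality functors.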

[FEATURE]: Add multiset host-bulk retrieve APIs

### Is your feature request related to a problem? Please describe. Add multiset host-bulk retrieve APIs ### Describe the solution you'd like The basic API to add: ```cuda /** *...

type: feature request

helps: rapids

In Progress

topic: static_multiset

Use invoke_one when possible

This PR updates the new open addressing implementations to use `cg::invoke_one` when possible. It doesn't change legacy implementations such as multimap or dynamic map.

type: improvement

Add multiset contains APIs

Closes #463. This PR adds multiset `contains` and its variants. Host-bulk conditional `contains` is also supported.

type: feature request

Needs Review

topic: static_multiset

Add multiset find APIs

Closes #464. This PR adds multiset host-bulk and device-singular `find` APIs.

type: feature request

Needs Review

topic: static_multiset

Add multiset count APIs

TBD

type: feature request

In Progress

topic: static_multiset

[ENHANCEMENT]: Get rid of custom atomic operations once CCCL 2.4 is ready

### Is your feature request related to a problem? Please describe. The current cuco implementations use custom atomic functions, e.g. https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh#L904-L936 due to a performance regression with `cuda::atomic_ref` (https://github.com/NVIDIA/cccl/issues/1008). With...

good first issue

P1: Should have

type: improvement

[BUG]: Performance regression with shared memory query operations

### Is this a duplicate? - [X] I confirmed there appear to be no duplicate issues for this bug (https://github.com/NVIDIA/cuCollections/issues) ### Type of Bug Performance ### Describe the bug When...

type: bug

topic: performance

P3: Backlog