cuCollections
[ENHANCEMENT]: Place the existing key to the right-hand side during equality checks
Is your feature request related to a problem? Please describe.
cuco hash tables always place the slot key on the left-hand side for key equality checks: https://github.com/NVIDIA/cuCollections/blob/6cb6dbfe13b10109f74f3b5bedbe38f8c0eed687/include/cuco/static_map.cuh#L64-L66
This was an arbitrary choice I made when I started the open-addressing refactoring. I thought the order didn't matter, and I was wrong.
Generally speaking, when we want to check if two variables are identical, we put the query value on the left-hand side and the "reference" or the existing value on the right-hand side. e.g. we do
if (idx == 0) { ... }
instead of
if (0 == idx) { ... }
The new cuco data structures are unfortunately following the latter pattern.
This works fine until we meet the hash join use case where the build table is the right table and the probe table is the left table. As a result, the left table is always on the right when doing comparisons in cuco while the right table is always on the left. In many places across libcudf, build_table/right_table and probe_table/left_table are interchangeable terms, thus for a function join_func expecting the first argument to be the build table and the second argument to be the probe table, we may have to invoke it awkwardly:
void join_func(right_table, left_table, ...)
This must be stopped.
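To make the ordering concern concrete, here is a minimal host-side sketch of a heterogeneous row comparator for a join. It is not libcudf or cuco code: probe_index, build_index and row_equal are hypothetical names, and the tables are plain vectors of int. The point is only that when the two arguments have different types, the call site fixes which table sits on which side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical strong index types so the two sides cannot be swapped silently.
struct probe_index { std::size_t value; };  // row in the probe (left) table
struct build_index { std::size_t value; };  // row in the build (right) table

// Hypothetical row comparator: the query (probe) row is the left-hand
// argument, the existing (build) row is the right-hand argument.
struct row_equal {
  std::vector<int> const& probe_table;
  std::vector<int> const& build_table;

  bool operator()(probe_index lhs, build_index rhs) const
  {
    return probe_table[lhs.value] == build_table[rhs.value];
  }
};

// Returns true when the comparator behaves as described above.
inline bool demo_row_equal()
{
  std::vector<int> const left_table{10, 20, 30};  // probe side
  std::vector<int> const right_table{30, 10};     // build side
  row_equal const eq{left_table, right_table};
  // left_table[0] == right_table[1], while left_table[1] != right_table[0]
  return eq(probe_index{0}, build_index{1}) && !eq(probe_index{1}, build_index{0});
}
```

With this signature, a container that calls the comparator as (slot key, query key) would not even compile, so the comparator has to be written with the build table first, which is the awkward ordering described above.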
Describe the solution you'd like
Always place the existing value (either the sentinel value or the slot key) on the right side for equality checks.
e.g. https://github.com/NVIDIA/cuCollections/blob/6cb6dbfe13b10109f74f3b5bedbe38f8c0eed687/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh#L370
We should do the following instead
this->predicate_.operator()<is_insert::YES>(key, this->extract_key(slot_content));
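The proposed ordering can be sketched with a toy linear-probing set. This is only an illustration under stated assumptions, not cuco's implementation: toy_set and recording_equal are hypothetical, keys are non-negative ints, and the hash is simply key modulo capacity. The comparator records its arguments so the order in which the container passes them can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical comparator that records the order of its arguments.
struct recording_equal {
  int* last_lhs;
  int* last_rhs;

  bool operator()(int lhs, int rhs) const
  {
    *last_lhs = lhs;
    *last_rhs = rhs;
    return lhs == rhs;
  }
};

// Toy linear-probing set of non-negative ints; a stand-in, not cuco code.
template <typename KeyEqual>
class toy_set {
 public:
  toy_set(std::size_t capacity, int empty_sentinel, KeyEqual eq)
    : slots_(capacity, empty_sentinel), empty_sentinel_(empty_sentinel), eq_(eq)
  {
  }

  // Returns true if `key` was inserted, false if it was already present
  // or if every slot is occupied.
  bool insert(int key)
  {
    auto idx = static_cast<std::size_t>(key) % slots_.size();
    for (std::size_t probe = 0; probe < slots_.size(); ++probe) {
      int& slot_key = slots_[idx];
      if (slot_key == empty_sentinel_) {  // sentinel on the right-hand side
        slot_key = key;
        return true;
      }
      if (eq_(key, slot_key)) { return false; }  // query key left, slot key right
      idx = (idx + 1) % slots_.size();
    }
    return false;
  }

 private:
  std::vector<int> slots_;
  int empty_sentinel_;
  KeyEqual eq_;
};

// Returns true when the comparator saw (query key, slot key) in that order.
inline bool demo_argument_order()
{
  int lhs = 0;
  int rhs = 0;
  toy_set<recording_equal> set(4, -1, recording_equal{&lhs, &rhs});
  set.insert(1);
  set.insert(5);  // 5 % 4 == 1, so it collides with 1 and the comparator runs
  return lhs == 5 && rhs == 1;
}
```

In this sketch the key being inserted (5) arrives as the left-hand argument and the key already stored in the slot (1) as the right-hand argument, which matches the order requested in this issue.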