cuCollections
[FEA] Refactor of open address data structures
Is your feature request related to a problem? Please describe.
There is a significant amount of redundancy among the static_map/static_multimap/static_reduction_map classes. This is a large maintenance overhead and means optimizations made to one data structure do not translate to the others.
Furthermore, there are several configuration options we'd like to enable, like using AoS vs. SoA, scalar vs. CG operations, etc.
I'd also like to enable adding static_set and static_multiset classes that could share the same backend.
Describe the solution you'd like
List of things I'd like to address:
- Use atomic_ref when possible (4B/8B key/value types) #183
- Falls back on atomic when necessary (<4B, >8B key/value types)
  - For <4B, we should probably just widen them ourselves and still use atomic_ref instead of atomic.
- Eliminates redundancy among static_map/reduction_map/multimap
- Enables AoS vs SoA layout (#103)
- Enables statically sized device views
  - We should use a pattern like std::span with std::dynamic_extent to support both dynamic and statically sized capacities.
- Enables adding static_set and static_multiset
- Supports the various insert schemes: packed/back2back/cas+dependent write
- Switch between scalar/CG operations
- Stream support everywhere (https://github.com/NVIDIA/cuCollections/issues/65)
- Consistent use of bitwise_equal
- Asynchronous size computation (#102, #237)
- Rehashing (https://github.com/NVIDIA/cuCollections/issues/21)
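
To make the std::span / std::dynamic_extent item in the list above more concrete, here is a rough host-only sketch of how one view type could cover both statically and dynamically sized capacities. The names `extent`, `slot_view`, and the local `dynamic_extent` constant are hypothetical and only mirror the std::span convention; none of this is existing cuCollections API, and a real version would need `__host__ __device__` annotations.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical sentinel, mirroring the role of std::dynamic_extent.
inline constexpr std::size_t dynamic_extent =
  std::numeric_limits<std::size_t>::max();

// Static case: the capacity is part of the type, nothing is stored.
template <std::size_t N = dynamic_extent>
class extent {
 public:
  constexpr extent() = default;
  constexpr std::size_t value() const noexcept { return N; }
};

// Dynamic case: the capacity is a runtime value stored in the object.
template <>
class extent<dynamic_extent> {
 public:
  constexpr explicit extent(std::size_t n) noexcept : n_{n} {}
  constexpr std::size_t value() const noexcept { return n_; }

 private:
  std::size_t n_;
};

// Hypothetical non-owning view over an array of slots. The same code path
// serves both cases; only the extent type differs.
template <typename Slot, std::size_t N = dynamic_extent>
class slot_view {
 public:
  constexpr slot_view(Slot* slots, extent<N> capacity) noexcept
    : slots_{slots}, capacity_{capacity}
  {
  }

  constexpr std::size_t capacity() const noexcept { return capacity_.value(); }

  Slot& operator[](std::size_t i) const noexcept
  {
    assert(i < capacity());  // bounds check in debug builds only
    return slots_[i];
  }

 private:
  Slot* slots_;
  extent<N> capacity_;
};
```

With this shape, `slot_view<int, 8>` has a capacity the compiler knows at compile time (so operations such as modulo by the capacity can be constant-folded), while `slot_view<int>` takes the capacity at runtime via `extent<>{n}`.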
My current thinking is to create an open_address_impl class that provides an abstraction for a logical array of "slots" and exposes operations on those slots. All the core logic and switching for things like AoS/SoA, atomic_ref/atomic can/should be implemented in this common impl class.
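
As a rough illustration of that idea, the sketch below puts the probing logic in a single class and makes the slot layout a template parameter. It is a simplified, single-threaded, host-only sketch under several assumptions: `aos_storage`, `soa_storage`, and every member function shown are hypothetical names, keys are compared with `==` rather than bitwise_equal, and no atomics or CG operations are involved. Only the name open_address_impl comes from the proposal itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical AoS layout: one array of {key, value} pairs.
template <typename Key, typename Value>
class aos_storage {
 public:
  aos_storage(std::size_t capacity, Key empty_key)
    : slots_(capacity, std::pair<Key, Value>{empty_key, Value{}})
  {
  }
  std::size_t capacity() const { return slots_.size(); }
  Key key(std::size_t i) const { return slots_[i].first; }
  Value value(std::size_t i) const { return slots_[i].second; }
  void store(std::size_t i, Key k, Value v) { slots_[i] = {k, v}; }

 private:
  std::vector<std::pair<Key, Value>> slots_;
};

// Hypothetical SoA layout: separate key and value arrays.
template <typename Key, typename Value>
class soa_storage {
 public:
  soa_storage(std::size_t capacity, Key empty_key)
    : keys_(capacity, empty_key), values_(capacity)
  {
  }
  std::size_t capacity() const { return keys_.size(); }
  Key key(std::size_t i) const { return keys_[i]; }
  Value value(std::size_t i) const { return values_[i]; }
  void store(std::size_t i, Key k, Value v) { keys_[i] = k; values_[i] = v; }

 private:
  std::vector<Key> keys_;
  std::vector<Value> values_;
};

// Common implementation: linear probing over a logical array of slots.
// The layout is chosen by the Storage parameter; the logic is written once.
template <typename Key, typename Value, typename Storage>
class open_address_impl {
 public:
  open_address_impl(std::size_t capacity, Key empty_key)
    : storage_(capacity, empty_key), empty_key_(empty_key)
  {
    assert(capacity > 0);
  }

  // Returns true if the pair was inserted, false if the key already
  // exists or every slot is occupied.
  bool insert(Key k, Value v)
  {
    assert(!(k == empty_key_));  // the sentinel cannot be used as a key
    std::size_t const cap = storage_.capacity();
    std::size_t idx       = std::hash<Key>{}(k) % cap;
    for (std::size_t probes = 0; probes < cap; ++probes) {
      Key const current = storage_.key(idx);
      if (current == empty_key_) {
        storage_.store(idx, k, v);
        return true;
      }
      if (current == k) { return false; }
      idx = (idx + 1) % cap;
    }
    return false;
  }

  // Returns true and writes the value to `out` if the key is present.
  bool find(Key k, Value& out) const
  {
    std::size_t const cap = storage_.capacity();
    std::size_t idx       = std::hash<Key>{}(k) % cap;
    for (std::size_t probes = 0; probes < cap; ++probes) {
      Key const current = storage_.key(idx);
      if (current == k) {
        out = storage_.value(idx);
        return true;
      }
      if (current == empty_key_) { return false; }
      idx = (idx + 1) % cap;
    }
    return false;
  }

 private:
  Storage storage_;
  Key empty_key_;
};
```

A map-like front end would then be a thin wrapper, for example `open_address_impl<int, int, soa_storage<int, int>>`, and a set could reuse the same class with a key-only storage type. The atomic_ref vs. atomic choice could be injected the same way, as another policy parameter.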