[FEA]: Migrate cuCollections data structures to CCCL
Is this a duplicate?
- [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
CUDA Experimental (cudax)
Is your feature request related to a problem? Please describe.
https://github.com/NVIDIA/cuCollections provides concurrent data structures for GPUs, including hash tables, Bloom filters, and HyperLogLog. These components would be a better fit within the CCCL library.
Describe the solution you'd like
Migrating the entire cuCollections repository in one step would be too bulky. Instead, we could split the migration into independent parts:
- Utilities like hash functions and `fast_int`
- Bloom filter
- HyperLogLog
For hash tables, cuCollections primarily supports four fixed-size variants that aim to be as standard-conforming as possible:
- static_set
- static_map
- static_multiset
- static_multimap
All these static hash tables share a common open addressing base class. Additionally, cuCollections includes a dynamic_map, which extends static_map by allowing resizing when a certain occupancy threshold is reached. Since this data structure has only one known user, we can defer discussions on its migration until later.
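For readers unfamiliar with the design, here is a heavily simplified sketch of the kind of linear-probing lookup such an open addressing table is built on (illustration only, not cuco's actual base class; the sentinel-based empty-slot convention and the device-callable `Hash`/`KeyEqual` parameters are assumptions of this sketch):

```cpp
#include <cstddef>

// Heavily simplified device-side lookup using open addressing with linear probing.
// Assumptions: a reserved sentinel key marks empty slots, the slot array has fixed
// capacity, and Hash/KeyEqual are device-callable.
template <class Key, class Hash, class KeyEqual>
__device__ bool contains(Key const* slots,
                         std::size_t capacity,
                         Key empty_sentinel,
                         Key query,
                         Hash hash,
                         KeyEqual equal)
{
  std::size_t idx = hash(query) % capacity;
  for (std::size_t probe = 0; probe < capacity; ++probe) {
    Key const current = slots[idx];
    if (equal(current, query)) { return true; }       // key found
    if (current == empty_sentinel) { return false; }  // empty slot: key is absent
    idx = (idx + 1) % capacity;                       // advance the probe sequence
  }
  return false;  // the table is completely full and the key is absent
}
```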
Proposed Action Items
- [ ] Migrate hash functions to CCCL: murmurhash, xxhash, (identity hash?)
- [ ] Migrate other utilities like `fast_int`, `pair`, `extent`, `storage` to CCCL
- [ ] Migrate Bloom filter to CCCL
- [ ] Migrate HyperLogLog to CCCL
- [ ] Migrate hash tables
  - [ ] Open addressing base class & `static_set`
  - [ ] `static_map`
  - [ ] `static_multiset`
  - [ ] `static_multimap`
- [ ] Evaluate the need to migrate `dynamic_map`
Additional context
- Namespace: For now, all cuCollections components should be placed under the experimental `cudax` namespace.
- C++ Standard Compatibility: cuCollections is currently a C++17 library. If compatibility with C++14 or newer standards (like C++20) is desired, additional work will be required.
- Clang Compatibility: It is unclear whether the current cuCollections implementation is fully compatible with Clang.
From my point of view, we should first consider where cuCollections should live.
Given the general plan to unify CCCL further, I would suggest moving it into libcu++. That is especially true for more or less foundational features like hashes. I would need to look into the actual code to see whether we can just drop `pair` and use `cuda::std::pair`.
Regarding the containers, the question is whether we want to reuse memory resources from cudax or just drop them in as is. If we just want to use them unmodified, I would also say `<cuda/>` would be the right place.
@miscco Thanks for the comments! I'm open to whatever the CCCL team thinks is best. I'm just providing additional context to help facilitate the decision-making process.
For reference, cuco data structures aim to be as standard-conforming as possible while also offering better flexibility for performance tuning and GPU optimization. As a result, they are not direct drop-in replacements for their std:: equivalents.
Regarding some of the points you raised:
- Hash Functions: Hashing is a fundamental feature, and cuco provides 32-bit, 64-bit, and 128-bit hashers along with their variants, such as `xxhash_32` and `murmurhash3_x86_32`. In contrast, STL hash functions are limited to 64-bit.
- Pair Structure: I'm not sure about the current state of `cuda::std::pair`, but historically, the key difference between `cuco::pair` and `thrust::pair` was that `cuco::pair` is always aligned. This alignment is necessary for atomic operations during hash table insertion and queries.
- Allocator: The current cuco allocator is a simple wrapper around `cudaMalloc` and `cudaFree` (see the sketch below). Using a cudax memory resource would likely be a more suitable approach.
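For reference, the thin allocator mentioned above boils down to something like the following sketch (simplified, not the exact cuco code; the `device_allocator` name is made up here). Swapping in a cudax memory resource would mostly change how `allocate`/`deallocate` are implemented, without affecting the container interface:

```cpp
#include <cuda_runtime_api.h>

#include <cstddef>
#include <new>

// Minimal std::allocator-style wrapper over cudaMalloc/cudaFree (simplified sketch).
template <class T>
struct device_allocator {
  using value_type = T;

  T* allocate(std::size_t n)
  {
    void* ptr = nullptr;
    if (cudaMalloc(&ptr, n * sizeof(T)) != cudaSuccess) { throw std::bad_alloc{}; }
    return static_cast<T*>(ptr);
  }

  void deallocate(T* ptr, std::size_t) noexcept { cudaFree(ptr); }
};
```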
A few comments:
- Please note that `std::pair`/`cuda::std::pair` is not even trivially copyable. @miscco maybe we can think about a `cuda::pair` version focused on performance.
- `fast_int` is also tracked here: https://github.com/NVIDIA/cccl/issues/3442
- Is `extent` equivalent to `std::extent`?
- I agree with Michael about hash functions. I would expect simple APIs that can be broadly accepted and used across the ecosystem.
> Is `extent` equivalent to `std::extent`?

No, `cuco::extent` is a 1D size wrapper for both dynamic and statically sized capacities, e.g. https://godbolt.org/z/v34nsbW4c
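To make that concrete, here is a rough sketch (not the actual cuco implementation; `dyn_extent` and the member names are hypothetical) of what such a 1D size wrapper can look like, carrying the capacity either in the type or as a runtime value:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sentinel marking "size known only at runtime".
inline constexpr std::size_t dyn_extent = SIZE_MAX;

// Statically sized capacity: the value is a compile-time constant, no storage needed.
template <class SizeType, std::size_t N = dyn_extent>
struct extent {
  constexpr operator SizeType() const noexcept { return static_cast<SizeType>(N); }
};

// Dynamically sized capacity: the value is stored at runtime.
template <class SizeType>
struct extent<SizeType, dyn_extent> {
  constexpr extent(SizeType size) noexcept : size_{size} {}
  constexpr operator SizeType() const noexcept { return size_; }
  SizeType size_;
};

// extent<std::size_t, 1024> static_capacity;   // carries 1024 in the type
// extent<std::size_t>       dynamic_capacity{1024};  // carries 1024 as a member
```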
> please note that `std::pair`/`cuda::std::pair` is not even trivially copyable. @miscco maybe we can think about a `cuda::pair` version focused on performance.

`cuda::std::pair` is trivially copyable; alignment is an issue.
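To illustrate why the alignment matters, here is a minimal sketch (not `cuco::pair` itself; the all-zero empty-slot sentinel is an assumption of this example) of how an aligned 8-byte pair lets a hash table slot be claimed with a single atomic compare-and-swap:

```cpp
#include <cuda/atomic>
#include <cuda/std/bit>
#include <cuda/std/cstdint>

// Hypothetical pair aligned so that an {int32, int32} pair occupies a single,
// naturally aligned 64-bit word (a sketch, not cuco::pair).
template <class First, class Second>
struct alignas(2 * sizeof(First)) aligned_pair {
  First first;
  Second second;
};

// With that alignment, key and value can be written together with one atomic CAS;
// an unaligned pair could not be treated as a single word like this.
__device__ bool try_claim_slot(aligned_pair<int, int>* slot,
                               aligned_pair<int, int> desired)
{
  using word_t = cuda::std::uint64_t;
  cuda::atomic_ref<word_t, cuda::thread_scope_device> ref{
    *reinterpret_cast<word_t*>(slot)};
  word_t expected = 0;  // assumption: an all-zero pair marks an empty slot
  return ref.compare_exchange_strong(expected, cuda::std::bit_cast<word_t>(desired));
}
```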
I would suggest we start with our cuco hashers and find a place that feels right for them in CCCL.
We have a whole zoo of hashers here.
If we want to put them under the `cuda::` namespace, I suggest doing the following so we don't flood the namespace with a bunch of names:
```cpp
namespace cuda {

template <class T, hash_strategy Strategy = hash_strategy::WHATEVER_STL_IS_USING>
struct hash {...};

} // namespace cuda
```
and then let `hash_strategy` be an `enum class` type that enumerates the hashers we have in our zoo.
This would also make the interface quite close to the std::hash counterpart.
This would be the easy route. However, if we want to make the strategy customizable by the user (i.e. passing their own implementation), then we probably need some additional concept magic.
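To sketch what that customization could look like (all names here are hypothetical, and this assumes C++20 concepts are acceptable), a small concept could describe any callable that hashes a key to an unsigned integer, and the hash adaptor or container would be constrained on it:

```cpp
#include <concepts>
#include <cstdint>

// A "hasher" is anything callable on Key that returns an unsigned integral value.
template <class H, class Key>
concept hasher_for = requires(H const& h, Key const& k) {
  { h(k) } -> std::unsigned_integral;
};

// Example of a user-provided hasher that satisfies the concept.
struct my_identity_hash {
  constexpr std::uint64_t operator()(int key) const noexcept
  {
    return static_cast<std::uint64_t>(key);
  }
};
static_assert(hasher_for<my_identity_hash, int>);

// A hash adaptor (or container) could then accept any conforming hasher.
template <class Key, hasher_for<Key> Hasher>
struct hash_adaptor {
  Hasher hasher{};
  constexpr auto operator()(Key const& key) const noexcept { return hasher(key); }
};

static_assert(hash_adaptor<int, my_identity_hash>{}(42) == 42u);
```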
What do you think?