cuCollections
static_reduction_map
This is an extension to PR #82 and closes #58
Adds a new class called `static_reduction_map`. When inserting a key/value pair, `static_reduction_map` performs an aggregation operation between the newly inserted payload and the existing value in the map. The slots in the map are initialized such that the identity value of the aggregation is the initial value of a slot's payload.
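The snippet below is a hypothetical host-side usage sketch. The class name comes from this PR, but the header path, namespace, template parameters (`reduce_add`), constructor arguments, and `insert` overload are assumptions modeled on the existing `cuco::static_map` interface, not the PR's confirmed API.

```cpp
// Hypothetical usage sketch -- names and signatures below are assumptions
// modeled on cuco::static_map, not this PR's final API.
#include <cuco/static_reduction_map.cuh>  // assumed header name

#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

#include <cstddef>

int main()
{
  std::size_t const num_pairs = 1'000'000;
  // Target roughly 50% load factor, as in the benchmarks below.
  std::size_t const capacity = 2 * num_pairs;

  // Keys/values to be aggregated; duplicate keys have their values summed.
  thrust::device_vector<int> keys(num_pairs, 1);
  thrust::device_vector<int> values(num_pairs, 1);

  // Slots start at the identity of the reduction (0 for a sum), so no
  // separate "empty payload" sentinel is needed.
  cuco::static_reduction_map<cuco::reduce_add<int>, int, int> map{capacity, /*empty_key=*/-1};

  auto zipped = thrust::make_zip_iterator(thrust::make_tuple(keys.begin(), values.begin()));
  map.insert(zipped, zipped + num_pairs);  // each insert folds the value into its slot

  return 0;
}
```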
The following functionality has been added:

- CUDA stream support
- Sync with current `dev` branch
- Unit tests
- Exponential backoff strategy for the CAS-loop-based `custom_op` functor (see the sketch after this list) [WIP]
- Benchmarks for the `insert` bulk operation
- Reduce-by-key benchmarks, including a comparison against CUB and Thrust
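To illustrate the CAS-loop idea for a custom (non-atomic) reduction operator, here is a minimal device-side sketch. The operator (`max`), the backoff constants, and all function names are illustrative assumptions, not the PR's actual implementation.

```cpp
#include <cuda_runtime.h>

// Combine an incoming value into a slot with a custom operator (here: max)
// via a compare-and-swap loop. On contention, wait briefly before retrying
// and grow the wait exponentially. Constants are illustrative only.
__device__ int reduce_into_slot(int* slot, int value)
{
  int old = *slot;
  int assumed;
  unsigned backoff_ns = 8;  // initial backoff
  do {
    assumed = old;
    int desired = max(assumed, value);        // custom reduction op
    old = atomicCAS(slot, assumed, desired);  // try to publish the new payload
    if (old != assumed) {
      __nanosleep(backoff_ns);                // contention: back off (sm_70+)
      backoff_ns = min(backoff_ns * 2, 1024u);
    }
  } while (old != assumed);
  return old;
}

// Toy kernel: every thread folds its value into a single shared slot.
__global__ void reduce_all(int* slot, int const* values, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { reduce_into_slot(slot, values[i]); }
}
```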
Reduce-by-key benchmark results
In this benchmark scenario, we generate 100'000'000 uniformly distributed key/value pairs, where each distinct key has a multiplicity of m, i.e., each key occurs on average m times in the input data. The task is to sum up all values associated with the same key, where both the input data and the result reside in the GPU's global memory. Note that for our hash-based implementation (CUCO), we include two measurements with different target load factors (50% and 80%).
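For reference, a Thrust baseline for this task typically sorts the pairs by key and then reduces runs of equal keys. The following is a minimal sketch of that approach, not the benchmark's actual code.

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>

// Sum all values that share the same key: sort pairs by key, then reduce
// contiguous equal keys. Thrust baseline sketch only.
void sum_by_key(thrust::device_vector<int>& keys,
                thrust::device_vector<int>& values,
                thrust::device_vector<int>& unique_keys,
                thrust::device_vector<int>& sums)
{
  thrust::sort_by_key(keys.begin(), keys.end(), values.begin());

  unique_keys.resize(keys.size());
  sums.resize(keys.size());
  auto ends = thrust::reduce_by_key(keys.begin(), keys.end(), values.begin(),
                                    unique_keys.begin(), sums.begin());

  // Shrink outputs to the number of distinct keys actually produced.
  unique_keys.resize(ends.first - unique_keys.begin());
  sums.resize(ends.second - sums.begin());
}
```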
NVIDIA Tesla V100 32GB

- 4+4 byte key/value pairs (benchmark plot)
- 8+8 byte key/value pairs (benchmark plot)

NVIDIA Tesla A100 40GB

- 4+4 byte key/value pairs (benchmark plot)
- 8+8 byte key/value pairs (benchmark plot)
Can one of the admins verify this patch?
add to whitelist
okay to test
now ready for review
@sleeepyjack any chance you'd be able to address the merge conflicts on this PR so we can get it merged?
Quick update: I managed to resolve the merge conflicts and, in the process, refactored parts of the benchmark suite. I'll re-run all of the benchmarks tonight to make sure they deliver the same results. If this is the case, I'll push the changes so we can merge this PR into dev.
Thanks for the great work! It's a large PR and I've only had a quick look over the examples, tests, and benchmarks so far. Will look into the implementation shortly.
Thanks so much for the review so far! And I have to apologize for the unnecessarily large merge commit. I just wanted it done as quickly as possible so you don't have to wait for it to get merged. I will incorporate the requested changes in the next couple of days.
@sleeepyjack to work on breaking this up into smaller PRs to make it easier to review.
Superseded by #515