cuCollections static_reduction

This is an extension of PR #82 and closes #58.

Adds a new class called static_reduction_map.

When a key/value pair is inserted, static_reduction_map combines the new payload with the value already stored for that key using an aggregation operation. Each slot's payload is initialized to the identity element of the aggregation (e.g. 0 for a sum), so the first insertion of a key simply yields the inserted value.
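To make these insert semantics concrete, here is a minimal host-side sketch in C++. It assumes a sum aggregation over 32-bit keys and values, linear probing, and a reserved empty-key sentinel of -1. The class name `sum_reduction_map_sketch` is hypothetical and is not the cuCollections API; `std::atomic` stands in for the device-side atomics a CUDA implementation would use.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical host-side model of a sum reduction map; NOT the cuCollections API.
// std::atomic stands in for the device atomics a CUDA implementation would use.
class sum_reduction_map_sketch {
 public:
  static constexpr std::int32_t empty_key = -1;  // assumed sentinel, never inserted as a key
  static constexpr std::int32_t identity = 0;    // identity element of operator+

  // Precondition: capacity > 0.
  explicit sum_reduction_map_sketch(std::size_t capacity) : keys_(capacity), values_(capacity) {
    for (std::size_t i = 0; i < capacity; ++i) {
      keys_[i].store(empty_key);
      values_[i].store(identity);  // every payload starts at the aggregation's identity
    }
  }

  // Reduces `value` into the payload owned by `key`; returns false if the table is full.
  bool insert(std::int32_t key, std::int32_t value) {
    std::size_t const n = keys_.size();
    std::size_t idx = std::hash<std::int32_t>{}(key) % n;
    for (std::size_t probe = 0; probe < n; ++probe) {
      std::int32_t expected = empty_key;
      // Either claim an empty slot for `key` or discover that `key` already owns it.
      if (keys_[idx].compare_exchange_strong(expected, key) || expected == key) {
        values_[idx].fetch_add(value);  // aggregate with the existing payload
        return true;
      }
      idx = (idx + 1) % n;  // linear probing
    }
    return false;
  }

  // Returns the aggregated payload for `key`, or the identity if `key` was never inserted.
  std::int32_t find(std::int32_t key) const {
    std::size_t const n = keys_.size();
    std::size_t idx = std::hash<std::int32_t>{}(key) % n;
    for (std::size_t probe = 0; probe < n; ++probe) {
      std::int32_t const k = keys_[idx].load();
      if (k == key) return values_[idx].load();
      if (k == empty_key) return identity;  // probe sequence ended: key absent
      idx = (idx + 1) % n;
    }
    return identity;
  }

 private:
  std::vector<std::atomic<std::int32_t>> keys_;
  std::vector<std::atomic<std::int32_t>> values_;
};
```

Because payloads start at the identity, a key's first insert needs no special case: it claims an empty slot with a CAS and then adds into the zero-initialized payload, exactly like every later insert of the same key.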

The following functionality has been added:

- CUDA stream support
- Synchronization with the current dev branch
- Unit tests
- Exponential backoff strategy for the CAS-loop-based custom_op functor [WIP]
- Benchmarks for the bulk insert operation
- Reduce-by-key benchmarks, including a comparison against CUB and Thrust
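The exponential backoff item is marked [WIP], so the following is only a generic illustration of the technique, not the PR's code. The helper `atomic_reduce_with_backoff` and its delay parameters are hypothetical: it applies an arbitrary binary operator through a compare-and-swap loop and doubles a pause after each failed CAS, up to a cap. On a GPU the pause would typically be `__nanosleep()`; here `std::this_thread::sleep_for` models it on the host.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not the PR's custom_op): folds `value` into `slot` with an
// arbitrary binary operator `op`, backing off exponentially after each failed CAS.
// On a GPU the pause would be __nanosleep(); std::this_thread::sleep_for models it here.
template <typename T, typename Op>
T atomic_reduce_with_backoff(std::atomic<T>& slot, T value, Op op,
                             std::chrono::nanoseconds initial_delay = std::chrono::nanoseconds{8},
                             std::chrono::nanoseconds max_delay = std::chrono::nanoseconds{1024}) {
  T expected = slot.load(std::memory_order_relaxed);
  std::chrono::nanoseconds delay = initial_delay;
  // On failure, compare_exchange_weak reloads `expected`, so `op` is recomputed
  // against the latest payload on every retry.
  while (!slot.compare_exchange_weak(expected, op(expected, value),
                                     std::memory_order_relaxed)) {
    std::this_thread::sleep_for(delay);
    delay = std::min(delay * 2, max_delay);  // double the pause, capped at max_delay
  }
  return expected;  // payload observed just before our successful update
}
```

Backing off spreads out retries from threads that contend on the same hot slot, which matters most when many inserts target the same key and a plain CAS loop would keep failing.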

Reduce-by-key benchmark results

In this benchmark scenario, we generate 100'000'000 uniformly distributed key/value pairs in which each distinct key has a multiplicity of m, i.e. each key occurs on average m times in the input data. The task is to sum all values associated with the same key, where both the input data and the result reside in the GPU's global memory. Note that for our hash-based implementation (CUCO) we include two measurements with different target load factors (50% and 80%).
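The benchmark setup can be modeled on the host as a sketch under stated assumptions, not the PR's benchmark code. Keys are drawn uniformly from [0, n/m) so each key occurs about m times; the table capacity for a target load factor is the number of distinct keys divided by that load factor (for n = 100'000'000 and m = 10, that is 20'000'000 slots at 50% and 12'500'000 at 80%); and the sort-based baseline assumes the usual `thrust::sort_by_key` followed by `thrust::reduce_by_key` pattern. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <utility>
#include <vector>

using kv_vector = std::vector<std::pair<std::int32_t, std::int64_t>>;

// Draws n keys uniformly from [0, n / m) so each key occurs ~m times (requires n >= m).
std::vector<std::int32_t> make_uniform_keys(std::size_t n, std::size_t m, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<std::int32_t> dist(0, static_cast<std::int32_t>(n / m) - 1);
  std::vector<std::int32_t> keys(n);
  for (auto& k : keys) k = dist(gen);
  return keys;
}

// Slots needed so `distinct_keys` fill at most `load_percent`% of the table (integer ceil).
std::size_t capacity_for(std::size_t distinct_keys, std::size_t load_percent) {
  return (distinct_keys * 100 + load_percent - 1) / load_percent;
}

// Sort-based reduce-by-key: sort, then sum each run of equal keys (ordered output).
kv_vector sort_reduce_by_key(kv_vector kv) {
  std::sort(kv.begin(), kv.end());
  kv_vector out;
  for (auto const& [k, v] : kv) {
    if (!out.empty() && out.back().first == k) {
      out.back().second += v;
    } else {
      out.emplace_back(k, v);
    }
  }
  return out;
}

// Hash-based reduce-by-key: a single pass without sorting (unordered output).
std::unordered_map<std::int32_t, std::int64_t> hash_reduce_by_key(kv_vector const& kv) {
  std::unordered_map<std::int32_t, std::int64_t> acc;
  for (auto const& [k, v] : kv) acc[k] += v;
  return acc;
}
```

Both strategies produce the same per-key sums; the hash-based one avoids the sort, which is the cost the CUB and Thrust baselines have to pay.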

NVIDIA Tesla V100 32GB

4+4 byte key/value pairs

[Figure: rbk_uniform_distribution_i32_v100 (reduce-by-key, 4+4 byte pairs, Tesla V100)]

8+8 byte key/value pairs

[Figure: rbk_uniform_distribution_i64_v100 (reduce-by-key, 8+8 byte pairs, Tesla V100)]

NVIDIA Tesla A100 40GB

4+4 byte key/value pairs

[Figure: rbk_uniform_distribution_i32_a100 (reduce-by-key, 4+4 byte pairs, Tesla A100)]

8+8 byte key/value pairs

[Figure: rbk_uniform_distribution_i64_a100 (reduce-by-key, 8+8 byte pairs, Tesla A100)]

Aug 04 '21 22:08 sleeepyjack

Can one of the admins verify this patch?

Aug 04 '21 22:08 GPUtester

add to whitelist

Aug 05 '21 02:08 jrhemstad

okay to test

Aug 05 '21 02:08 jrhemstad

ok to test

Aug 09 '21 18:08 dillon-cullinan

ok to test

Aug 09 '21 18:08 raydouglass

now ready for review

Aug 10 '21 00:08 sleeepyjack

@sleeepyjack any chance you'd be able to address the merge conflicts on this PR so we can get it merged?

Jan 27 '22 18:01 jrhemstad

Quick update: I managed to resolve the merge conflicts and, in the process, refactored parts of the benchmark suite. I'll re-run all of the benchmarks tonight to make sure they deliver the same results. If this is the case, I'll push the changes so we can merge this PR into dev.

Jan 31 '22 17:01 sleeepyjack

Thanks for the great work! It's a large PR and I just had a quick look over examples, tests and benchmarks. Will look into implementations shortly.

Thanks so much for the review so far! And I have to apologize for the unnecessarily large merge commit. I just wanted it done as quickly as possible so you don't have to wait for it to get merged. I will incorporate the requested changes in the next couple of days.

Mar 10 '22 01:03 sleeepyjack

@sleeepyjack to work on breaking this up into smaller PRs to make it easier to review.

May 19 '22 15:05 jrhemstad

Superseded by #515

Jul 08 '24 22:07 sleeepyjack