Daniel Jünger
Probably a candidate for #110. We could also explore having dynamic CG sizes, e.g., `CG=1` for when the table occupancy is low and then use a wider group once the...
@ttnghia I'm curious, what architecture did you run your benchmarks on?
@jrhemstad I was thinking the same thing. A long time ago, I dreamed about having a compile time lookup table for choosing the optimal (default) CG size for a given...
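For illustration only, a compile-time lookup of this kind could be sketched as below. Everything in it is an assumption: the names `default_cg_size` and `default_cg` are hypothetical, the choice of slot size as the lookup key is a guess, and the returned CG sizes are placeholders rather than measured optima.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical compile-time table: slot size in bytes -> default CG size.
// The thresholds and results are placeholders, NOT benchmark-derived values.
constexpr int default_cg_size(std::size_t slot_bytes)
{
  return slot_bytes <= 8 ? 4 : (slot_bytes <= 16 ? 2 : 1);
}

// Hypothetical trait that picks the default for a key/value type pair.
template <typename Key, typename Value>
struct default_cg {
  static constexpr int value = default_cg_size(sizeof(Key) + sizeof(Value));
};

// The lookup is resolved entirely at compile time.
static_assert(default_cg<std::int32_t, std::int32_t>::value == 4,
              "4+4 byte slots map to the placeholder CG size 4");
static_assert(default_cg<std::int64_t, std::int64_t>::value == 2,
              "8+8 byte slots map to the placeholder CG size 2");
```

A real table would presumably also need to be keyed on the target architecture, and its entries would have to come from benchmarks.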
Closed by #515
Benchmark setup:
- Tesla V100 SXM2
- Insert 10^8 distinct 4+4 byte key/value pairs
- Baseline is the old `insert()`, i.e., with implicit size computation
- Comparison against new `insert()`,...
Same but for 8+8 byte key/value pairs: 
I'm not quite sure why the performance of the standalone `size()` is slightly worse compared to 4+4 byte pairs. Maybe my approach of converting a slot into a `thrust::tuple` to...
Is there any documentation on how to use `NV_IF_TARGET`? Can't seem to find it.
> Different flavors of cooperative device-side overloads. E.g., contains

We could use different signatures for
1. `contains(CG g, key_type key, ..)`
2. `contains(CG g, KeyIter first, KeyIter last, ..)`

The...
> Would the semantics of this function be such that each thread in the group is providing a distinct [first,last) range? Or the same?

I don't see the benefit of...
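To make the two proposed signatures concrete, here is a host-only sketch of how the overloads could sit side by side. It is not the cuco API: `mock_group` and `mock_set` are hypothetical stand-ins, the output iterator parameter is an assumption about what the elided `..` might contain, and the range overload assumes the "same range for the whole group" interpretation, which is exactly the point still under discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cooperative group; only carries its size.
struct mock_group {
  int size;
};

// Hypothetical host-side container used to illustrate overload resolution.
class mock_set {
 public:
  explicit mock_set(std::vector<int> keys) : keys_(std::move(keys)) {}

  // (1) The group cooperates on a single key.
  bool contains(mock_group, int key) const
  {
    return std::find(keys_.begin(), keys_.end(), key) != keys_.end();
  }

  // (2) The group cooperates on one shared [first, last) range of keys.
  // Assumption: results are written through an output iterator.
  template <typename KeyIter, typename OutIter>
  void contains(mock_group g, KeyIter first, KeyIter last, OutIter out) const
  {
    for (; first != last; ++first, ++out) {
      *out = contains(g, *first);  // forwards to the single-key overload
    }
  }

 private:
  std::vector<int> keys_;
};
```

Because the two overloads differ in arity, a call such as `s.contains(g, key)` cannot be confused with `s.contains(g, first, last, out)`, regardless of the key type.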