
[FEA] Allow to customize cooperative group size (`tile_size`) for `static_map`

Open ttnghia opened this issue 3 years ago • 6 comments

Currently, `static_map` internally hard-codes `tile_size = 4`, and this value is used when calling the `insert` or `contains` APIs. In my own testing, `tile_size = 4` is not optimal and may hurt performance on some (if not most) systems. For example, setting `tile_size = 2` doubles the performance on my system.

It would be great if we could specify `tile_size` when constructing the `static_map` object, similar to how we construct a `static_multimap`.

ttnghia avatar Jul 23 '22 02:07 ttnghia
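
To make the request above concrete, here is a minimal host-only sketch of a tile size exposed as a template parameter instead of a hard-coded constant. The class name `tiled_map_sketch` and the helper `tiles_per_block` are hypothetical and are not part of the cuCollections API; the default of 4 simply mirrors the value mentioned in the issue.

```cpp
#include <cstddef>

// Hypothetical stand-in for a map type; NOT the cuCollections API.
// TileSize plays the role of the cooperative group size (tile_size).
template <typename Key, typename Value, int TileSize = 4>
class tiled_map_sketch {
  // A tile must be a power of two and cannot exceed a warp (32 threads).
  static_assert(TileSize > 0 && TileSize <= 32 && (TileSize & (TileSize - 1)) == 0,
                "tile size must be a power of two no larger than a warp");

 public:
  static constexpr int tile_size = TileSize;

  // Number of tiles that a thread block of `block_size` threads splits into.
  static constexpr std::size_t tiles_per_block(std::size_t block_size)
  {
    return block_size / TileSize;
  }
};
```

With a parameter like this, a user on a system that favors smaller groups could write `tiled_map_sketch<int, int, 2>` while everyone else keeps the default.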

Probably a candidate for #110. We could also explore having dynamic CG sizes, e.g., CG=1 for when the table occupancy is low and then use a wider group once the table fills up.

sleeepyjack avatar Jul 26 '22 19:07 sleeepyjack
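
A rough sketch of the dynamic CG size idea, assuming the group size has to be a compile-time constant and is therefore picked by dispatching between pre-instantiated variants. Everything here is hypothetical: `select_cg_size`, `insert_with_cg`, and `dispatch_insert` are illustrative names, and the 0.5 occupancy threshold is an arbitrary placeholder, not a measured value.

```cpp
// Placeholder threshold; a real value would have to come from benchmarks.
constexpr double low_occupancy_threshold = 0.5;

// Pick a narrow group while the table is sparsely filled, a wider one otherwise.
constexpr int select_cg_size(double load_factor)
{
  return load_factor < low_occupancy_threshold ? 1 : 4;
}

// Stand-in for an insert kernel instantiated per CG size.
// It only returns the CG size it was instantiated with.
template <int CGSize>
int insert_with_cg()
{
  return CGSize;
}

// Runtime dispatch to one of the compile-time instantiations.
inline int dispatch_insert(double load_factor)
{
  switch (select_cg_size(load_factor)) {
    case 1: return insert_with_cg<1>();
    default: return insert_with_cg<4>();
  }
}
```

The cost of this approach is one kernel instantiation per supported CG size, so the set of candidate sizes would likely need to stay small.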

@ttnghia I'm curious, what architecture did you run your benchmarks on?

sleeepyjack avatar Jul 26 '22 19:07 sleeepyjack

I'm running on a Quadro RTX 6000 (SM75).

ttnghia avatar Jul 26 '22 19:07 ttnghia

@sleeepyjack I'm guessing it's a difference of GDDR vs HBM. A larger tile_size works better on HBM than on GDDR.

jrhemstad avatar Jul 27 '22 02:07 jrhemstad

@jrhemstad I was thinking the same thing. A long time ago, I dreamed about having a compile time lookup table for choosing the optimal (default) CG size for a given architecture in WarpCore. Sounds wild, but hey, why not?

sleeepyjack avatar Jul 27 '22 02:07 sleeepyjack

> @jrhemstad I was thinking the same thing. A long time ago, I dreamed about having a compile time lookup table for choosing the optimal (default) CG size for a given architecture in WarpCore. Sounds wild, but hey, why not?

That wouldn't be too hard. It would be similar to how CUB does its device specific tuning policies.

jrhemstad avatar Jul 27 '22 12:07 jrhemstad
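
A minimal sketch of what such a compile-time lookup table could look like, written as plain host C++. The names `cg_tuning_entry`, `cg_tuning_table`, and `default_cg_size` are hypothetical. The single table entry (SM75 mapped to 2) only reflects the one data point reported in this thread and is not a tuning result; the fallback of 4 is the current fixed value.

```cpp
// One tuning entry: an SM architecture (e.g. 75 for SM75) and its default CG size.
struct cg_tuning_entry {
  int sm_arch;
  int cg_size;
};

// Hypothetical table. The only entry comes from the single report in this thread.
constexpr cg_tuning_entry cg_tuning_table[] = {
  {75, 2},
};

// Fallback used when an architecture has no dedicated entry.
constexpr int fallback_cg_size = 4;

// Look up the default CG size for an architecture at compile time.
constexpr int default_cg_size(int sm_arch)
{
  for (const auto& entry : cg_tuning_table) {
    if (entry.sm_arch == sm_arch) { return entry.cg_size; }
  }
  return fallback_cg_size;
}
```

Because the lookup is `constexpr`, its result can be used directly as a template argument, for example as the default group size of a map type. An exact-match lookup is used here rather than a "nearest older architecture" chain, so that an untuned architecture falls back to the current default instead of inheriting a value measured on different hardware.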