[FEA] Allow customizing the cooperative group size (`tile_size`) for `static_map`
Currently, `static_map` internally uses a fixed `tile_size = 4` when calling the insert and contains APIs. This value is not optimal and can hurt performance on some (if not most) systems, as I have seen in my own tests. For example, setting `tile_size = 2` doubles the performance on my system.
It would be great if we could specify `tile_size` when constructing the `static_map` object, as we can when constructing a `static_multimap`.
Probably a candidate for #110. We could also explore having dynamic CG sizes, e.g., CG=1 for when the table occupancy is low and then use a wider group once the table fills up.
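
To make the dynamic idea concrete, here is a minimal host-side sketch of picking a CG size from the current occupancy. The function name `choose_cg_size` and the load-factor thresholds are hypothetical and not part of cuCollections; real thresholds would have to come from benchmarks.

```cpp
#include <cstddef>

// Hypothetical heuristic: pick a cooperative group size from table occupancy.
// The thresholds (0.25, 0.5) are illustrative placeholders, not measured values.
constexpr int choose_cg_size(std::size_t size, std::size_t capacity) {
  if (capacity == 0) { return 1; }  // empty table: nothing to probe cooperatively
  const double load = static_cast<double>(size) / static_cast<double>(capacity);
  if (load < 0.25) { return 1; }    // sparse table: a single thread per key
  if (load < 0.5)  { return 2; }    // moderately filled: small group
  return 4;                         // dense table: the current fixed default
}
```

The host could call this before launching an insert or lookup kernel and dispatch to a kernel instantiated for the returned size.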
@ttnghia I'm curious, what architecture did you run your benchmarks on?
I'm running on a Quadro RTX 6000 (SM75).
@sleeepyjack I'm guessing it's a difference of GDDR vs HBM. A larger tile_size tends to do better on HBM than on GDDR.
@jrhemstad I was thinking the same thing. A long time ago, I dreamed about having a compile time lookup table for choosing the optimal (default) CG size for a given architecture in WarpCore. Sounds wild, but hey, why not?
That wouldn't be too hard. It would be similar to how CUB does its device-specific tuning policies.
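
A rough sketch of what such a compile-time lookup could look like, assuming C++14 `constexpr`. The names `cg_size_entry`, `cg_size_table`, and `default_cg_size` are hypothetical, and this is not how CUB structures its policies. The table values are placeholders except for SM75, where `tile_size = 2` was reported faster than 4 earlier in this thread.

```cpp
// Hypothetical per-architecture default CG size table (values are placeholders).
struct cg_size_entry {
  int min_sm;   // lowest compute capability the entry applies to, e.g. 75 for SM75
  int cg_size;  // default cooperative group size for that range
};

// Entries are ordered by descending min_sm so the first match wins.
constexpr cg_size_entry cg_size_table[] = {
  {80, 4},  // placeholder, not benchmarked
  {75, 2},  // reported faster than 4 on a Quadro RTX 6000 in this thread
  {70, 4},  // placeholder, not benchmarked
  {0, 4},   // fallback: the current fixed default
};

// Resolve the default CG size for a given SM version at compile time.
constexpr int default_cg_size(int sm) {
  for (const auto& entry : cg_size_table) {
    if (sm >= entry.min_sm) { return entry.cg_size; }
  }
  return 4;  // not reached for sm >= 0 because of the {0, 4} fallback entry
}

static_assert(default_cg_size(75) == 2, "SM75 should resolve to CG size 2");
```

In device code the argument could be derived from `__CUDA_ARCH__ / 10`, and the result used as the default value of a template parameter so that users can still override it.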