
Discussion on tuning machinery

Open · alazzaro opened this issue 1 year ago • 2 comments

Follow up of https://github.com/cp2k/dbcsr/pull/804

alazzaro, Jun 14 '24 10:06

  • The performance gain from the tuned A100 kernels is minor compared to using the P100 kernels, much as the tuned P100 kernels work reasonably well on V100.

  • It is better to use the full set of autotuned and predicted kernels from the previous GPU generation than to use only a relatively small set of autotuned kernels.

From the comments above (see https://github.com/cp2k/dbcsr/pull/804#issuecomment-2167716134), it looks like the following proposal is a good compromise to move forward:

Then, the strategy will be to rename the file/parameters to "AMD" and "NVIDIA" and drop the specific GPU version. As I said, I will add a generic kernel which will be good enough for all cases we don't cover with autotuning.

But I'm no expert on this, so it would be good to hear what others think about this issue.

RMeli, Jun 14 '24 12:06

Then, the strategy will be to rename the file/parameters to "AMD" and "NVIDIA" and drop the specific GPU version.

Good idea, particularly since a GPU-specific tuning may also need maintenance as the underlying runtime changes over time (e.g., a new CUDA version). It also opens a reasonable option to tune or refresh the parameters for the latest or deployed GPU, and to naturally phase out tuning for older GPUs (which does not mean the code would stop running on them).

hfp, Jun 14 '24 13:06
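To make the proposed lookup concrete, here is a minimal Python sketch of the vendor-level strategy discussed above, assuming that tuned kernel parameters are stored in one table per vendor ("NVIDIA", "AMD") keyed by the (m, n, k) block sizes, and that any size missing from the table falls back to a generic kernel. The names `TUNED` and `select_kernel`, the fields `threads` and `grouping`, and all numbers are hypothetical placeholders, not DBCSR's actual API, file layout, or tuning results.

```python
from typing import Dict, Optional, Tuple

# Hypothetical parameter record; real tuning files may carry more fields.
Params = Dict[str, int]
MNK = Tuple[int, int, int]

# Hypothetical per-vendor tables keyed by (m, n, k): one table per vendor
# instead of one per GPU model. Values are placeholders, not tuning results.
TUNED: Dict[str, Dict[MNK, Params]] = {
    "NVIDIA": {(23, 23, 23): {"threads": 128, "grouping": 16}},
    "AMD": {(23, 23, 23): {"threads": 256, "grouping": 16}},
}


def select_kernel(vendor: str, mnk: MNK) -> Tuple[str, Optional[Params]]:
    """Return ("tuned", params) if (m, n, k) has an entry for this vendor
    (autotuned or predicted); otherwise return ("generic", None) to signal
    that the generic fallback kernel should be used."""
    params = TUNED.get(vendor, {}).get(mnk)
    if params is not None:
        return "tuned", params
    # Not covered by tuning (or unknown vendor): use the generic kernel.
    return "generic", None


if __name__ == "__main__":
    print(select_kernel("NVIDIA", (23, 23, 23)))  # tuned entry
    print(select_kernel("AMD", (5, 7, 9)))        # falls back to generic
```

Keying on the vendor rather than the GPU model means a newer card simply reuses the existing table, which fits the observation that kernels from the previous generation carry over reasonably well; refreshing the tuning for the latest deployed GPU then changes only the table contents, not the lookup.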