quda
Autotune for peer-to-peer connectivity
On systems such as Summit, there is peer-to-peer access across the whole node, but the performance between the hemispheres is significantly lower than within a hemisphere. Moreover, improved throughput can be obtained between hemispheres when the data is routed through CPU memory.
When running on such systems, the peer-to-peer connectivity matrix gives a higher performance rating within the hemisphere than between hemispheres. With this in mind, we should add an autotuning option to decide whether it is better to use peer-to-peer across hemispheres or to stage through CPU memory.
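To make the proposal concrete, here is a minimal sketch of the selection step, assuming the tuner is handed a callback that times one inter-hemisphere exchange under a given policy. The names `CommsPolicy`, `tune_policy` and `measure` are hypothetical and chosen for illustration only; they are not QUDA identifiers, and this is not how QUDA's existing autotuner is structured.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical policy names for illustration; not QUDA identifiers.
enum class CommsPolicy { PeerToPeer, StagedThroughHost };

// Returns the candidate with the lowest measured time.
// `measure` is assumed to run one exchange with the given policy and
// return the elapsed time in seconds. The best of `n_trials` runs is
// kept for each policy to reduce timing noise. On a tie, the earlier
// candidate in the list wins.
CommsPolicy tune_policy(const std::vector<CommsPolicy> &candidates,
                        const std::function<double(CommsPolicy)> &measure,
                        int n_trials = 3)
{
  assert(!candidates.empty() && n_trials > 0);

  CommsPolicy best = candidates.front();
  double best_time = std::numeric_limits<double>::infinity();

  for (CommsPolicy p : candidates) {
    // Fastest observed time for this policy over all trials.
    double t = std::numeric_limits<double>::infinity();
    for (int i = 0; i < n_trials; i++) {
      double trial = measure(p);
      if (trial < t) t = trial;
    }
    if (t < best_time) {
      best_time = t;
      best = p;
    }
  }
  return best;
}
```

In practice `measure` would wrap a timed halo exchange between two GPUs in different hemispheres, and the result would presumably be cached so that the tuning cost is paid once per configuration rather than on every exchange.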
How much of that was added in #1024?
#1024 essentially added this as a manual option for the user, not as the run-time autotuning feature that this issue is asking for.