abacus-develop

LCAO too slow on GPU, an example from Si with 512 atoms

Open · mohanchen opened this issue 5 months ago · 0 comments


Si-512-abacus-gpu-test.tar.gz

The main bottleneck is on the CPU side: most of the wall time is spent in CPU computation, while the GPU is busy for less than 20% of the total time.

For multi-GPU runs, I have correctly built ELPA with CUDA support. However, tests show that the speedup from using more GPUs comes mainly from the proportional increase in the number of CPU cores allocated alongside them. Specifically, going from [6 cores + 1 V100 SXM2 16GB GPU] to [24 cores + 4 V100 SXM2 16GB GPUs] gives about a 2.5x speedup, yet 24 cores with a single V100 SXM2 16GB GPU already gives more than a 2x speedup.
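As a rough back-of-the-envelope check, Amdahl's law connects these numbers: if adding GPUs only accelerates the GPU-busy fraction of the run, the overall gain is capped by the CPU-bound remainder. The sketch below is an illustration, not part of ABACUS. It assumes the "less than 20%" GPU-busy figure can be taken as 0.20, and it uses 2.0 as a lower bound for "more than 2x". The function names `amdahl_speedup` and `parallel_efficiency` are hypothetical helpers.

```python
import math


def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: overall speedup when `accelerated_fraction` of the
    runtime is made `factor` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


def parallel_efficiency(speedup: float, resource_ratio: float) -> float:
    """Achieved speedup divided by the factor by which resources grew."""
    return speedup / resource_ratio


# Assumption: GPU busy for ~20% of wall time (the issue reports "< 20%").
gpu_fraction = 0.20

# Ceiling if the GPU part became infinitely fast: 1 / (1 - 0.2).
gpu_only_bound = amdahl_speedup(gpu_fraction, math.inf)

# Ideal gain from 1 -> 4 GPUs if only the GPU part scales perfectly.
four_gpu_amdahl = amdahl_speedup(gpu_fraction, 4.0)

# Reported speedups relative to [6 cores + 1 V100].
speedup_24c_4gpu = 2.5
speedup_24c_1gpu = 2.0  # "more than 2x": 2.0 is a lower bound

# Cores and GPUs both grew 4x, so efficiency relative to ideal 4x scaling.
efficiency_4x = parallel_efficiency(speedup_24c_4gpu, 4.0)

# Gain attributable to the three extra GPUs at fixed 24 cores
# (an upper bound, since the 1-GPU speedup is a lower bound).
extra_gpu_gain = speedup_24c_4gpu / speedup_24c_1gpu

if __name__ == "__main__":
    print(f"GPU-only speedup ceiling:      {gpu_only_bound:.2f}x")
    print(f"Amdahl estimate for 4 GPUs:    {four_gpu_amdahl:.2f}x")
    print(f"Efficiency of 4x resources:    {efficiency_4x:.1%}")
    print(f"Gain from 3 extra GPUs (max):  {extra_gpu_gain:.2f}x")
```

Under these assumptions, the extra GPUs contribute at most about 1.25x, which is in line with the Amdahl ceiling for a 20% GPU fraction. This supports the observation that the CPU-side work, not the GPU, limits scaling.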

mohanchen, Jul 12 '25 14:07