How much can a GPU accelerate ABACUS compared with a multicore CPU?
Details
Does anyone know how much a GPU can accelerate ABACUS compared with a multicore CPU? Since the number of CPU cores on GPU nodes is limited, is it more worthwhile to apply for a GPU node to run ABACUS, or to apply for multiple CPU nodes and use MPI? Does anyone have comparison results? Additionally, does ABACUS use FP64 or FP32 in GPU (CUDA) mode?
Have you read the FAQ in the online manual? http://abacus.deepmodeling.com/en/latest/community/faq.html
- [x] Yes, I have read the FAQ part on online manual.
Task list for Issue attackers (only for developers)
- [ ] Understand the problem or question described by the user.
- [ ] Check if the issue is a known problem or has been addressed in the documentation.
- [ ] Test the issue or problem on a similar system or environment, if possible.
- [ ] Identify the root cause or provide clarification on the user's question.
- [ ] Provide a step-by-step guide, including any necessary resources, to resolve the issue or answer the question.
- [ ] If the issue is related to documentation, update the documentation to prevent future confusion (optional).
- [ ] If the issue is related to code, consider implementing a fix or improvement (optional).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Ensure the user's issue is resolved or their question is answered and close the ticket.
Do you want results for PW basis or LCAO basis?
LCAO, perhaps? Thanks a lot.
An example for reference: I performed SCF calculations for Mo (Molybdenum) with LCAO basis on both CPU and GPU. The CPU calculation took 610 seconds, while the GPU calculation took 131 seconds, showing a significant acceleration effect.
Setup:
- CPU: 8 cores of Intel(R) Xeon(R) Platinum 8462Y+; a single process with 8 threads.
- GPU: one A800 GPU along with 8 CPU cores; also a single process with 8 threads.
- System: 32 Mo atoms in a body-centered cubic structure.
- CPU version: ABACUS v3.7.3; GPU version: ABACUS v3.9.0.2.
The input and output files are attached.
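For anyone reproducing this kind of CPU/GPU comparison: GPU execution in ABACUS is switched on via the `device` keyword in the INPUT file. Below is a minimal, hedged sketch only; the exact keyword set depends on your ABACUS version and calculation, and `ks_solver cusolver` is assumed here as one GPU-capable eigensolver choice for LCAO (check the manual for your build):

```
# INPUT (fragment) -- keywords assumed, verify against the ABACUS manual
calculation  scf
basis_type   lcao
device       gpu        # run on GPU instead of CPU
ks_solver    cusolver   # one possible GPU eigensolver for LCAO
```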
@OutisLi Do you think the answer is sufficient?
Thanks a lot. I tested the demo; my results are shown below.
PC 1: AMD 7950X3D with 24 GB RAM + NVIDIA 5090D:
- the command is `OMP_NUM_THREADS=1 mpirun -n 16`
- CPU: 60 s per loop
- GPU: 30 s per loop
PC 2: dual Intel Xeon Gold 6430 (ignoring the GPU):
- for `mpirun -n 64`, the time is 60 s per loop
- for `-n 32`, it is 30 s
- for `-n 16`, it is 35 s
- with a very cheap GPU (4 GB VRAM): could not compute (the speed was too slow, so I shut it down)
I'd like to know whether it uses FP32 or FP64, since FP64 performance is very poor on gaming cards.
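As an aside, the timings above can be reduced to speedup ratios with a bit of shell arithmetic (nothing ABACUS-specific, just the reported per-loop times); note that on PC 2, going from 32 to 64 ranks actually halves throughput, so around 32 ranks looks like the sweet spot:

```shell
# Speedups implied by the per-loop times reported above.
awk 'BEGIN {
  printf "PC1: GPU vs CPU speedup     = %.1fx\n", 60/30   # 2.0x
  printf "PC2: 32 vs 16 ranks speedup = %.2fx\n", 35/30   # ~1.17x
  printf "PC2: 64 vs 32 ranks speedup = %.2fx\n", 30/60   # 0.50x, a slowdown
}'
```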
Besides, I installed ABACUS through conda; my installation commands for Intel+NVIDIA and AMD+NVIDIA are shown below, respectively:
INTEL + NVIDIA
conda update -n base -c defaults conda -y
conda create -n abacus_conda "abacus=3.9.*=cuda*mpich*" "libblas=*=*mkl" "cuda-version=12*=*" mpich python=3.12 ipykernel -c conda-forge -y
conda activate abacus_conda && conda install cuda-cudart cuda-version=12 -y
AMD + NVIDIA
conda update -n base -c defaults conda -y
conda create -n abacus_conda "abacus=3.9.*=cuda*mpich*" "libblas=*=*openblas" "cuda-version=12*=*" mpich python=3.12 ipykernel -c conda-forge -y
conda activate abacus_conda && conda install cuda-cudart cuda-version=12 -y
My CUDA version is 12.8.
You can try the toolchain installation if that is convenient for you.
Most calculations are based on FP64.
This has been a valuable discussion; I will close it now.