How much can a GPU accelerate ABACUS compared with a multicore CPU?
Details
Does anyone know how much a GPU can accelerate ABACUS compared with a multicore CPU? Since the number of CPU cores on GPU nodes is limited, is it more worthwhile to apply for a GPU node to run ABACUS, or to apply for multiple CPU nodes and use MPI? Does anyone have comparison results? Additionally, does ABACUS use FP64 or FP32 in GPU (CUDA) mode?
Have you read the FAQ in the online manual? http://abacus.deepmodeling.com/en/latest/community/faq.html
- [x] Yes, I have read the FAQ part on online manual.
Task list for Issue attackers (only for developers)
- [ ] Understand the problem or question described by the user.
- [ ] Check if the issue is a known problem or has been addressed in the documentation.
- [ ] Test the issue or problem on a similar system or environment, if possible.
- [ ] Identify the root cause or provide clarification on the user's question.
- [ ] Provide a step-by-step guide, including any necessary resources, to resolve the issue or answer the question.
- [ ] If the issue is related to documentation, update the documentation to prevent future confusion (optional).
- [ ] If the issue is related to code, consider implementing a fix or improvement (optional).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Ensure the user's issue is resolved or their question is answered and close the ticket.
Do you want results for PW basis or LCAO basis?
LCAO, perhaps? Thanks a lot.
An example for reference: I performed SCF calculations for Mo (Molybdenum) with LCAO basis on both CPU and GPU. The CPU calculation took 610 seconds, while the GPU calculation took 131 seconds, showing a significant acceleration effect.
Setup:
- CPU: 8 cores of Intel(R) Xeon(R) Platinum 8462Y+; a single process with 8 threads.
- GPU: one A800 GPU along with 8 CPU cores; also a single process with 8 threads.
- System: 32 Mo atoms in a body-centered cubic structure.
- CPU version: ABACUS v3.7.3; GPU version: ABACUS v3.9.0.2.
The input and output files are attached.
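For anyone reproducing this kind of CPU/GPU comparison: GPU execution in ABACUS is switched on via the `device` keyword in the INPUT file. Below is a minimal, hedged sketch only; the exact keyword set depends on your ABACUS version and calculation, and `ks_solver cusolver` is assumed here as one GPU-capable eigensolver choice for LCAO (check the manual for your build):

```
# INPUT (fragment) -- keywords assumed, verify against the ABACUS manual
calculation  scf
basis_type   lcao
device       gpu        # run on GPU instead of CPU
ks_solver    cusolver   # one possible GPU eigensolver for LCAO
```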
@OutisLi Do you think the answer is sufficient?
Thanks a lot. I tested the demo; my results are shown below.
PC 1: AMD 7950X3D with 24 GB RAM + NVIDIA 5090D:
- the command is `OMP_NUM_THREADS=1 mpirun -n 16`
- CPU: 60 s per loop
- GPU: 30 s per loop
PC 2: dual Intel Xeon Gold 6430 (ignoring the GPU):
- for `mpirun -n 64`, the time is 60 s per loop
- for `-n 32`, it is 30 s
- for `-n 16`, it is 35 s
- with a very cheap GPU (4 GB VRAM): could not compute (the speed was too slow, so I shut it down)
I'd like to know whether it uses FP32 or FP64, since FP64 performance is very poor on gaming cards.
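As an aside, the timings above can be reduced to speedup ratios with a bit of shell arithmetic (nothing ABACUS-specific, just the reported per-loop times); note that on PC 2, going from 32 to 64 ranks actually halves throughput, so around 32 ranks looks like the sweet spot:

```shell
# Speedups implied by the per-loop times reported above.
awk 'BEGIN {
  printf "PC1: GPU vs CPU speedup     = %.1fx\n", 60/30   # 2.0x
  printf "PC2: 32 vs 16 ranks speedup = %.2fx\n", 35/30   # ~1.17x
  printf "PC2: 64 vs 32 ranks speedup = %.2fx\n", 30/60   # 0.50x, a slowdown
}'
```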
Besides, I installed ABACUS through conda; my installation commands for Intel+NVIDIA and AMD+NVIDIA are shown below, respectively:
INTEL + NVIDIA
conda update -n base -c defaults conda -y
conda create -n abacus_conda "abacus=3.9.*=cuda*mpich*" "libblas=*=*mkl" "cuda-version=12*=*" mpich python=3.12 ipykernel -c conda-forge -y
conda activate abacus_conda && conda install cuda-cudart cuda-version=12 -y
AMD + NVIDIA
conda update -n base -c defaults conda -y
conda create -n abacus_conda "abacus=3.9.*=cuda*mpich*" "libblas=*=*openblas" "cuda-version=12*=*" mpich python=3.12 ipykernel -c conda-forge -y
conda activate abacus_conda && conda install cuda-cudart cuda-version=12 -y
My CUDA version is 12.8.
You can try the toolchain installation if that is convenient for you.
Most calculations are based on FP64.
This has been a valuable discussion; I will close it now.