Precision discrepancy in stress between single-core and multi-core runs
Describe the bug
When running a `cell-relax` calculation of FCC Al (see #6141), I found that the stress computed on both CPU and GPU differs slightly between single-core and multi-core runs.
All stress contributions except the Ewald term deviate between these computing configurations.
For this case, the first-step total stress (kbar) is as follows (all runs with OMP_NUM_THREADS=1):
- CPU, mpirun -np 1: -22.440750
- GPU, mpirun -np 1: -22.441476
- CPU, mpirun -np 4: -22.244413
- GPU, mpirun -np 4: -22.264452
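To make the size of the discrepancy concrete, the short Python sketch below compares the four numbers listed above. The values are copied from this report; the dictionary layout and the helper names `np_gap` and `device_gap` are illustrative only.

```python
# First-step total stress (kbar) from this report, all with OMP_NUM_THREADS=1.
stress = {
    ("CPU", 1): -22.440750,
    ("GPU", 1): -22.441476,
    ("CPU", 4): -22.244413,
    ("GPU", 4): -22.264452,
}

def np_gap(device):
    """Stress difference (kbar) between the 1-process and 4-process runs on one device."""
    return stress[(device, 1)] - stress[(device, 4)]

def device_gap(nproc):
    """Stress difference (kbar) between CPU and GPU at a fixed number of processes."""
    return stress[("CPU", nproc)] - stress[("GPU", nproc)]

for dev in ("CPU", "GPU"):
    gap = np_gap(dev)
    rel = abs(gap / stress[(dev, 1)])  # relative to the single-process value
    print(f"{dev}: np1 - np4 = {gap:+.6f} kbar ({rel:.2%})")

for n in (1, 4):
    print(f"np{n}: CPU - GPU = {device_gap(n):+.6f} kbar")
```

With these inputs the 1-vs-4-process gap is roughly 0.18 to 0.20 kbar (under 1% relative) on both devices, while CPU and GPU agree to within about 0.001 kbar at one process, so the process count, not the device, dominates the difference.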
Expected behavior
The results should be nearly identical between single-core and multi-core runs.
To Reproduce
A simple test case can be downloaded from https://github.com/mcresearch/abacus-user-guide/tree/master/examples/surface_energy/Al_fcc100/0_bulk.
Environment
- OS: Ubuntu 22.04.4 LTS
- Compiler:
  - gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)
  - nvcc Build cuda_12.4.r12.4/compiler.33961263_0
- ABACUS v3.9.0.2 Commit: 35448cbe7 (Mon Mar 31 09:24:22 2025 +0800)
- Built with:
  - `cmake -B build -DUSE_CUDA=ON`
  - `cmake --build build -j$(nproc)`
Additional Context
No response
@pxlxingliang Hello, can you retest this case?
I have retested this case on CPU with the Bohrium image "registry.dp.tech/dptech/abacus-stable:LTSv3.10". The results with 1 core and with multiple cores are not exactly the same:
| MPI processes | Energy (eV) | Stress 11 (kbar) | d_energy of last SCF step | drho of last SCF step |
|---|---|---|---|---|
| 1 | -1883.2222505012 | -22.1915720470 | -4.85765150e-08 | 2.3881e-10 |
| 2 | -1883.2222505016 | -22.3195005335 | -1.47594585e-08 | 1.8349e-11 |
| 4 | -1883.2222505009 | -22.3739405740 | -6.19875604e-10 | 5.2155e-11 |
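I tried setting pw_seed to 0 to fix the random seed of the initial guess density. With this setting, the results for different numbers of MPI processes are almost the same, although the difference grows slowly as the calculation proceeds. This residual discrepancy likely comes from round-off in the numerical additions performed across MPI processes, since the order of floating-point summation changes with the number of processes.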
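The round-off explanation for MPI additions can be illustrated with a toy model: floating-point addition is not associative, so splitting the same terms into blocks (one per process) and then combining the partial sums can change the last bits of the result. The sketch below is not ABACUS's data distribution or any MPI library's reduction tree; `partitioned_sum` is a hypothetical stand-in that sums contiguous blocks and then adds the partials in rank order.

```python
import math
import random

def naive_sum(values):
    """Plain left-to-right double-precision accumulation (no compensation).
    Written as an explicit loop because Python 3.12+ sum() uses compensated
    summation for floats, which would hide the effect."""
    total = 0.0
    for v in values:
        total += v
    return total

def partitioned_sum(values, nranks):
    """Toy model of a reduction over `nranks` processes (hypothetical layout):
    each rank sums one contiguous block, then the partials are added in rank order."""
    n = len(values)
    bounds = [n * r // nranks for r in range(nranks + 1)]
    partials = [naive_sum(values[bounds[r]:bounds[r + 1]]) for r in range(nranks)]
    return naive_sum(partials)

# Deterministic case: the same four numbers grouped differently give different answers.
tiny = [1e16, 1.0, -1e16, 1.0]
print([partitioned_sum(tiny, k) for k in (1, 2, 4)], math.fsum(tiny))  # -> [1.0, 0.0, 1.0] 2.0

# Many terms of mixed sign and magnitude: results agree only up to round-off.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 2) for _ in range(100_000)]
exact = math.fsum(values)  # correctly rounded reference sum
for k in (1, 2, 4):
    s = partitioned_sum(values, k)
    print(k, repr(s), s - exact)
```

Each grouping is a valid double-precision sum, so none of them is "wrong"; differences of this size are harmless in a single energy but can be amplified by an iterative SCF loop, which is consistent with the slowly growing difference described above.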
| MPI processes | Energy (eV) | Stress 11 (kbar) | d_energy of last SCF step | drho of last SCF step |
|---|---|---|---|---|
| 1 | -1883.2222505025 | -22.2373067917 | -2.53522548e-08 | 1.6637e-10 |
| 2 | -1883.2222505017 | -22.2373301205 | -2.42250325e-08 | 1.6622e-10 |
| 4 | -1883.2222505012 | -22.2373176241 | -2.44562774e-08 | 1.6631e-10 |
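The two tables in this comment (default seed, then pw_seed = 0) can be compared directly by the spread of each quantity across process counts. A minimal sketch, with values copied from the tables and an illustrative helper `spread`:

```python
# (energy in eV, stress 11 in kbar) per number of MPI processes, copied from the tables.
default_seed = {
    1: (-1883.2222505012, -22.1915720470),
    2: (-1883.2222505016, -22.3195005335),
    4: (-1883.2222505009, -22.3739405740),
}
pw_seed_0 = {
    1: (-1883.2222505025, -22.2373067917),
    2: (-1883.2222505017, -22.2373301205),
    4: (-1883.2222505012, -22.2373176241),
}

def spread(runs, column):
    """Max minus min of one column (0 = energy, 1 = stress) across MPI process counts."""
    vals = [row[column] for row in runs.values()]
    return max(vals) - min(vals)

for label, runs in (("default seed", default_seed), ("pw_seed = 0", pw_seed_0)):
    print(f"{label}: energy spread {spread(runs, 0):.1e} eV, "
          f"stress spread {spread(runs, 1):.2e} kbar")
```

Fixing pw_seed shrinks the stress spread from about 0.18 kbar to about 2.3e-5 kbar (a factor of several thousand), while the total energy agrees to about 1e-9 eV in both cases.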
SCF energies with 1 MPI process (the two E_KohnSham columns are in Ry and eV):
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4128704941 -1883.2037152559
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4136269962 -1883.2140079951
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142420330 -1883.2223759999
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142100680 -1883.2219410938
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142263472 -1883.2221625834
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142319274 -1883.2222385068
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142331332 -1883.2222549120
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142331330 -1883.2222549094
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142327125 -1883.2222491876
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328007 -1883.2222503881
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328157 -1883.2222505915
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328156 -1883.2222505901
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328072 -1883.2222504771
inputs-pwseed0/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328091 -1883.2222505025
SCF energies with 2 MPI processes:
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4128704941 -1883.2037152559
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4136269962 -1883.2140079951
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142420330 -1883.2223759999
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142100680 -1883.2219410937
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142263472 -1883.2221625834
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142319274 -1883.2222385068
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142331332 -1883.2222549120
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142331330 -1883.2222549095
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142327124 -1883.2222491862
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328007 -1883.2222503886
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328156 -1883.2222505904
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328155 -1883.2222505890
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328073 -1883.2222504775
inputs-pwseed0-mpi2/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328091 -1883.2222505017
SCF energies with 4 MPI processes:
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4128704941 -1883.2037152559
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4136269962 -1883.2140079951
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142420330 -1883.2223759999
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142100680 -1883.2219410937
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142263472 -1883.2221625834
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142319274 -1883.2222385068
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142331332 -1883.2222549120
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142331330 -1883.2222549097
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142327124 -1883.2222491867
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328008 -1883.2222503899
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328156 -1883.2222505905
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328155 -1883.2222505895
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328072 -1883.2222504767
inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328090 -1883.2222505012
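The claim that the difference grows as the SCF proceeds can be checked step by step from the logs above. The sketch below copies the eV column for 1 and 4 MPI processes and reports the per-step difference; `parse_kohn_sham` is a hypothetical helper that assumes lines shaped like the grep output shown here (label `E_KohnSham`, then the Ry and eV values).

```python
import re

# eV column of E_KohnSham for each SCF step, copied from the logs above.
mpi1_ev = [
    -1883.2037152559, -1883.2140079951, -1883.2223759999, -1883.2219410938,
    -1883.2221625834, -1883.2222385068, -1883.2222549120, -1883.2222549094,
    -1883.2222491876, -1883.2222503881, -1883.2222505915, -1883.2222505901,
    -1883.2222504771, -1883.2222505025,
]
mpi4_ev = [
    -1883.2037152559, -1883.2140079951, -1883.2223759999, -1883.2219410937,
    -1883.2221625834, -1883.2222385068, -1883.2222549120, -1883.2222549097,
    -1883.2222491867, -1883.2222503899, -1883.2222505905, -1883.2222505895,
    -1883.2222504767, -1883.2222505012,
]

LINE_RE = re.compile(r"E_KohnSham\s+(\S+)\s+(\S+)")

def parse_kohn_sham(text):
    """Return (Ry, eV) pairs from lines like the grep output above (assumed format)."""
    return [(float(m.group(1)), float(m.group(2))) for m in LINE_RE.finditer(text)]

def step_differences(a, b):
    """Per-SCF-step difference b - a (eV)."""
    return [y - x for x, y in zip(a, b)]

diffs = step_differences(mpi1_ev, mpi4_ev)
# The logs print 10 decimals, so |d| below ~5e-11 eV counts as identical.
first = next(i for i, d in enumerate(diffs, start=1) if abs(d) > 5e-11)
for step, d in enumerate(diffs, start=1):
    print(f"step {step:2d}: {d:+.1e} eV")
print("first differing step:", first)

sample = "inputs-pwseed0-mpi4/OUT.ABACUS/running_scf.log: E_KohnSham -138.4142328090 -1883.2222505012"
print(parse_kohn_sham(sample))
```

In this data the first three steps are identical to the printed precision, the first difference (1e-10 eV) appears at step 4, and later steps differ by up to about 2e-9 eV, which matches the picture of round-off that is small at first and accumulates over the SCF iterations.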