
Discussion: numerical error of functions recip2real&real2recip and MPI scatter&gather

Open · kirk0830 opened this issue 5 months ago · 1 comment

Describe the Code Quality Issue

Recently @QuantumMisaka found a case in which ABACUS cannot restart SCF well (that is, ABACUS does not reach SCF convergence within a single SCF step) when the charge density read in is rho(G) instead of rho(r). I dug into our code and found that rho(G) is transformed to rho(r) by calling recip2real before the operators are built: https://github.com/deepmodeling/abacus-develop/blob/52f781655cf80ee3407f148dbc27fd27bfb7a949/source/module_elecstate/module_charge/charge_init.cpp#L36-L43 , whereas rho(r), when read directly, is used in operator construction without any other function calls.
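To gauge how much error the FFT step alone should introduce, here is a minimal NumPy sketch of a real2recip/recip2real round trip on a full FFT grid. The two functions are hypothetical stand-ins, not the ABACUS implementations, and the normalization convention (dividing by the number of grid points in the forward transform) is an assumption. On a full grid, a double-precision round trip is expected to reproduce the input to near machine precision, far below the differences reported in this issue.

```python
import numpy as np


def real2recip(rho_r):
    # Hypothetical stand-in: forward FFT, normalized by the number of grid points
    return np.fft.fftn(rho_r) / rho_r.size


def recip2real(rho_g):
    # Hypothetical stand-in: exact inverse of real2recip above; the density is real,
    # so the round-off-sized imaginary part is dropped
    return np.fft.ifftn(rho_g * rho_g.size).real


rng = np.random.default_rng(0)
rho_r = rng.random((24, 24, 30))  # toy "density" on a 24 x 24 x 30 grid
rho_back = recip2real(real2recip(rho_r))
max_err = np.max(np.abs(rho_back - rho_r))
print(f"max round-trip error on the full grid: {max_err:.1e}")
```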

I did some numerical experiments on the case he provided:

Title: numerical experiments on restarting from rho(r) and rho(G)
Background: in some cases, restarting from rho(G) and restarting from rho(r)
do not behave the same. The rho(r) restart converges in only one SCF step,
while the rho(G) restart needs more steps. Because this system is highly
non-symmetric, symmetry does not explicitly enter the calculation and can
be ruled out as a factor.

Technical review: rho(G) is read and transformed by FFT to rho(r) to build
the operators. After SCF convergence, rho(r) is transformed by FFT to rho(G)
and saved as a binary file.
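One thing worth checking (an assumption about the file contents, not something established in this issue) is whether the saved rho(G) holds every FFT-grid coefficient or only the G-vectors inside a cutoff sphere, as plane-wave codes commonly do. If it is the latter, the coefficients in the corners of the FFT box are lost on saving, and rho(r) rebuilt from them differs from the original unless the original was already band-limited to the sphere. A toy NumPy sketch with a cubic cell and integer G indices; the grid size, cutoff, and helper names are arbitrary and hypothetical:

```python
import numpy as np


def sphere_mask(shape, gcut):
    # Integer G indices per axis in NumPy FFT ordering (0, 1, ..., -2, -1)
    axes = [np.fft.fftfreq(n, d=1.0 / n) for n in shape]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    # Keep G-vectors with |G| <= gcut (toy cubic cell, unit reciprocal spacing)
    return gx**2 + gy**2 + gz**2 <= gcut**2


def save_and_restart(rho_r, mask):
    # Forward FFT, keep only the coefficients inside the sphere ("what gets written"),
    # then inverse FFT back to real space ("what the restart sees")
    rho_g = np.fft.fftn(rho_r)
    return np.fft.ifftn(np.where(mask, rho_g, 0.0)).real


shape = (24, 24, 24)
mask = sphere_mask(shape, gcut=11.0)
rng = np.random.default_rng(0)

rho_rough = rng.random(shape)  # has components outside the sphere
rho_smooth = save_and_restart(rho_rough, mask)  # band-limited to the sphere by construction

err_rough = np.max(np.abs(save_and_restart(rho_rough, mask) - rho_rough))
err_smooth = np.max(np.abs(save_and_restart(rho_smooth, mask) - rho_smooth))
print(f"density with components outside the sphere: {err_rough:.1e}")
print(f"density already inside the sphere:          {err_smooth:.1e}")
```

In this toy setting only the density with components outside the sphere changes on the round trip; the band-limited one is reproduced to round-off.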

Variable nomenclature:
rhog_mpi: rho(G) dumped by an MPI run of ABACUS, then read in by ABACUS and transformed to rho(r)
rhog_serial: rho(G) dumped by a serial run of ABACUS, then read in by ABACUS and transformed to rho(r)
rho_mpi: rho(r) dumped by an MPI run of ABACUS, then read in by ABACUS
rho_serial: rho(r) dumped by a serial run of ABACUS, then read in by ABACUS
      
Comparison:
rhoX_serial vs rhoX_mpi: correctness of MPI-related operations
rhog_X vs rho_X: correctness of the rho(G) <-> rho(r) transformation
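The statistics reported below (Max, Average and Stddev of the element-wise difference) could be produced by a short script along these lines. The script actually used is not shown in this issue, so this is a guess: it assumes the difference is the absolute value |rho1 - rho2| taken over flattened arrays with identical grid ordering, and `compare` is a hypothetical name.

```python
import numpy as np


def compare(rho1, rho2):
    """Element-wise comparison of two density arrays on the same grid.

    Assumes "difference" means |rho1 - rho2| over the flattened arrays.
    Prints the same fields as the log in this issue and returns
    (max, mean, std) of the absolute difference.
    """
    rho1 = np.asarray(rho1, dtype=float).ravel()
    rho2 = np.asarray(rho2, dtype=float).ravel()
    print(f"Data size of rho1: {rho1.size}")
    print(f"Data size of rho2: {rho2.size}")
    if rho1.size != rho2.size:
        raise ValueError("arrays must have the same number of grid points")
    diff = np.abs(rho1 - rho2)
    print("Calculate the difference element-wise.")
    print(f"Max: {diff.max()}")
    print(f"Average: {diff.mean()}")
    print(f"Stddev: {diff.std()}")  # population standard deviation (ddof=0)
    return diff.max(), diff.mean(), diff.std()
```

For example, `compare([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])` returns a maximum of 1.0 and an average of 0.5.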

MPI parallelization troubleshooting:

Compare items: rhog_serial vs rhog_mpi
Data size of rho1: 4976640
Data size of rho2: 4976640
Calculate the difference element-wise.
Max: 1.0000000000065512e-05
Average: 3.106516781219627e-09
Stddev: 6.054659957007453e-08


Compare items: rho_serial vs rho_mpi
Data size of rho1: 4976640
Data size of rho2: 4976640
Calculate the difference element-wise.
Max: 1.0000000000065512e-05
Average: 2.774761974177509e-09
Stddev: 4.583812103644124e-08
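Serial and MPI runs are not expected to be bitwise identical even when scatter and gather are correct: a reduction across ranks (for example, summing the total charge) adds the per-rank partial sums in a different order than a serial loop does. The NumPy sketch below, which is not ABACUS code, emulates that with one contiguous slab per "rank" and measures the relative error against a correctly rounded sum (`math.fsum`). It uses 10^6 points instead of the 4976640 above only so that it runs quickly; `parallel_sum` is a hypothetical helper.

```python
import math

import numpy as np


def parallel_sum(values, nproc):
    # Emulate an MPI reduction: each "rank" sums its own contiguous slab,
    # then the partial sums are combined one after another
    partials = [float(np.sum(chunk)) for chunk in np.array_split(values, nproc)]
    total = 0.0
    for p in partials:
        total += p
    return total


rng = np.random.default_rng(1)
values = rng.random(1_000_000)  # toy positive "density" values
exact = math.fsum(values)  # correctly rounded reference sum

rel_errors = {}
for nproc in (1, 4, 16):
    rel_errors[nproc] = abs(parallel_sum(values, nproc) - exact) / exact
    print(f"nproc = {nproc:2d}: relative error {rel_errors[nproc]:.1e}")
```

A single reordered reduction moves the result only at the round-off level; whether such round-off grows over SCF iterations into differences of the size reported above is not something this sketch addresses.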

Correctness of rho(G) and rho(r) transformation:

Compare items: rhog_serial vs rho_serial
Data size of rho1: 4976640
Data size of rho2: 4976640
Calculate the difference element-wise.
Max: 3.2999999999949736e-05
Average: 1.1926822154936815e-06
Stddev: 1.9430601846807054e-06


Compare items: rhog_mpi vs rho_mpi
Data size of rho1: 4976640
Data size of rho2: 4976640
Calculate the difference element-wise.
Max: 3.2999999999949736e-05
Average: 1.1925923426063388e-06
Stddev: 1.942894758159233e-06

The remaining two comparisons:

Compare items: rho_serial vs rhog_mpi
Data size of rho1: 4976640
Data size of rho2: 4976640
Calculate the difference element-wise.
Max: 3.2999999999949736e-05
Average: 1.192663752895955e-06
Stddev: 1.9429703560257743e-06


Compare items: rho_mpi vs rhog_serial
Data size of rho1: 4976640
Data size of rho2: 4976640
Calculate the difference element-wise.
Max: 3.2999999999949736e-05
Average: 1.1926481284618544e-06
Stddev: 1.9429709915575435e-06
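A hypothesis worth checking (not established in this issue): every Max above is, to floating-point accuracy, an integer multiple of 1e-05. That is the pattern one would expect if the compared values passed through a text format with six significant digits (five after the decimal point in scientific notation) and the largest density values are of order 1 to 10, which is itself an assumption here. Two values that differ by only about 1e-12 can then end up a full 1e-05 apart if they straddle a rounding boundary. `text_round_trip` below is a hypothetical helper, not an ABACUS function:

```python
def text_round_trip(x, decimals=5):
    # Hypothetical: emulate writing x in scientific notation with `decimals`
    # digits after the point (e.g. 1.23456e+00) and reading it back
    return float(f"{x:.{decimals}e}")


# Two nearly identical values on either side of the rounding boundary 1.234565
a = 1.234565 - 1e-12
b = 1.234565 + 1e-12

before = abs(a - b)
after = abs(text_round_trip(a) - text_round_trip(b))
print(f"difference before the text round trip: {before:.1e}")
print(f"difference after the text round trip:  {after:.1e}")
```

If this is what happens, the Max values would say more about output precision than about recip2real or MPI scatter/gather; comparing the arrays before they are written to text would be one way to rule it in or out.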

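These results make me upset: rho(r) rebuilt from rho(G) deviates from the directly read rho(r) by up to 3.3e-05 (average about 1.2e-06), while the serial and MPI runs differ by at most 1.0e-05 (average about 3e-09).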

Additional Context

No response

Task list for Issue attackers (only for developers)

  • [ ] Identify the specific code file or section with the code quality issue.
  • [ ] Investigate the issue and determine the root cause.
  • [ ] Research best practices and potential solutions for the identified issue.
  • [ ] Refactor the code to improve code quality, following the suggested solution.
  • [ ] Ensure the refactored code adheres to the project's coding standards.
  • [ ] Test the refactored code to ensure it functions as expected.
  • [ ] Update any relevant documentation, if necessary.
  • [ ] Submit a pull request with the refactored code and a description of the changes made.

kirk0830, Sep 01 '24 03:09