abacus-develop

With the HSE functional, the time spent on the local potential during relaxation is about 2.5 times that of the SCF calculation

Open • liusss324 opened this issue 7 months ago • 3 comments

Describe the bug

Dear developers, I ran relaxation and self-consistent field (SCF) calculations with the HSE functional. The time spent on the local potential was 650 seconds during relaxation but 240 seconds in the SCF calculation, so the former is more than 2.5 times the latter (650 / 240 ≈ 2.7). The same input files were used for both the relaxation and the SCF calculation.

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • [ ] Verify the issue is not a duplicate.
  • [ ] Describe the bug.
  • [ ] Steps to reproduce.
  • [ ] Expected behavior.
  • [ ] Error message.
  • [ ] Environment details.
  • [ ] Additional context.
  • [ ] Assign a priority level (low, medium, high, urgent).
  • [ ] Assign the issue to a team member.
  • [ ] Label the issue with relevant tags.
  • [ ] Identify possible related issues.
  • [ ] Create a unit test or automated test to reproduce the bug (if applicable).
  • [ ] Fix the bug.
  • [ ] Test the fix.
  • [ ] Update documentation (if necessary).
  • [ ] Close the issue and inform the reporter (if applicable).

liusss324 • May 13 '25 02:05

Could you upload your files? Thanks!

mohanchen • May 14 '25 16:05

> Could you upload your files? Thanks!

test.zip

Dear developers, I have uploaded my input files. I used ABACUS 3.9.0 compiled with the GNU compilers. The command was OMP_NUM_THREADS=24 mpirun -np 4 abacus, which is tuned for my nodes. ABACUS 3.10 LTS, also compiled with GNU, shows the same behavior.

liusss324 • May 15 '25 01:05

@liusss324 Hello, this is normal behavior for the program. FFTW, which is mainly used for the exchange-correlation potential, performs worse under thread parallelism (OpenMP) than under process parallelism (MPI), and HSE calculations require thread parallelism. You can try setting fft_mode 1, which sometimes helps speed things up.

dyzheng • Jun 16 '25 12:06
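
The fft_mode suggestion above can be tried on a copy of the calculation directory before rerunning the relaxation. The following is only a minimal sketch: it assumes fft_mode goes into the ABACUS INPUT file as a plain "keyword value" line like other parameters, and set_input_param is a hypothetical helper written for this illustration, not something shipped with ABACUS.

```shell
# Hypothetical helper (not part of ABACUS): set KEY to VALUE in an INPUT file.
# If a line starting with KEY already exists it is rewritten; otherwise a new
# "KEY VALUE" line is appended at the end of the file.
set_input_param() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]" "$file"; then
        # Rewrite the existing line; go through a temp file for portability.
        sed -E "s/^[[:space:]]*${key}[[:space:]].*/${key} ${value}/" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage, run inside a copy of the calculation directory:
#   set_input_param INPUT fft_mode 1
#   OMP_NUM_THREADS=24 mpirun -np 4 abacus
```

Rerunning with the same launch command as in the report keeps the comparison fair: only fft_mode changes, so any difference in the local-potential timing can be attributed to that setting.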