High Wall Time Consumption in DiagoElpa elpa_solve during HSE Calculation
Details
I recently ran a DFT calculation with the HSE hybrid functional (LCAO basis) that took a long wall time (7.95 hours).
The performance log shows that the diagonalisation routine, DiagoElpa elpa_solve, accounts for almost all of the runtime (96.18% over 600 calls). Could you clarify whether this is expected for an HSE calculation of this size, or whether it points to something that could be tuned in this ABACUS workflow?
These are the overall time statistics:
--------------------------------------------------------------------------------------------
CLASS_NAME NAME TIME/s CALLS AVG/s PER/%
--------------------------------------------------------------------------------------------
total 28629.62 13 2202.28 100.00
Driver reading 0.04 1 0.04 0.00
Input_Conv Convert 0.00 1 0.00 0.00
Driver driver_line 28629.58 1 28629.58 100.00
UnitCell check_tau 0.00 1 0.00 0.00
ESolver_KS_LCAO before_all_runners 15.64 1 15.64 0.05
PW_Basis_Sup setuptransform 0.23 1 0.23 0.00
PW_Basis_Sup distributeg 0.02 1 0.02 0.00
mymath heapsort 0.25 98673 0.00 0.00
Charge_Mixing init_mixing 0.00 1 0.00 0.00
Symmetry analy_sys 0.04 1 0.04 0.00
PW_Basis_K setuptransform 0.07 1 0.07 0.00
PW_Basis_K distributeg 0.01 1 0.01 0.00
PW_Basis setup_struc_factor 0.18 1 0.18 0.00
NOrbital_Lm extra_uniform 1.33 2492 0.00 0.00
Mathzone_Add1 SplineD2 0.06 2492 0.00 0.00
Mathzone_Add1 Cubic_Spline_Interpolation 0.31 2492 0.00 0.00
Exx_LRI init 11.49 1 11.49 0.04
Matrix_Orbs21 init 3.69 2 1.85 0.01
ORB_gaunt_table init_Gaunt_CH 0.14 3 0.05 0.00
ORB_gaunt_table Calc_Gaunt_CH 0.07 82305 0.00 0.00
ORB_gaunt_table init_Gaunt 1.92 3 0.64 0.01
ORB_gaunt_table Get_Gaunt_SH 2.32 3635313 0.00 0.01
Matrix_Orbs21 init_radial 0.00 2 0.00 0.00
Matrix_Orbs21 init_radial_table 5.22 2 2.61 0.02
Center2_Orb cal_ST_Phi12_R 3.95 4447 0.00 0.01
LRI_CV set_orbitals 6.02 1 6.02 0.02
Matrix_Orbs11 init 0.22 1 0.22 0.00
Matrix_Orbs11 init_radial 0.00 1 0.00 0.00
Matrix_Orbs11 init_radial_table 1.31 1 1.31 0.00
Symmetry_rotation cal_Ms 0.43 1 0.43 0.00
ppcell_vl init_vloc 0.29 1 0.29 0.00
Ions opt_ions 28613.07 1 28613.07 99.94
ESolver_KS_LCAO runner 28613.07 1 28613.07 99.94
ESolver_KS_LCAO before_scf 42.00 1 42.00 0.15
Vdwd3 cal_energy 0.36 1 0.36 0.00
atom_arrange search 0.00 1 0.00 0.00
atom_arrange grid_d.init 0.00 1 0.00 0.00
Grid Construct_Adjacent_expand 0.00 1 0.00 0.00
Grid Construct_Adjacent_expand_periodic 0.00 108 0.00 0.00
Grid_Technique init 0.23 1 0.23 0.00
Grid_BigCell grid_expansion_index 0.01 2 0.00 0.00
Grid_Driver Find_atom 0.01 650 0.00 0.00
Record_adj for_2d 0.06 1 0.06 0.00
LCAO_domain grid_prepare 0.00 1 0.00 0.00
Veff initialize_HR 0.03 1 0.03 0.00
OverlapNew initialize_SR 0.03 1 0.03 0.00
EkineticNew initialize_HR 0.03 1 0.03 0.00
NonlocalNew initialize_HR 0.06 1 0.06 0.00
Exx_LRI cal_exx_ions 39.37 1 39.37 0.14
LRI_CV cal_datas 5.80 2 2.90 0.02
Charge set_rho_core 0.21 1 0.21 0.00
PW_Basis_Sup recip2real 13.82 422 0.03 0.05
PW_Basis_Sup gathers_scatterp 1.49 422 0.00 0.01
Charge atomic_rho 0.80 2 0.40 0.00
Potential init_pot 0.42 1 0.42 0.00
Potential update_from_charge 178.44 61 2.93 0.62
Potential cal_fixed_v 0.04 1 0.04 0.00
PotLocal cal_fixed_v 0.04 1 0.04 0.00
Potential cal_v_eff 178.27 61 2.92 0.62
H_Hartree_pw v_hartree 4.89 61 0.08 0.02
PW_Basis_Sup real2recip 14.32 443 0.03 0.05
PW_Basis_Sup gatherp_scatters 1.22 443 0.00 0.00
PotXC cal_v_eff 172.97 61 2.84 0.60
XC_Functional v_xc 10344.25 39 265.24 36.13
Potential interpolate_vrs 0.13 61 0.00 0.00
Symmetry rhog_symmetry 2.82 61 0.05 0.01
Symmetry group fft grids 2.41 61 0.04 0.01
H_Ewald_pw compute_ewald 0.04 1 0.04 0.00
HSolverLCAO solve 27658.95 60 460.98 96.61
HamiltLCAO updateHk 80.27 600 0.13 0.28
OperatorLCAO init 57.33 2400 0.02 0.20
Veff contributeHR 50.13 60 0.84 0.18
Gint_interface cal_gint 80.07 120 0.67 0.28
Gint_interface cal_gint_vlocal 41.71 60 0.70 0.15
Gint_Tools cal_psir_ylm 13.25 46975 0.00 0.05
Gint_k transfer_pvpR 1.12 60 0.02 0.00
OverlapNew calculate_SR 0.15 1 0.15 0.00
OverlapNew contributeHk 3.48 600 0.01 0.01
EkineticNew contributeHR 0.35 60 0.01 0.00
EkineticNew calculate_HR 0.18 1 0.18 0.00
NonlocalNew contributeHR 3.25 60 0.05 0.01
NonlocalNew calculate_HR 0.84 1 0.84 0.00
OperatorLCAO contributeHk 1.84 600 0.00 0.01
HSolverLCAO hamiltSolvePsiK 27536.45 600 45.89 96.18
DiagoElpa elpa_solve 27534.60 600 45.89 96.18
elecstate cal_dm 6.68 60 0.11 0.02
psiMulPsiMpi pdgemm 5.34 600 0.01 0.02
DensityMatrix cal_DMR 2.07 60 0.03 0.01
ElecStateLCAO psiToRho 33.31 60 0.56 0.12
Gint transfer_DMR 1.52 60 0.03 0.01
Gint_interface cal_gint_rho 31.06 60 0.52 0.11
Charge_Mixing get_drho 0.04 60 0.00 0.00
Charge mix_rho 4.89 52 0.09 0.02
Charge Broyden_mixing 0.29 52 0.01 0.00
ModuleIO write_rhog 5.95 8 0.74 0.02
Symmetry_rotation restore_dm 1.56 7 0.22 0.01
RI_2D_Comm split_m2D_ktoR 76.48 7 10.93 0.27
Exx_LRI cal_exx_elec 632.30 7 90.33 2.21
Symmetry_rotation restore_HR 5.86 7 0.84 0.02
RI_2D_Comm add_HexxR 17.47 44 0.40 0.06
XC_Functional_Libxc v_xc_libxc 166.42 44 3.78 0.58
ESolver_KS_LCAO after_scf 0.39 1 0.39 0.00
ESolver_KS_LCAO out_deepks_labels 0.00 1 0.00 0.00
LCAO_Deepks_Interface out_deepks_labels 0.00 1 0.00 0.00
ESolver_KS_LCAO after_all_runners 0.02 1 0.02 0.00
ModuleIO write_istate_info 0.02 1 0.02 0.00
--------------------------------------------------------------------------------------------
These are my input parameters. My structure has 108 atoms:
INPUT_PARAMETERS
calculation scf
symmetry 1
ecutwfc 100
scf_thr 1e-07
scf_nmax 100
smearing_method gauss
smearing_sigma 0.015
mixing_type broyden
mixing_beta 0.8
basis_type lcao
ks_solver genelpa
kspacing 0.14
dft_functional hse
vdw_method d3_0
Thanks for your help :)
Have you read FAQ on the online manual http://abacus.deepmodeling.com/en/latest/community/faq.html
- [x] Yes, I have read the FAQ part on online manual.
This is most likely due to non-convergence. Can you post the screen output?
I suggest adding exx_separate_loop=0 to switch to the one-loop iteration, which has a better convergence rate.
Thanks for your reply! I am not sure which screen output you were referring to, so I have pasted the whole log here (hope you don't mind!)
COMMAND: OMP_NUM_THREADS=64 mpirun -n 1 abacus | tee out.log
ABACUS v3.10.0
Atomic-orbital Based Ab-initio Computation at UStc
Website: http://abacus.ustc.edu.cn/
Documentation: https://abacus.deepmodeling.com/
Repository: https://github.com/abacusmodeling/abacus-develop
https://github.com/deepmodeling/abacus-develop
Commit: 8eed91df6 (Fri Mar 28 23:14:54 2025 +0800)
Thu Oct 30 03:17:57 2025
MAKE THE DIR : OUT.ABACUS/
RUNNING WITH DEVICE : CPU / Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
dft_functional readin is: hse
dft_functional in pseudopot file is: PBE
Please make sure this is what you need
dft_functional readin is: hse
dft_functional in pseudopot file is: PBE
Please make sure this is what you need
UNIFORM GRID DIM : 108 * 180 * 100
UNIFORM GRID DIM(BIG) : 27 * 36 * 25
DONE(0.72698 SEC) : SETUP UNITCELL
DONE(0.764897 SEC) : SYMMETRY
DONE(0.879552 SEC) : INIT K-POINTS
---------------------------------------------------------
Self-consistent calculations for electrons
---------------------------------------------------------
SPIN KPOINTS PROCESSORS THREADS NBASE
1 10 1 64 1020
---------------------------------------------------------
Use Systematically Improvable Atomic bases
---------------------------------------------------------
ELEMENT ORBITALS NBASE NATOM XC
C 2s2p1d-8au 13 60
H 2s1p-6au 5 48
---------------------------------------------------------
Initial plane wave basis and FFT box
---------------------------------------------------------
DONE(1.01881 SEC) : INIT PLANEWAVE
DONE(15.5474 SEC) : LOCAL POTENTIAL
-------------------------------------------
SELF-CONSISTENT :
-------------------------------------------
START CHARGE : atomic
DONE(57.7015 SEC) : INIT SCF
ITER ETOT/eV EDIFF/eV DRHO TIME/s
GE1 -1.05498895e+04 0.00000000e+00 2.3892e-01 611.80
GE2 -1.05798242e+04 -2.99347022e+01 6.8716e-02 583.56
GE3 -1.05835346e+04 -3.71040168e+00 2.8237e-02 530.37
GE4 -1.05846311e+04 -1.09644481e+00 1.2865e-02 488.38
GE5 -1.05848705e+04 -2.39387787e-01 4.4637e-03 458.05
GE6 -1.05848895e+04 -1.90654241e-02 1.6153e-03 480.44
GE7 -1.05848906e+04 -1.10186697e-03 6.6045e-04 488.62
GE8 -1.05848909e+04 -2.48836470e-04 1.8482e-04 479.58
GE9 -1.05848909e+04 -2.29363579e-05 8.2495e-05 470.63
GE10 -1.05848909e+04 -4.31703954e-06 2.4366e-05 458.35
GE11 -1.05848909e+04 -4.24875036e-07 1.0319e-05 453.23
GE12 -1.05848909e+04 -9.09326188e-08 3.6620e-06 488.46
GE13 -1.05848909e+04 -4.50270214e-09 1.3937e-06 450.93
GE14 -1.05848909e+04 -1.74168417e-09 4.4686e-07 457.25
GE15 -1.05848909e+04 -7.11522839e-11 1.7432e-07 459.11
GE16 -1.05848909e+04 1.12451544e-09 9.6405e-08 440.29
Updating EXX and rerun SCF 1.023e+02 (s)
ITER ETOT/eV EDIFF/eV DRHO TIME/s
GE1 -1.04263774e+04 0.00000000e+00 4.0589e-02 460.50
GE2 -1.04412704e+04 -1.48929627e+01 5.4751e-02 446.30
GE3 -1.04408681e+04 4.02222481e-01 2.8687e-02 453.00
GE4 -1.04405269e+04 3.41205322e-01 1.6343e-02 439.33
GE5 -1.04405280e+04 -1.03523091e-03 5.9308e-03 458.26
GE6 -1.04405471e+04 -1.91112578e-02 6.3619e-04 448.42
GE7 -1.04405475e+04 -4.10480251e-04 1.9468e-04 448.10
GE8 -1.04405475e+04 -4.05059898e-06 6.0965e-05 494.08
GE9 -1.04405475e+04 -1.30756861e-06 1.2919e-05 458.87
GE10 -1.04405475e+04 4.19988730e-07 8.6642e-06 448.91
GE11 -1.04405475e+04 1.19102736e-09 1.6147e-06 466.04
GE12 -1.04405475e+04 1.48955760e-09 5.6883e-07 457.43
GE13 -1.04405475e+04 -4.65583423e-10 2.2205e-07 450.15
GE14 -1.04405475e+04 2.83062347e-10 6.3210e-08 459.74
EDIFF/eV (outer loop): 1.44343433e+02
Updating EXX and rerun SCF 1.010e+02 (s)
ITER ETOT/eV EDIFF/eV DRHO TIME/s
GE1 -1.04406088e+04 0.00000000e+00 7.3374e-04 486.45
GE2 -1.04406091e+04 -3.15330169e-04 1.7970e-04 470.85
GE3 -1.04406091e+04 -2.27854501e-05 5.4457e-05 470.10
GE4 -1.04406091e+04 -5.05689336e-06 2.4547e-05 451.09
GE5 -1.04406091e+04 -3.20149701e-07 6.5763e-06 457.95
GE6 -1.04406091e+04 -4.37554063e-07 2.6837e-06 453.23
GE7 -1.04406091e+04 -3.58854997e-09 7.9347e-07 467.16
GE8 -1.04406091e+04 -1.00541271e-10 2.7768e-07 429.47
GE9 -1.04406091e+04 -3.06667890e-07 8.2611e-08 433.69
EDIFF/eV (outer loop): 6.16389165e-02
Updating EXX and rerun SCF 1.016e+02 (s)
ITER ETOT/eV EDIFF/eV DRHO TIME/s
GE1 -1.04406095e+04 0.00000000e+00 6.5060e-05 458.60
GE2 -1.04406095e+04 3.34412641e-07 1.7568e-05 463.26
GE3 -1.04406095e+04 -3.94777620e-07 7.2803e-06 451.22
GE4 -1.04406095e+04 -5.93858617e-08 2.6441e-06 428.58
GE5 -1.04406095e+04 -7.70455492e-09 7.3109e-07 456.04
GE6 -1.04406095e+04 -9.07965014e-10 2.3624e-07 447.14
GE7 -1.04406095e+04 3.61948575e-10 1.0425e-07 450.33
GE8 -1.04406095e+04 -1.97988964e-10 3.5817e-08 428.65
EDIFF/eV (outer loop): 3.91009430e-04
Updating EXX and rerun SCF 1.006e+02 (s)
ITER ETOT/eV EDIFF/eV DRHO TIME/s
GE1 -1.04406095e+04 0.00000000e+00 8.2270e-06 447.56
GE2 -1.04406095e+04 -1.90502506e-08 2.6999e-06 419.57
GE3 -1.04406095e+04 4.01608239e-07 1.5112e-06 439.34
GE4 -1.04406095e+04 -1.95514102e-09 3.6249e-07 447.07
GE5 -1.04406095e+04 -3.12451334e-10 8.8880e-08 456.99
EDIFF/eV (outer loop): 4.98955391e-06
Updating EXX and rerun SCF 1.008e+02 (s)
ITER ETOT/eV EDIFF/eV DRHO TIME/s
GE1 -1.04406095e+04 0.00000000e+00 1.3745e-06 449.30
GE2 -1.04406095e+04 -8.69295295e-10 5.5252e-07 455.60
GE3 -1.04406095e+04 -1.80974287e-10 2.7951e-07 461.61
GE4 -1.04406095e+04 -3.43387109e-10 5.5494e-08 471.85
EDIFF/eV (outer loop): 1.32024610e-07
Updating EXX and rerun SCF 1.020e+02 (s)
ITER ETOT/eV EDIFF/eV DRHO TIME/s
GE1 -1.04406095e+04 0.00000000e+00 3.0121e-07 460.34
GE2 -1.04406095e+04 -4.42381591e-10 2.1515e-07 451.93
GE3 -1.04406095e+04 -1.59319244e-10 4.9740e-08 505.61
EDIFF/eV (outer loop): 1.10084958e-08
Updating EXX and rerun SCF 1.029e+02 (s)
ITER ETOT/eV EDIFF/eV DRHO TIME/s
GE1 -1.04406095e+04 0.00000000e+00 5.8963e-08 493.57
EDIFF/eV (outer loop): 2.91260327e-09
TIME STATISTICS
--------------------------------------------------------------------------------------------
CLASS_NAME NAME TIME/s CALLS AVG/s PER/%
--------------------------------------------------------------------------------------------
total 28629.62 13 2202.28 100.00
Driver reading 0.04 1 0.04 0.00
Input_Conv Convert 0.00 1 0.00 0.00
Driver driver_line 28629.58 1 28629.58 100.00
UnitCell check_tau 0.00 1 0.00 0.00
ESolver_KS_LCAO before_all_runners 15.64 1 15.64 0.05
PW_Basis_Sup setuptransform 0.23 1 0.23 0.00
PW_Basis_Sup distributeg 0.02 1 0.02 0.00
mymath heapsort 0.25 98673 0.00 0.00
Charge_Mixing init_mixing 0.00 1 0.00 0.00
Symmetry analy_sys 0.04 1 0.04 0.00
PW_Basis_K setuptransform 0.07 1 0.07 0.00
PW_Basis_K distributeg 0.01 1 0.01 0.00
PW_Basis setup_struc_factor 0.18 1 0.18 0.00
NOrbital_Lm extra_uniform 1.33 2492 0.00 0.00
Mathzone_Add1 SplineD2 0.06 2492 0.00 0.00
Mathzone_Add1 Cubic_Spline_Interpolation 0.31 2492 0.00 0.00
Exx_LRI init 11.49 1 11.49 0.04
Matrix_Orbs21 init 3.69 2 1.85 0.01
ORB_gaunt_table init_Gaunt_CH 0.14 3 0.05 0.00
ORB_gaunt_table Calc_Gaunt_CH 0.07 82305 0.00 0.00
ORB_gaunt_table init_Gaunt 1.92 3 0.64 0.01
ORB_gaunt_table Get_Gaunt_SH 2.32 3635313 0.00 0.01
Matrix_Orbs21 init_radial 0.00 2 0.00 0.00
Matrix_Orbs21 init_radial_table 5.22 2 2.61 0.02
Center2_Orb cal_ST_Phi12_R 3.95 4447 0.00 0.01
LRI_CV set_orbitals 6.02 1 6.02 0.02
Matrix_Orbs11 init 0.22 1 0.22 0.00
Matrix_Orbs11 init_radial 0.00 1 0.00 0.00
Matrix_Orbs11 init_radial_table 1.31 1 1.31 0.00
Symmetry_rotation cal_Ms 0.43 1 0.43 0.00
ppcell_vl init_vloc 0.29 1 0.29 0.00
Ions opt_ions 28613.07 1 28613.07 99.94
ESolver_KS_LCAO runner 28613.07 1 28613.07 99.94
ESolver_KS_LCAO before_scf 42.00 1 42.00 0.15
Vdwd3 cal_energy 0.36 1 0.36 0.00
atom_arrange search 0.00 1 0.00 0.00
atom_arrange grid_d.init 0.00 1 0.00 0.00
Grid Construct_Adjacent_expand 0.00 1 0.00 0.00
Grid Construct_Adjacent_expand_periodic 0.00 108 0.00 0.00
Grid_Technique init 0.23 1 0.23 0.00
Grid_BigCell grid_expansion_index 0.01 2 0.00 0.00
Grid_Driver Find_atom 0.01 650 0.00 0.00
Record_adj for_2d 0.06 1 0.06 0.00
LCAO_domain grid_prepare 0.00 1 0.00 0.00
Veff initialize_HR 0.03 1 0.03 0.00
OverlapNew initialize_SR 0.03 1 0.03 0.00
EkineticNew initialize_HR 0.03 1 0.03 0.00
NonlocalNew initialize_HR 0.06 1 0.06 0.00
Exx_LRI cal_exx_ions 39.37 1 39.37 0.14
LRI_CV cal_datas 5.80 2 2.90 0.02
Charge set_rho_core 0.21 1 0.21 0.00
PW_Basis_Sup recip2real 13.82 422 0.03 0.05
PW_Basis_Sup gathers_scatterp 1.49 422 0.00 0.01
Charge atomic_rho 0.80 2 0.40 0.00
Potential init_pot 0.42 1 0.42 0.00
Potential update_from_charge 178.44 61 2.93 0.62
Potential cal_fixed_v 0.04 1 0.04 0.00
PotLocal cal_fixed_v 0.04 1 0.04 0.00
Potential cal_v_eff 178.27 61 2.92 0.62
H_Hartree_pw v_hartree 4.89 61 0.08 0.02
PW_Basis_Sup real2recip 14.32 443 0.03 0.05
PW_Basis_Sup gatherp_scatters 1.22 443 0.00 0.00
PotXC cal_v_eff 172.97 61 2.84 0.60
XC_Functional v_xc 10344.25 39 265.24 36.13
Potential interpolate_vrs 0.13 61 0.00 0.00
Symmetry rhog_symmetry 2.82 61 0.05 0.01
Symmetry group fft grids 2.41 61 0.04 0.01
H_Ewald_pw compute_ewald 0.04 1 0.04 0.00
HSolverLCAO solve 27658.95 60 460.98 96.61
HamiltLCAO updateHk 80.27 600 0.13 0.28
OperatorLCAO init 57.33 2400 0.02 0.20
Veff contributeHR 50.13 60 0.84 0.18
Gint_interface cal_gint 80.07 120 0.67 0.28
Gint_interface cal_gint_vlocal 41.71 60 0.70 0.15
Gint_Tools cal_psir_ylm 13.25 46975 0.00 0.05
Gint_k transfer_pvpR 1.12 60 0.02 0.00
OverlapNew calculate_SR 0.15 1 0.15 0.00
OverlapNew contributeHk 3.48 600 0.01 0.01
EkineticNew contributeHR 0.35 60 0.01 0.00
EkineticNew calculate_HR 0.18 1 0.18 0.00
NonlocalNew contributeHR 3.25 60 0.05 0.01
NonlocalNew calculate_HR 0.84 1 0.84 0.00
OperatorLCAO contributeHk 1.84 600 0.00 0.01
HSolverLCAO hamiltSolvePsiK 27536.45 600 45.89 96.18
DiagoElpa elpa_solve 27534.60 600 45.89 96.18
elecstate cal_dm 6.68 60 0.11 0.02
psiMulPsiMpi pdgemm 5.34 600 0.01 0.02
DensityMatrix cal_DMR 2.07 60 0.03 0.01
ElecStateLCAO psiToRho 33.31 60 0.56 0.12
Gint transfer_DMR 1.52 60 0.03 0.01
Gint_interface cal_gint_rho 31.06 60 0.52 0.11
Charge_Mixing get_drho 0.04 60 0.00 0.00
Charge mix_rho 4.89 52 0.09 0.02
Charge Broyden_mixing 0.29 52 0.01 0.00
ModuleIO write_rhog 5.95 8 0.74 0.02
Symmetry_rotation restore_dm 1.56 7 0.22 0.01
RI_2D_Comm split_m2D_ktoR 76.48 7 10.93 0.27
Exx_LRI cal_exx_elec 632.30 7 90.33 2.21
Symmetry_rotation restore_HR 5.86 7 0.84 0.02
RI_2D_Comm add_HexxR 17.47 44 0.40 0.06
XC_Functional_Libxc v_xc_libxc 166.42 44 3.78 0.58
ESolver_KS_LCAO after_scf 0.39 1 0.39 0.00
ESolver_KS_LCAO out_deepks_labels 0.00 1 0.00 0.00
LCAO_Deepks_Interface out_deepks_labels 0.00 1 0.00 0.00
ESolver_KS_LCAO after_all_runners 0.02 1 0.02 0.00
ModuleIO write_istate_info 0.02 1 0.02 0.00
--------------------------------------------------------------------------------------------
START Time : Thu Oct 30 03:17:57 2025
CPUINFO:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
Stepping: 4
BogoMIPS: 5000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 1 MiB (32 instances)
L1i cache: 1 MiB (32 instances)
L2 cache: 32 MiB (32 instances)
L3 cache: 33 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-63
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
The number of outer-loop steps is 7, so the convergence speed is fine.
You can still add exx_separate_loop=0, which will reduce the number of GGA steps. Since each GGA step takes ~450 s and each EXX step takes ~100 s, this will reduce the total time.
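The arithmetic behind this estimate can be checked against the pasted log. The following is a minimal sketch, not an ABACUS tool: the step counts are read off the GE lines in each block of the screen output, the timings are copied from the TIME STATISTICS table, and the variable names are only illustrative.

```python
# Inner SCF iterations per block, counted from the GE lines in the pasted log:
# the initial loop, then the inner loop after each "Updating EXX" line.
inner_steps = [16, 14, 9, 8, 5, 4, 3, 1]

# Totals copied from the TIME STATISTICS table in the same log.
solve_time_s = 27658.95   # HSolverLCAO solve (60 calls)
exx_time_s = 632.30       # Exx_LRI cal_exx_elec (7 calls)
total_time_s = 28629.62   # total

n_inner = sum(inner_steps)        # total number of inner SCF steps
n_exx = len(inner_steps) - 1      # EXX updates between consecutive blocks

avg_inner_s = solve_time_s / n_inner
avg_exx_s = exx_time_s / n_exx
inner_share = solve_time_s / total_time_s

print(f"inner SCF steps: {n_inner}, EXX updates: {n_exx}")
print(f"average per inner step: {avg_inner_s:.1f} s")
print(f"average per EXX update: {avg_exx_s:.1f} s")
print(f"share of wall time spent in inner steps: {inner_share:.1%}")
```

The counts match the 60 calls to HSolverLCAO solve and the 7 calls to Exx_LRI cal_exx_elec reported in the table, so the wall time is set almost entirely by how many inner steps are run.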
Hello,
Sorry to raise another problem regarding this simulation, but when I was checking the time consumption, I noticed that the percentage consumption was 36.13% for XC_Functional and 96.18% for DiagoElpa. They seem to add up to more than 100%; I was just wondering if this is normal?
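One way to look at this from the log itself is to parse a few rows of the TIME STATISTICS table and add up the PER/% column. The sketch below is only an illustration: the rows are copied from the table above, and `parse_row` is a hypothetical helper, not part of ABACUS. It shows that several rows (driver_line, opt_ions, runner, solve, hamiltSolvePsiK, elpa_solve) each report close to 100%, which suggests that the table lists inclusive times for timers that contain one another, so the column is not expected to sum to 100%. It does not by itself explain why XC_Functional v_xc and DiagoElpa elpa_solve together exceed the total.

```python
# A few rows copied verbatim from the TIME STATISTICS table in this thread.
rows = """\
Driver driver_line 28629.58 1 28629.58 100.00
Ions opt_ions 28613.07 1 28613.07 99.94
ESolver_KS_LCAO runner 28613.07 1 28613.07 99.94
HSolverLCAO solve 27658.95 60 460.98 96.61
HSolverLCAO hamiltSolvePsiK 27536.45 600 45.89 96.18
DiagoElpa elpa_solve 27534.60 600 45.89 96.18
XC_Functional v_xc 10344.25 39 265.24 36.13
"""


def parse_row(line):
    """Split one table row; the last four tokens are TIME/s, CALLS, AVG/s, PER/%."""
    tokens = line.split()
    time_s, calls, avg_s, percent = tokens[-4:]
    return {
        "class": tokens[0],
        # Some timer names contain spaces, so join everything in between.
        "name": " ".join(tokens[1:-4]),
        "time_s": float(time_s),
        "calls": int(calls),
        "avg_s": float(avg_s),
        "percent": float(percent),
    }


timers = [parse_row(line) for line in rows.splitlines() if line.strip()]
percent_sum = sum(t["percent"] for t in timers)

for t in timers:
    print(f'{t["class"]:>16} {t["name"]:<16} {t["time_s"]:>10.2f} s {t["percent"]:>6.2f} %')
print(f"sum of PER/% over these {len(timers)} rows: {percent_sum:.2f} %")
```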
@xintianwangwendy Hello! Since you are using a single process with 64 threads, you can try running OMP_NUM_THREADS=64 abacus directly, without mpirun. Your PBE steps seem abnormally time-consuming.