abacus-develop
Request: Better convergence of HSE in magnetic system
Details
I tested HSE SCF on a magnetic system. The example is a bcc Fe conventional cell:
ATOMIC_SPECIES
Fe 55.845 Fe_ONCV_PBE-1.0.upf upf201
NUMERICAL_ORBITAL
Fe_gga_8au_100Ry_4s2p2d1f.orb
LATTICE_CONSTANT
1.889726
LATTICE_VECTORS
2.8301511117 0.0000000000 0.0000000000 #latvec1
0.0000000000 2.8301511117 0.0000000000 #latvec2
0.0000000000 -0.0000000000 2.8301511117 #latvec3
ATOMIC_POSITIONS
Direct
Fe #label
2 #magnetism
2 #number of atoms
0.0000000000 0.0000000000 0.0000000000 m 1 1 1
0.5000000000 0.5000000000 0.5000000000 m 1 1 1
The KPT mesh is 9 9 9.
Information:
- ABACUS version: 3.4.4
- Commit: 5f9d472 (Mon Dec 4 14:10:21 2023 +0800)
- Dependence: Intel-OneAPI and Intel-toolchain
- LibRI and LibComm: latest version before Nov 18
My initial INPUT was:
#Parameters (1.General)
suffix Fe # suffix of OUTPUT DIR
nspin 2 # 1/2/4 4 for SOC
symmetry 0 # 0/1 1 for open, default
esolver_type ksdft # ksdft, ofdft, sdft, tddft, lj, dp
dft_functional hse # same as upf file, can be lda/pbe/scan/hf/pbe0/hse
ks_solver genelpa # default for ksdft-lcao
vdw_method none # none, d3, d3_bj
#Parameters (2.Iteration)
calculation scf # scf relax cell-relax md
ecutwfc 100
scf_thr 1e-7
scf_nmax 300
#Parameters (3.Basis)
basis_type lcao # lcao or pw
#Parameters (4.Smearing)
smearing_method mp # mp/gaussian/fixed
smearing_sigma 0.002 # Rydberg
#Parameters (5.Mixing)
mixing_type broyden # pulay/broyden
#Parameters (6.Calculation)
cal_force 1
cal_stress 1
out_stru 1 # print STRU in OUT
out_chg 1 # print CHG or not
out_bandgap 1
out_mul 1
With this INPUT it is very hard to converge to scf_thr 1e-7. The calculation could not even reach scf_thr 1e-6 within 5 days, running OMP_NUM_THREADS=16 mpirun -np 4 abacus on Intel-8358.
# After more than 700 lines of print-out and 4 days of calculation
Updating EXX and rerun SCF
GE1 5.32e+00 5.81e+00 -6.437418e+03 0.000000e+00 1.291e-06 9.196e+00
GE2 5.32e+00 5.81e+00 -6.437418e+03 1.364268e-09 7.188e-07 8.863e+00
GE3 5.32e+00 5.81e+00 -6.437418e+03 1.468676e-09 3.518e-07 8.800e+00
GE4 5.32e+00 5.81e+00 -6.437418e+03 5.839128e-10 2.326e-07 8.802e+00
GE5 5.32e+00 5.81e+00 -6.437418e+03 -2.100539e-09 3.236e-08 8.843e+00
Updating EXX and rerun SCF
GE1 5.32e+00 5.81e+00 -6.437418e+03 0.000000e+00 1.058e-06 9.111e+00
GE2 5.32e+00 5.81e+00 -6.437418e+03 -1.546015e-09 5.929e-07 8.805e+00
GE3 5.32e+00 5.81e+00 -6.437418e+03 -1.840679e-10 2.948e-07 8.871e+00
GE4 5.32e+00 5.81e+00 -6.437418e+03 1.423819e-09 4.995e-08 8.820e+00
After seeing #3103, I added one parameter to my INPUT:
mixing_gg0 0.0
After that, convergence improved: in a 2-day calculation with OMP_NUM_THREADS=24 mpirun -np 2 abacus on Intel-8162, the SCF converged to scf_thr 1e-6, but not to scf_thr 1e-7.
START CHARGE : atomic
DONE(177.792 SEC) : INIT SCF
ITER TMAG AMAG ETOT(eV) EDIFF(eV) DRHO TIME(s)
GE1 4.01e+00 4.01e+00 -6.440073e+03 0.000000e+00 4.826e-02 4.429e+00
GE2 4.31e+00 4.41e+00 -6.440405e+03 -3.311553e-01 1.996e-02 3.688e+00
GE3 4.33e+00 4.43e+00 -6.440409e+03 -4.691903e-03 5.726e-03 3.677e+00
GE4 4.33e+00 4.43e+00 -6.440409e+03 2.332581e-04 3.079e-03 3.684e+00
GE5 4.33e+00 4.43e+00 -6.440409e+03 -5.472160e-05 1.219e-03 3.626e+00
GE6 4.33e+00 4.43e+00 -6.440409e+03 -1.579811e-05 1.703e-04 3.681e+00
GE7 4.33e+00 4.43e+00 -6.440409e+03 -2.383246e-07 6.439e-05 3.724e+00
GE8 4.33e+00 4.43e+00 -6.440409e+03 -6.277874e-08 2.805e-05 3.635e+00
GE9 4.33e+00 4.43e+00 -6.440409e+03 -2.755682e-08 9.261e-06 3.668e+00
GE10 4.33e+00 4.43e+00 -6.440409e+03 1.987624e-10 9.984e-07 3.717e+00
GE11 4.33e+00 4.43e+00 -6.440409e+03 1.256766e-09 1.477e-07 3.667e+00
GE12 4.33e+00 4.43e+00 -6.440409e+03 -2.078884e-09 8.750e-08 3.641e+00
Updating EXX and rerun SCF
GE1 5.07e+00 5.25e+00 -6.432274e+03 0.000000e+00 6.975e-02 1.732e+01
GE2 5.12e+00 5.38e+00 -6.437178e+03 -4.903432e+00 5.335e-02 1.714e+01
GE3 5.08e+00 5.37e+00 -6.437337e+03 -1.595823e-01 2.761e-02 1.717e+01
GE4 5.08e+00 5.36e+00 -6.436762e+03 5.755460e-01 2.955e-02 1.724e+01
GE5 5.18e+00 5.45e+00 -6.437070e+03 -3.075961e-01 1.282e-02 1.730e+01
GE6 5.20e+00 5.46e+00 -6.437078e+03 -8.548606e-03 8.137e-03 1.715e+01
GE7 5.19e+00 5.45e+00 -6.437053e+03 2.523551e-02 9.021e-03 1.717e+01
GE8 5.22e+00 5.47e+00 -6.437049e+03 4.194422e-03 4.162e-03 1.725e+01
GE9 5.25e+00 5.49e+00 -6.437052e+03 -2.974158e-03 3.035e-04 1.720e+01
GE10 5.25e+00 5.49e+00 -6.437052e+03 1.164049e-05 3.154e-04 1.713e+01
GE11 5.25e+00 5.49e+00 -6.437052e+03 -1.927004e-05 9.251e-05 1.714e+01
GE12 5.25e+00 5.49e+00 -6.437052e+03 4.742927e-06 1.342e-04 1.723e+01
GE13 5.25e+00 5.49e+00 -6.437052e+03 -3.654831e-06 1.064e-04 1.724e+01
GE14 5.25e+00 5.49e+00 -6.437052e+03 -1.292602e-06 2.761e-06 1.720e+01
GE15 5.25e+00 5.49e+00 -6.437052e+03 -4.918788e-10 1.088e-06 1.726e+01
GE16 5.25e+00 5.49e+00 -6.437052e+03 -1.480277e-09 5.592e-07 1.722e+01
GE17 5.25e+00 5.49e+00 -6.437052e+03 3.408349e-09 1.277e-07 1.720e+01
GE18 5.25e+00 5.49e+00 -6.437052e+03 3.209587e-10 1.536e-08 1.722e+01
Updating EXX and rerun SCF
GE1 5.30e+00 5.66e+00 -6.437386e+03 0.000000e+00 7.783e-03 1.756e+01
GE2 5.30e+00 5.70e+00 -6.437389e+03 -2.905916e-03 3.097e-03 1.776e+01
GE3 5.30e+00 5.69e+00 -6.437389e+03 -1.426553e-04 3.709e-04 1.768e+01
GE4 5.30e+00 5.69e+00 -6.437389e+03 -3.933669e-07 1.830e-04 1.763e+01
GE5 5.30e+00 5.69e+00 -6.437389e+03 -1.916378e-07 6.337e-05 1.758e+01
GE6 5.30e+00 5.69e+00 -6.437389e+03 -5.438509e-08 6.068e-06 1.773e+01
GE7 5.30e+00 5.69e+00 -6.437389e+03 6.844540e-10 4.172e-06 1.765e+01
GE8 5.30e+00 5.69e+00 -6.437389e+03 -2.401390e-09 2.932e-06 1.760e+01
GE9 5.30e+00 5.69e+00 -6.437389e+03 -1.980663e-09 3.465e-07 1.768e+01
GE10 5.30e+00 5.69e+00 -6.437389e+03 1.095900e-09 4.516e-08 1.761e+01
Updating EXX and rerun SCF
GE1 5.30e+00 5.75e+00 -6.437412e+03 0.000000e+00 2.970e-03 1.772e+01
GE2 5.30e+00 5.77e+00 -6.437412e+03 -5.071874e-04 1.115e-03 1.761e+01
GE3 5.30e+00 5.76e+00 -6.437412e+03 -3.600643e-05 3.660e-04 1.766e+01
GE4 5.30e+00 5.76e+00 -6.437412e+03 1.002332e-06 1.333e-04 1.767e+01
GE5 5.30e+00 5.76e+00 -6.437412e+03 -3.536508e-07 3.344e-05 1.765e+01
GE6 5.30e+00 5.76e+00 -6.437412e+03 -1.065892e-08 3.677e-06 1.761e+01
GE7 5.30e+00 5.76e+00 -6.437412e+03 1.508119e-10 2.340e-06 1.777e+01
GE8 5.30e+00 5.76e+00 -6.437412e+03 -2.848412e-09 1.372e-06 1.762e+01
GE9 5.30e+00 5.76e+00 -6.437412e+03 -6.697595e-10 5.157e-07 1.766e+01
GE10 5.30e+00 5.76e+00 -6.437412e+03 2.343385e-10 2.126e-07 1.772e+01
GE11 5.30e+00 5.76e+00 -6.437412e+03 2.143849e-09 3.190e-08 1.777e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 8.249e-04 1.792e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -3.188180e-05 3.782e-04 1.772e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -7.317703e-08 1.303e-04 1.774e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -3.181451e-07 5.785e-05 1.770e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 1.346944e-08 1.282e-05 1.783e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 -2.061869e-09 2.488e-06 1.767e+01
GE7 5.29e+00 5.78e+00 -6.437418e+03 2.597832e-09 4.422e-07 1.771e+01
GE8 5.29e+00 5.78e+00 -6.437418e+03 3.727761e-10 1.378e-07 1.783e+01
GE9 5.29e+00 5.78e+00 -6.437418e+03 -5.916467e-10 5.191e-08 1.774e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.048e-04 1.776e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.439515e-06 9.945e-05 1.772e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 2.207036e-08 4.256e-05 1.779e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 1.388243e-09 1.148e-05 1.771e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -1.028305e-08 4.965e-06 1.766e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 -4.977566e-09 4.509e-07 1.777e+01
GE7 5.29e+00 5.78e+00 -6.437418e+03 8.770292e-10 1.506e-07 1.783e+01
GE8 5.29e+00 5.78e+00 -6.437418e+03 -9.288467e-10 6.491e-08 1.770e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 6.077e-05 1.769e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.134252e-07 3.019e-05 1.777e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 8.373541e-09 1.607e-05 1.777e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 2.111367e-09 2.498e-06 1.768e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -3.221961e-09 4.133e-07 1.770e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 4.555293e-10 1.491e-07 1.771e+01
GE7 5.29e+00 5.78e+00 -6.437418e+03 2.135342e-09 4.883e-08 1.783e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.286e-05 1.784e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -2.029078e-08 1.168e-05 1.772e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.047176e-09 6.660e-06 1.773e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 2.583137e-10 1.001e-06 1.789e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -1.795822e-09 4.420e-07 1.777e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 1.625675e-09 7.378e-08 1.779e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 1.176e-05 1.768e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -6.389011e-09 5.673e-06 1.767e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.296982e-09 3.038e-06 1.780e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 3.175557e-09 3.255e-06 1.771e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -3.668210e-09 2.879e-07 1.769e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 7.262173e-10 4.905e-08 1.772e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 6.956e-06 1.776e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -2.089712e-09 3.181e-06 1.795e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -1.856147e-11 1.484e-06 1.772e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 4.439284e-10 7.081e-07 1.771e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -5.777256e-10 2.457e-07 1.781e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 6.627990e-10 3.775e-08 1.774e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 3.987e-06 1.776e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 5.614843e-10 1.749e-06 1.779e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -1.237431e-11 7.742e-07 1.771e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -1.139210e-09 9.115e-07 1.785e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 2.412990e-10 6.300e-08 1.778e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.462e-06 1.779e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 6.380504e-10 1.072e-06 1.781e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.345706e-09 5.480e-07 1.785e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 2.142302e-10 4.647e-07 1.776e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -2.590871e-10 4.279e-08 1.778e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 1.403e-06 1.777e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 1.235111e-09 6.003e-07 1.775e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 8.615614e-10 2.236e-07 1.787e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -1.662025e-09 1.244e-07 1.777e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 8.360393e-10 4.124e-08 1.779e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 9.645e-07 1.775e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.137663e-09 6.704e-07 1.771e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 9.613292e-10 4.905e-07 1.784e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 6.372770e-10 1.410e-07 1.780e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -5.181742e-10 2.714e-08 1.778e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 5.150e-07 1.781e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -2.575403e-10 3.110e-07 1.782e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.438514e-09 2.384e-07 1.790e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -8.399063e-10 7.898e-08 1.780e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 3.857e-07 1.780e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.633409e-09 5.688e-07 1.778e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -1.924205e-09 1.518e-07 1.777e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 3.443925e-09 5.881e-08 1.782e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 3.686e-07 1.778e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.023974e-09 1.722e-07 1.784e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 2.084298e-09 7.166e-08 1.777e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.508e-07 1.841e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -7.285375e-10 5.146e-07 1.832e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.717709e-09 9.339e-08 1.835e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.401e-07 1.783e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 2.266046e-10 2.545e-07 1.782e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -5.599375e-10 1.674e-07 1.791e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -1.832945e-10 5.861e-08 1.780e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.153e-07 1.786e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -2.714614e-10 2.968e-07 1.779e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -6.473311e-10 1.489e-07 1.779e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 2.733949e-09 4.245e-08 1.788e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.514e-07 1.780e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 1.046403e-09 2.553e-07 1.787e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.063417e-09 1.525e-07 1.774e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -5.251348e-10 4.837e-08 1.787e+01
Memory consumption is 50G during the calculation. Is this performance normal for this system? Can it be improved?
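To make the stagnation in the log above easier to see, here is a minimal Python sketch that groups the `GE` lines by outer EXX loop and reports, for each loop, the number of inner iterations, the first and last DRHO, and the summed TIME column. It assumes the seven-column layout shown in the log header (ITER TMAG AMAG ETOT EDIFF DRHO TIME) and that each outer loop starts at an "Updating EXX" line. The names `parse_scf_log`, `summarize` and `SAMPLE_LOG` are hypothetical helpers, not part of ABACUS.

```python
import re
from dataclasses import dataclass, field

# Matches inner SCF lines such as "GE3 5.29e+00 5.78e+00 ..." (assumed layout).
GE_LINE = re.compile(r"^\s*GE(\d+)\s+(.*)$")

# Two outer loops copied from the log above, used only as sample input.
SAMPLE_LOG = """\
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 3.686e-07 1.778e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.023974e-09 1.722e-07 1.784e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 2.084298e-09 7.166e-08 1.777e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.508e-07 1.841e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -7.285375e-10 5.146e-07 1.832e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.717709e-09 9.339e-08 1.835e+01
"""


@dataclass
class OuterLoop:
    """DRHO and TIME values of the inner SCF steps of one outer loop."""
    drho: list = field(default_factory=list)
    time_s: list = field(default_factory=list)


def parse_scf_log(text):
    """Split the log into outer loops at every 'Updating EXX' line."""
    loops = [OuterLoop()]  # steps before the first EXX update (the PBE part)
    for line in text.splitlines():
        if "Updating EXX" in line:
            loops.append(OuterLoop())
            continue
        match = GE_LINE.match(line)
        if not match:
            continue
        cols = match.group(2).split()
        if len(cols) != 6:  # expect TMAG AMAG ETOT EDIFF DRHO TIME
            continue
        loops[-1].drho.append(float(cols[4]))
        loops[-1].time_s.append(float(cols[5]))
    # Drop loops without any parsed step (e.g. a log starting at an EXX update).
    return [loop for loop in loops if loop.drho]


def summarize(loops):
    """Return (index, n_inner, first_drho, last_drho, total_time_s) per loop."""
    return [
        (i, len(loop.drho), loop.drho[0], loop.drho[-1], sum(loop.time_s))
        for i, loop in enumerate(loops)
    ]


if __name__ == "__main__":
    for idx, n_inner, first, last, total in summarize(parse_scf_log(SAMPLE_LOG)):
        print(f"loop {idx}: {n_inner} steps, DRHO {first:.3e} -> {last:.3e}, {total:.1f} s")
```

Applied to the full log, the first DRHO of each outer loop is the number to watch: in the last loops above it stays between roughly 2e-7 and 4e-7 instead of dropping below 1e-7.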
There are also some usability problems with HSE:
- Nothing is printed to stdout or running*.log during the EXX step (apart from the "Updating EXX and rerun SCF" notice), which gives the user the impression that the calculation is stuck. Can more information be printed, such as the time consumed in the EXX step and in its key stages?
- How can I properly restart an HSE SCF calculation if a complete SCF has not finished? Because the total SCF is not done, the charge file is not written, so I am trying to use the wavefunction file and the restart file. However, the HSE calculation runs a PBE SCF first regardless of whether exx_separate_loop is 0 or 1. If I directly use the wfc or restart file from a half-finished HSE run, will the initialization be made useless by the first PBE step?
- How should I set the MPI and OMP numbers for the best performance (assuming memory is sufficient and the number of physical cores is fixed)? Using more OMP threads reduces memory cost, but from my observation of the CPU status on the HPC server, the EXX step sometimes seems to be parallelized mainly by MPI.
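For the MPI/OMP question, one way to find the best split empirically is to benchmark every combination that uses exactly the available physical cores. The sketch below only enumerates those combinations and builds launch lines in the same form as the commands used above. `mpi_omp_layouts` and `launch_command` are hypothetical helper names, and the sketch says nothing about which layout ABACUS actually prefers.

```python
def mpi_omp_layouts(physical_cores):
    """List (mpi_ranks, omp_threads) pairs whose product equals physical_cores."""
    return [
        (ranks, physical_cores // ranks)
        for ranks in range(1, physical_cores + 1)
        if physical_cores % ranks == 0
    ]


def launch_command(mpi_ranks, omp_threads, binary="abacus"):
    """Build a launch line in the form used in this issue."""
    return f"OMP_NUM_THREADS={omp_threads} mpirun -np {mpi_ranks} {binary}"


if __name__ == "__main__":
    # 64 cores corresponds to the first run above (4 ranks x 16 threads).
    for ranks, threads in mpi_omp_layouts(64):
        print(launch_command(ranks, threads))
```

Timing a few SCF steps and recording peak memory for each layout would show where the trade-off between memory (more threads) and EXX speed (more ranks) lies on a given machine.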
Task list for Issue attackers (only for developers)
- [ ] Reproduce the performance issue on a similar system or environment.
- [ ] Identify the specific section of the code causing the performance issue.
- [ ] Investigate the issue and determine the root cause.
- [ ] Research best practices and potential solutions for the identified performance issue.
- [ ] Implement the chosen solution to address the performance issue.
- [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
- [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Merge the improved solution into the main codebase and notify the issue reporter.