Request: Better convergence of HSE in magnetic system

Open QuantumMisaka opened this issue 1 year ago • 15 comments

Details

I've tested HSE SCF on a magnetic system; the example is the conventional cell of bcc Fe:

ATOMIC_SPECIES
Fe 55.845 Fe_ONCV_PBE-1.0.upf upf201

NUMERICAL_ORBITAL
Fe_gga_8au_100Ry_4s2p2d1f.orb

LATTICE_CONSTANT
1.889726

LATTICE_VECTORS
    2.8301511117     0.0000000000     0.0000000000 #latvec1
    0.0000000000     2.8301511117     0.0000000000 #latvec2
    0.0000000000    -0.0000000000     2.8301511117 #latvec3

ATOMIC_POSITIONS
Direct

Fe #label
2 #magnetism
2 #number of atoms
    0.0000000000     0.0000000000     0.0000000000 m  1  1  1
    0.5000000000     0.5000000000     0.5000000000 m  1  1  1

The KPT mesh is 9 9 9.

Fe-HSE.tar.gz

Information:
  ABACUS version: 3.4.4
  Commit: 5f9d472 (Mon Dec 4 14:10:21 2023 +0800)
  Dependence: Intel-OneAPI and Intel-toolchain
  LibRI and LibComm: latest version before Nov 18

My initial INPUT was:

#Parameters (1.General)
suffix                  Fe  # suffix of OUTPUT DIR
nspin                   2   #  1/2/4 4 for SOC
symmetry                0   #  0/1  1 for open, default
esolver_type            ksdft  # ksdft, ofdft, sdft, tddft, lj, dp
dft_functional          hse  # same as upf file, can be lda/pbe/scan/hf/pbe0/hse
ks_solver             genelpa  # default for ksdft-lcao
vdw_method              none  # none, d3, d3_bj

#Parameters (2.Iteration)
calculation             scf # scf relax cell-relax md
ecutwfc                 100
scf_thr                 1e-7
scf_nmax                300

#Parameters (3.Basis)
basis_type              lcao  # lcao or pw

#Parameters (4.Smearing)
smearing_method         mp    # mp/gaussian/fixed
smearing_sigma          0.002  # Rydberg

#Parameters (5.Mixing)
mixing_type             broyden  # pulay/broyden

#Parameters (6.Calculation)
cal_force          1
cal_stress         1
out_stru           1  # print STRU in OUT
out_chg            1  # print CHG or not
out_bandgap        1
out_mul            1  

With this INPUT it is very hard to converge to scf_thr 1e-7; the run cannot even reach scf_thr 1e-6 within 5 days of calculation using OMP_NUM_THREADS=16 mpirun -np 4 abacus on Intel-8358.

# After more than 700 lines of output and 4 days of calculation
 Updating EXX and rerun SCF
 GE1    5.32e+00  5.81e+00  -6.437418e+03  0.000000e+00   1.291e-06  9.196e+00  
 GE2    5.32e+00  5.81e+00  -6.437418e+03  1.364268e-09   7.188e-07  8.863e+00  
 GE3    5.32e+00  5.81e+00  -6.437418e+03  1.468676e-09   3.518e-07  8.800e+00  
 GE4    5.32e+00  5.81e+00  -6.437418e+03  5.839128e-10   2.326e-07  8.802e+00  
 GE5    5.32e+00  5.81e+00  -6.437418e+03  -2.100539e-09  3.236e-08  8.843e+00  
 Updating EXX and rerun SCF
 GE1    5.32e+00  5.81e+00  -6.437418e+03  0.000000e+00   1.058e-06  9.111e+00  
 GE2    5.32e+00  5.81e+00  -6.437418e+03  -1.546015e-09  5.929e-07  8.805e+00  
 GE3    5.32e+00  5.81e+00  -6.437418e+03  -1.840679e-10  2.948e-07  8.871e+00  
 GE4    5.32e+00  5.81e+00  -6.437418e+03  1.423819e-09   4.995e-08  8.820e+00  

After seeing #3103, I added one parameter to my INPUT:

mixing_gg0   0.0

After that, convergence improved: in a 2-day calculation using OMP_NUM_THREADS=24 mpirun -np 2 abacus on Intel-8162, the SCF converged to scf_thr 1e-6, but not to scf_thr 1e-7.

 START CHARGE      : atomic
 DONE(177.792    SEC) : INIT SCF
 ITER   TMAG      AMAG      ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)    
 GE1    4.01e+00  4.01e+00  -6.440073e+03  0.000000e+00   4.826e-02  4.429e+00  
 GE2    4.31e+00  4.41e+00  -6.440405e+03  -3.311553e-01  1.996e-02  3.688e+00  
 GE3    4.33e+00  4.43e+00  -6.440409e+03  -4.691903e-03  5.726e-03  3.677e+00  
 GE4    4.33e+00  4.43e+00  -6.440409e+03  2.332581e-04   3.079e-03  3.684e+00  
 GE5    4.33e+00  4.43e+00  -6.440409e+03  -5.472160e-05  1.219e-03  3.626e+00  
 GE6    4.33e+00  4.43e+00  -6.440409e+03  -1.579811e-05  1.703e-04  3.681e+00  
 GE7    4.33e+00  4.43e+00  -6.440409e+03  -2.383246e-07  6.439e-05  3.724e+00  
 GE8    4.33e+00  4.43e+00  -6.440409e+03  -6.277874e-08  2.805e-05  3.635e+00  
 GE9    4.33e+00  4.43e+00  -6.440409e+03  -2.755682e-08  9.261e-06  3.668e+00  
 GE10   4.33e+00  4.43e+00  -6.440409e+03  1.987624e-10   9.984e-07  3.717e+00  
 GE11   4.33e+00  4.43e+00  -6.440409e+03  1.256766e-09   1.477e-07  3.667e+00  
 GE12   4.33e+00  4.43e+00  -6.440409e+03  -2.078884e-09  8.750e-08  3.641e+00  
 Updating EXX and rerun SCF
 GE1    5.07e+00  5.25e+00  -6.432274e+03  0.000000e+00   6.975e-02  1.732e+01  
 GE2    5.12e+00  5.38e+00  -6.437178e+03  -4.903432e+00  5.335e-02  1.714e+01  
 GE3    5.08e+00  5.37e+00  -6.437337e+03  -1.595823e-01  2.761e-02  1.717e+01  
 GE4    5.08e+00  5.36e+00  -6.436762e+03  5.755460e-01   2.955e-02  1.724e+01  
 GE5    5.18e+00  5.45e+00  -6.437070e+03  -3.075961e-01  1.282e-02  1.730e+01  
 GE6    5.20e+00  5.46e+00  -6.437078e+03  -8.548606e-03  8.137e-03  1.715e+01  
 GE7    5.19e+00  5.45e+00  -6.437053e+03  2.523551e-02   9.021e-03  1.717e+01  
 GE8    5.22e+00  5.47e+00  -6.437049e+03  4.194422e-03   4.162e-03  1.725e+01  
 GE9    5.25e+00  5.49e+00  -6.437052e+03  -2.974158e-03  3.035e-04  1.720e+01  
 GE10   5.25e+00  5.49e+00  -6.437052e+03  1.164049e-05   3.154e-04  1.713e+01  
 GE11   5.25e+00  5.49e+00  -6.437052e+03  -1.927004e-05  9.251e-05  1.714e+01  
 GE12   5.25e+00  5.49e+00  -6.437052e+03  4.742927e-06   1.342e-04  1.723e+01  
 GE13   5.25e+00  5.49e+00  -6.437052e+03  -3.654831e-06  1.064e-04  1.724e+01  
 GE14   5.25e+00  5.49e+00  -6.437052e+03  -1.292602e-06  2.761e-06  1.720e+01  
 GE15   5.25e+00  5.49e+00  -6.437052e+03  -4.918788e-10  1.088e-06  1.726e+01  
 GE16   5.25e+00  5.49e+00  -6.437052e+03  -1.480277e-09  5.592e-07  1.722e+01  
 GE17   5.25e+00  5.49e+00  -6.437052e+03  3.408349e-09   1.277e-07  1.720e+01  
 GE18   5.25e+00  5.49e+00  -6.437052e+03  3.209587e-10   1.536e-08  1.722e+01  
 Updating EXX and rerun SCF
 GE1    5.30e+00  5.66e+00  -6.437386e+03  0.000000e+00   7.783e-03  1.756e+01  
 GE2    5.30e+00  5.70e+00  -6.437389e+03  -2.905916e-03  3.097e-03  1.776e+01  
 GE3    5.30e+00  5.69e+00  -6.437389e+03  -1.426553e-04  3.709e-04  1.768e+01  
 GE4    5.30e+00  5.69e+00  -6.437389e+03  -3.933669e-07  1.830e-04  1.763e+01  
 GE5    5.30e+00  5.69e+00  -6.437389e+03  -1.916378e-07  6.337e-05  1.758e+01  
 GE6    5.30e+00  5.69e+00  -6.437389e+03  -5.438509e-08  6.068e-06  1.773e+01  
 GE7    5.30e+00  5.69e+00  -6.437389e+03  6.844540e-10   4.172e-06  1.765e+01  
 GE8    5.30e+00  5.69e+00  -6.437389e+03  -2.401390e-09  2.932e-06  1.760e+01  
 GE9    5.30e+00  5.69e+00  -6.437389e+03  -1.980663e-09  3.465e-07  1.768e+01  
 GE10   5.30e+00  5.69e+00  -6.437389e+03  1.095900e-09   4.516e-08  1.761e+01  
 Updating EXX and rerun SCF
 GE1    5.30e+00  5.75e+00  -6.437412e+03  0.000000e+00   2.970e-03  1.772e+01  
 GE2    5.30e+00  5.77e+00  -6.437412e+03  -5.071874e-04  1.115e-03  1.761e+01  
 GE3    5.30e+00  5.76e+00  -6.437412e+03  -3.600643e-05  3.660e-04  1.766e+01  
 GE4    5.30e+00  5.76e+00  -6.437412e+03  1.002332e-06   1.333e-04  1.767e+01  
 GE5    5.30e+00  5.76e+00  -6.437412e+03  -3.536508e-07  3.344e-05  1.765e+01  
 GE6    5.30e+00  5.76e+00  -6.437412e+03  -1.065892e-08  3.677e-06  1.761e+01  
 GE7    5.30e+00  5.76e+00  -6.437412e+03  1.508119e-10   2.340e-06  1.777e+01  
 GE8    5.30e+00  5.76e+00  -6.437412e+03  -2.848412e-09  1.372e-06  1.762e+01  
 GE9    5.30e+00  5.76e+00  -6.437412e+03  -6.697595e-10  5.157e-07  1.766e+01  
 GE10   5.30e+00  5.76e+00  -6.437412e+03  2.343385e-10   2.126e-07  1.772e+01  
 GE11   5.30e+00  5.76e+00  -6.437412e+03  2.143849e-09   3.190e-08  1.777e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   8.249e-04  1.792e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -3.188180e-05  3.782e-04  1.772e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  -7.317703e-08  1.303e-04  1.774e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  -3.181451e-07  5.785e-05  1.770e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  1.346944e-08   1.282e-05  1.783e+01  
 GE6    5.29e+00  5.78e+00  -6.437418e+03  -2.061869e-09  2.488e-06  1.767e+01  
 GE7    5.29e+00  5.78e+00  -6.437418e+03  2.597832e-09   4.422e-07  1.771e+01  
 GE8    5.29e+00  5.78e+00  -6.437418e+03  3.727761e-10   1.378e-07  1.783e+01  
 GE9    5.29e+00  5.78e+00  -6.437418e+03  -5.916467e-10  5.191e-08  1.774e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   2.048e-04  1.776e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -1.439515e-06  9.945e-05  1.772e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  2.207036e-08   4.256e-05  1.779e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  1.388243e-09   1.148e-05  1.771e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  -1.028305e-08  4.965e-06  1.766e+01  
 GE6    5.29e+00  5.78e+00  -6.437418e+03  -4.977566e-09  4.509e-07  1.777e+01  
 GE7    5.29e+00  5.78e+00  -6.437418e+03  8.770292e-10   1.506e-07  1.783e+01  
 GE8    5.29e+00  5.78e+00  -6.437418e+03  -9.288467e-10  6.491e-08  1.770e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   6.077e-05  1.769e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -1.134252e-07  3.019e-05  1.777e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  8.373541e-09   1.607e-05  1.777e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  2.111367e-09   2.498e-06  1.768e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  -3.221961e-09  4.133e-07  1.770e+01  
 GE6    5.29e+00  5.78e+00  -6.437418e+03  4.555293e-10   1.491e-07  1.771e+01  
 GE7    5.29e+00  5.78e+00  -6.437418e+03  2.135342e-09   4.883e-08  1.783e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   2.286e-05  1.784e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -2.029078e-08  1.168e-05  1.772e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  1.047176e-09   6.660e-06  1.773e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  2.583137e-10   1.001e-06  1.789e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  -1.795822e-09  4.420e-07  1.777e+01  
 GE6    5.29e+00  5.78e+00  -6.437418e+03  1.625675e-09   7.378e-08  1.779e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   1.176e-05  1.768e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -6.389011e-09  5.673e-06  1.767e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  1.296982e-09   3.038e-06  1.780e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  3.175557e-09   3.255e-06  1.771e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  -3.668210e-09  2.879e-07  1.769e+01  
 GE6    5.29e+00  5.78e+00  -6.437418e+03  7.262173e-10   4.905e-08  1.772e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   6.956e-06  1.776e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -2.089712e-09  3.181e-06  1.795e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  -1.856147e-11  1.484e-06  1.772e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  4.439284e-10   7.081e-07  1.771e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  -5.777256e-10  2.457e-07  1.781e+01  
 GE6    5.29e+00  5.78e+00  -6.437418e+03  6.627990e-10   3.775e-08  1.774e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   3.987e-06  1.776e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  5.614843e-10   1.749e-06  1.779e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  -1.237431e-11  7.742e-07  1.771e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  -1.139210e-09  9.115e-07  1.785e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  2.412990e-10   6.300e-08  1.778e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   2.462e-06  1.779e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  6.380504e-10   1.072e-06  1.781e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  1.345706e-09   5.480e-07  1.785e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  2.142302e-10   4.647e-07  1.776e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  -2.590871e-10  4.279e-08  1.778e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   1.403e-06  1.777e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  1.235111e-09   6.003e-07  1.775e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  8.615614e-10   2.236e-07  1.787e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  -1.662025e-09  1.244e-07  1.777e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  8.360393e-10   4.124e-08  1.779e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   9.645e-07  1.775e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -1.137663e-09  6.704e-07  1.771e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  9.613292e-10   4.905e-07  1.784e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  6.372770e-10   1.410e-07  1.780e+01  
 GE5    5.29e+00  5.78e+00  -6.437418e+03  -5.181742e-10  2.714e-08  1.778e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   5.150e-07  1.781e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -2.575403e-10  3.110e-07  1.782e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  1.438514e-09   2.384e-07  1.790e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  -8.399063e-10  7.898e-08  1.780e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   3.857e-07  1.780e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -1.633409e-09  5.688e-07  1.778e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  -1.924205e-09  1.518e-07  1.777e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  3.443925e-09   5.881e-08  1.782e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   3.686e-07  1.778e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -1.023974e-09  1.722e-07  1.784e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  2.084298e-09   7.166e-08  1.777e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   2.508e-07  1.841e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -7.285375e-10  5.146e-07  1.832e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  1.717709e-09   9.339e-08  1.835e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   2.401e-07  1.783e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  2.266046e-10   2.545e-07  1.782e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  -5.599375e-10  1.674e-07  1.791e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  -1.832945e-10  5.861e-08  1.780e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   2.153e-07  1.786e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -2.714614e-10  2.968e-07  1.779e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  -6.473311e-10  1.489e-07  1.779e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  2.733949e-09   4.245e-08  1.788e+01  
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   2.514e-07  1.780e+01  
 GE2    5.29e+00  5.78e+00  -6.437418e+03  1.046403e-09   2.553e-07  1.787e+01  
 GE3    5.29e+00  5.78e+00  -6.437418e+03  1.063417e-09   1.525e-07  1.774e+01  
 GE4    5.29e+00  5.78e+00  -6.437418e+03  -5.251348e-10  4.837e-08  1.787e+01  

Memory consumption is 50G during the calculation. Is this performance normal for this system? Can any improvements be made?
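To make a log like the one above easier to read, the inner SCF iterations can be grouped by outer EXX loop. The following is a minimal Python sketch, not part of ABACUS: the function name summarize_exx_loops and the SAMPLE text are introduced here for illustration, and the regular expression assumes the seven-column "GE" line layout shown in this issue.

```python
import re

# One inner SCF line, assuming the layout printed in this issue:
#  GEn  TMAG  AMAG  ETOT(eV)  EDIFF(eV)  DRHO  TIME(s)
ITER_RE = re.compile(
    r"^\s*GE(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)

# Two outer loops copied from the log above (illustrative sample only).
SAMPLE = """\
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   3.686e-07  1.778e+01
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -1.023974e-09  1.722e-07  1.784e+01
 GE3    5.29e+00  5.78e+00  -6.437418e+03  2.084298e-09   7.166e-08  1.777e+01
 Updating EXX and rerun SCF
 GE1    5.29e+00  5.78e+00  -6.437418e+03  0.000000e+00   2.508e-07  1.841e+01
 GE2    5.29e+00  5.78e+00  -6.437418e+03  -7.285375e-10  5.146e-07  1.832e+01
 GE3    5.29e+00  5.78e+00  -6.437418e+03  1.717709e-09   9.339e-08  1.835e+01
"""


def summarize_exx_loops(text):
    """Return one summary dict per outer EXX loop found in `text`."""
    loops = []    # list of lists of inner iterations
    current = []  # inner iterations of the loop being read
    for line in text.splitlines():
        if "Updating EXX and rerun SCF" in line:
            # A new outer loop starts; store the previous one if it has data.
            if current:
                loops.append(current)
            current = []
            continue
        match = ITER_RE.match(line)
        if match:
            current.append({
                "etot": float(match.group(4)),
                "drho": float(match.group(6)),
                "time": float(match.group(7)),
            })
    if current:
        loops.append(current)

    return [
        {
            "n_iter": len(iters),
            "first_drho": iters[0]["drho"],
            "last_drho": iters[-1]["drho"],
            "etot": iters[-1]["etot"],
            "time_s": sum(it["time"] for it in iters),
        }
        for iters in loops
    ]


if __name__ == "__main__":
    for index, loop in enumerate(summarize_exx_loops(SAMPLE), start=1):
        print(index, loop["n_iter"], loop["first_drho"], loop["last_drho"])
```

Run on a saved copy of the full screen output, the first_drho column shows how far the density moves right after each EXX update, which is the quantity that shrinks only slowly in the run above.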

There are also some usability problems when using HSE:

  1. There is no output in stdout or running*.log during the EXX step (apart from the "Updating EXX and rerun SCF" notice), which gives the user the impression that the calculation is stuck. Could more information be printed, such as the time consumed in the EXX step and its key stages?
  2. How can I properly restart an HSE SCF calculation if a complete SCF has not finished? Because the full SCF is not done, the charge file is not written, so I am trying to use the wavefunction file and the restart file. However, since the HSE workflow runs a PBE SCF first whether exx_separate_loop is 0 or 1, if I directly use the wfc or restart file from a half-finished HSE run, will that initialization be useless because of the initial PBE step?
  3. How should I set the MPI and OMP numbers for the best performance (given sufficient memory and a fixed number of physical cores)? Using more OMP threads reduces memory cost, but from my observation of the CPU status on the HPC server, the EXX step sometimes seems to be parallelized mainly by MPI.
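For question 3, one practical way to explore the trade-off is to benchmark every MPI × OMP split of a fixed core count. The sketch below only enumerates the candidate launch commands in the same form used earlier in this issue; the helper name hybrid_layouts is hypothetical, and the sketch makes no claim about which split is fastest or uses the least memory.

```python
def hybrid_layouts(total_cores):
    """List (n_mpi, n_omp, command) for every exact split of total_cores."""
    layouts = []
    for n_mpi in range(1, total_cores + 1):
        if total_cores % n_mpi == 0:
            n_omp = total_cores // n_mpi
            # Same launch form as in this issue: OMP threads per MPI rank.
            command = f"OMP_NUM_THREADS={n_omp} mpirun -np {n_mpi} abacus"
            layouts.append((n_mpi, n_omp, command))
    return layouts


if __name__ == "__main__":
    # 64 cores matches the 16 threads x 4 ranks run reported above.
    for n_mpi, n_omp, command in hybrid_layouts(64):
        print(f"{n_mpi:3d} ranks x {n_omp:3d} threads: {command}")
```

Timing a few inner SCF iterations and one EXX update for each candidate, while watching peak memory, would give a measured answer for a specific machine.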

Task list for Issue attackers (only for developers)

  • [ ] Reproduce the performance issue on a similar system or environment.
  • [ ] Identify the specific section of the code causing the performance issue.
  • [ ] Investigate the issue and determine the root cause.
  • [ ] Research best practices and potential solutions for the identified performance issue.
  • [ ] Implement the chosen solution to address the performance issue.
  • [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
  • [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
  • [ ] Review and incorporate any relevant feedback from users or developers.
  • [ ] Merge the improved solution into the main codebase and notify the issue reporter.

QuantumMisaka • Dec 14 '23 12:12