Which direction should vacuum be put? X? Y? Z?
Details
The ABACUS tutorials repeatedly state that "the vacuum should not be set along the z-direction, because the lattice integral is parallelized along z". In practice, however, placing the vacuum along z is the standard setup for catalysis simulations.
Of course, we can now use ATOMKIT to change the lattice orientation (thanks to the developers); redefining the lattice in MS gives a poor lattice after the transformation. But in practice, which direction should the vacuum be put in?
A test follows: a cyclic C6H12 molecule adsorbed on Pt-doped graphene, with data collected by `abacustest collectdata`. The structures describe the initial state (IS) and final state (FS) of the adsorption process. In the labels, `xy` means the surface lies in the xy-plane and the vacuum is along the z-direction, and so on.
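To make the labelling concrete, here is a minimal sketch of how a cell with the vacuum along z could be re-oriented by a cyclic permutation of the Cartesian axes. This is not how ATOMKIT does it; the function name `cycle_axes` and the toy cell are assumptions for illustration only. A cyclic permutation is used because it keeps the cell right-handed.

```python
import numpy as np

def cycle_axes(lattice, cart_positions, shift=1):
    """Cyclically permute Cartesian axes (hypothetical helper).

    lattice: (3, 3) array, one lattice vector per row.
    cart_positions: (N, 3) array of Cartesian coordinates.
    shift=1 maps components (x, y, z) -> (z, x, y), so a vacuum
    along z ends up along x; shift=2 moves it to y.
    """
    lattice = np.asarray(lattice, dtype=float)
    cart_positions = np.asarray(cart_positions, dtype=float)
    # Roll the Cartesian components, then reorder the lattice vectors
    # the same way so the cell matrix keeps its original layout.
    new_lattice = np.roll(np.roll(lattice, shift, axis=1), shift, axis=0)
    new_positions = np.roll(cart_positions, shift, axis=1)
    return new_lattice, new_positions

# Toy orthorhombic cell with a long (vacuum) axis along z.
lat = np.diag([5.0, 6.0, 20.0])
pos = np.array([[1.0, 2.0, 3.0]])
new_lat, new_pos = cycle_axes(lat, pos, shift=1)
kpt = list(np.roll([2, 2, 1], 1))  # the k-point mesh must be permuted too
```

With `shift=1` the `xy` setup (k-points `[2, 2, 1]`) becomes the `yz` setup (k-points `[1, 2, 2]`), matching the labels used in the tables.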
When run with `OMP_NUM_THREADS=1 mpirun -np 16 abacus`:
natom kpt ibzk nelec nbands \
Cy-Pt-graphene-FS-xy 67 [2, 2, 1] 4 246.0 148
Cy-Pt-graphene-FS-yz 67 [1, 2, 2] 4 246.0 148
Cy-Pt-graphene-FS-zx 67 [2, 1, 2] 4 246.0 148
Cy-Pt-graphene-IS-xy 67 [2, 2, 1] 4 246.0 148
Cy-Pt-graphene-IS-yz 67 [1, 2, 2] 4 246.0 148
Cy-Pt-graphene-IS-zx 67 [2, 1, 2] 4 246.0 148
force \
Cy-Pt-graphene-FS-xy [0.0058573, 0.0056758, -0.00613239, -0.0015834...
Cy-Pt-graphene-FS-yz [-0.0061323, 0.00198645, 0.00791055, 0.0087928...
Cy-Pt-graphene-FS-zx [0.00585739, -0.0061323, 0.00567572, -0.001583...
Cy-Pt-graphene-IS-xy [-0.02198308, 0.00570391, 0.02034105, 0.024721...
Cy-Pt-graphene-IS-yz [0.0188027, -0.0064566, -0.01979903, 0.0187812...
Cy-Pt-graphene-IS-zx [-0.02198309, 0.02034104, 0.00570378, 0.024721...
stress \
Cy-Pt-graphene-FS-xy [11.841762, -1.757999, -0.06442, -1.757999, -7...
Cy-Pt-graphene-FS-yz [2.478275, -0.114188, -0.140314, -0.114188, -0...
Cy-Pt-graphene-FS-zx [11.841968, -0.064421, -1.758001, -0.064421, 2...
Cy-Pt-graphene-IS-xy [14.732903, -0.0476, 0.004773, -0.0476, -10.71...
Cy-Pt-graphene-IS-yz [-0.164103, -0.02749, -0.010819, -0.02749, -4....
Cy-Pt-graphene-IS-zx [14.733115, 0.004773, -0.0476, 0.004773, -0.17...
scf_time scf_steps total_time force_time stress_time \
Cy-Pt-graphene-FS-xy 1642.050 144 1956.70 None 193.1690
Cy-Pt-graphene-FS-yz 1079.019 144 1362.80 None 139.2170
Cy-Pt-graphene-FS-zx 1079.555 144 1362.34 None 137.8680
Cy-Pt-graphene-IS-xy 1948.600 182 2254.05 None 182.1870
Cy-Pt-graphene-IS-yz 1454.214 201 1800.10 None 1.1082
Cy-Pt-graphene-IS-zx 1325.293 182 1600.90 None 1.1452
band_gap INPUT/smearing_method INPUT/smearing_sigma \
Cy-Pt-graphene-FS-xy 0.49032 gau 0.002
Cy-Pt-graphene-FS-yz 0.49032 gau 0.002
Cy-Pt-graphene-FS-zx 0.49032 gau 0.002
Cy-Pt-graphene-IS-xy 0.20259 gau 0.002
Cy-Pt-graphene-IS-yz 0.20259 gau 0.002
Cy-Pt-graphene-IS-zx 0.20258 gau 0.002
INPUT/mixing_beta
Cy-Pt-graphene-FS-xy -10
Cy-Pt-graphene-FS-yz -10
Cy-Pt-graphene-FS-zx -10
Cy-Pt-graphene-IS-xy -10
Cy-Pt-graphene-IS-yz -10
Cy-Pt-graphene-IS-zx -10
When run with `OMP_NUM_THREADS=16 mpirun -np 1 abacus`:
natom kpt ibzk nelec nbands \
Cy-Pt-graphene-FS-xy 67 [2, 2, 1] 4 246.0 148
Cy-Pt-graphene-FS-yz 67 [1, 2, 2] 4 246.0 148
Cy-Pt-graphene-FS-zx 67 [2, 1, 2] 4 246.0 148
Cy-Pt-graphene-IS-xy 67 [2, 2, 1] 4 246.0 148
Cy-Pt-graphene-IS-yz 67 [1, 2, 2] 4 246.0 148
Cy-Pt-graphene-IS-zx 67 [2, 1, 2] 4 246.0 148
force \
Cy-Pt-graphene-FS-xy [0.00585731, 0.00567584, -0.00613238, -0.00158...
Cy-Pt-graphene-FS-yz [-0.00613231, 0.0019864, 0.00791057, 0.0087928...
Cy-Pt-graphene-FS-zx [0.00585716, -0.00613224, 0.00567584, -0.00158...
Cy-Pt-graphene-IS-xy [-0.02150663, 0.00564123, 0.02026577, 0.024211...
Cy-Pt-graphene-IS-yz [0.01886961, -0.00649041, -0.01966561, 0.01884...
Cy-Pt-graphene-IS-zx [-0.02149286, 0.02037799, 0.00560933, 0.024198...
stress \
Cy-Pt-graphene-FS-xy [11.841762, -1.757998, -0.06442, -1.757998, -7...
Cy-Pt-graphene-FS-yz [2.478275, -0.114188, -0.140314, -0.114188, -0...
Cy-Pt-graphene-FS-zx [11.841966, -0.064421, -1.758, -0.064421, 2.47...
Cy-Pt-graphene-IS-xy [14.73432, -0.047485, 0.004764, -0.047485, -10...
Cy-Pt-graphene-IS-yz [-0.165089, -0.027494, -0.010818, -0.027494, -...
Cy-Pt-graphene-IS-zx [14.7353, 0.004761, -0.047506, 0.004761, -0.18...
scf_time scf_steps total_time force_time stress_time \
Cy-Pt-graphene-FS-xy 986.825 144 1189.02 None 100.1840
Cy-Pt-graphene-FS-yz 1608.619 144 1869.74 None 159.2260
Cy-Pt-graphene-FS-zx 1073.578 144 1267.59 None 92.0163
Cy-Pt-graphene-IS-xy 1428.172 214 1624.82 None 94.3373
Cy-Pt-graphene-IS-yz 3403.074 322 3715.36 None 184.8030
Cy-Pt-graphene-IS-zx 1597.002 223 1788.65 None 88.9116
band_gap INPUT/smearing_method INPUT/smearing_sigma \
Cy-Pt-graphene-FS-xy 0.49032 gau 0.002
Cy-Pt-graphene-FS-yz 0.49032 gau 0.002
Cy-Pt-graphene-FS-zx 0.49032 gau 0.002
Cy-Pt-graphene-IS-xy 0.20259 gau 0.002
Cy-Pt-graphene-IS-yz 0.20259 gau 0.002
Cy-Pt-graphene-IS-zx 0.20259 gau 0.002
INPUT/mixing_beta
Cy-Pt-graphene-FS-xy -10
Cy-Pt-graphene-FS-yz -10
Cy-Pt-graphene-FS-zx -10
Cy-Pt-graphene-IS-xy -10
Cy-Pt-graphene-IS-yz -10
Cy-Pt-graphene-IS-zx -10
It is not hard to see that:
- With MPI16, the z-direction vacuum (xy) gives the lowest performance, which is consistent with the ABACUS guide. It also suggests that the lattice integral is parallelized along z by MPI.
- With OMP16, the z-direction vacuum (xy) gives the BEST performance, which is NOT consistent with the ABACUS guide. For the xy structures, OMP16 is also faster than MPI16 (FS-xy total time: 1189.02 s vs 1956.70 s).
The MPI4-OMP4 strategy is being tested now.
In tests on other surfaces (which I will rerun when I have time), the z-direction vacuum (xy case) also performs best with the OMP16 strategy.
More tests are needed, or the lattice integral algorithm should be improved.
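Because the IS runs converge in different numbers of SCF steps, comparing raw `scf_time` can be misleading. A small sketch that normalizes by `scf_steps`, using the IS values copied from the MPI16 and OMP16 tables above (the dictionary layout and function name are only illustrative):

```python
# (scf_time in s, scf_steps) for the IS structures, copied from the tables above.
runs = {
    "MPI16": {"xy": (1948.600, 182), "yz": (1454.214, 201), "zx": (1325.293, 182)},
    "OMP16": {"xy": (1428.172, 214), "yz": (3403.074, 322), "zx": (1597.002, 223)},
}

def time_per_step(runs):
    """Return SCF wall time per SCF step for every strategy and orientation."""
    return {
        strategy: {label: t / n for label, (t, n) in data.items()}
        for strategy, data in runs.items()
    }

per_step = time_per_step(runs)
for strategy, data in per_step.items():
    fastest = min(data, key=data.get)
    slowest = max(data, key=data.get)
    print(f"{strategy}: fastest={fastest}, slowest={slowest}")
```

Per step, the xy orientation is still the slowest under MPI16 and the fastest under OMP16, so the observations above do not depend on the differing step counts.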
Other information:
- ABACUS version: 3.3.4
- ABACUS dependencies: Intel oneAPI toolchain, compiled with `icpc`
- Device: AMD 3950X
- Environment: WSL2 (Ubuntu 20.04) on Windows 10
Task list for Issue attackers
- [X] Reproduce the performance issue on a similar system or environment.
- [X] Identify the specific section of the code causing the performance issue.
- [X] Investigate the issue and determine the root cause.
- [X] Research best practices and potential solutions for the identified performance issue.
- [x] Implement the chosen solution to address the performance issue.
- [x] Test the implemented solution to ensure it improves performance without introducing new issues.
- [x] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Merge the improved solution into the main codebase and notify the issue reporter.
Run with MPI4-OMP4, namely `OMP_NUM_THREADS=4 mpirun -np 4 abacus`. Results:
natom kpt ibzk nelec nbands \
Cy-Pt-graphene-FS-xy 67 [2, 2, 1] 4 246.0 148
Cy-Pt-graphene-FS-yz 67 [1, 2, 2] 4 246.0 148
Cy-Pt-graphene-FS-zx 67 [2, 1, 2] 4 246.0 148
Cy-Pt-graphene-IS-xy 67 [2, 2, 1] 4 246.0 148
Cy-Pt-graphene-IS-yz 67 [1, 2, 2] 4 246.0 148
Cy-Pt-graphene-IS-zx 67 [2, 1, 2] 4 246.0 148
force \
Cy-Pt-graphene-FS-xy [0.00585724, 0.00567584, -0.00613238, -0.00158...
Cy-Pt-graphene-FS-yz [-0.0061323, 0.00198643, 0.00791055, 0.0087928...
Cy-Pt-graphene-FS-zx [0.0058574, -0.00613231, 0.00567572, -0.001583...
Cy-Pt-graphene-IS-xy [-0.02165887, 0.00566376, 0.02029126, 0.024372...
Cy-Pt-graphene-IS-yz [0.01889752, -0.00637754, -0.01982845, 0.01887...
Cy-Pt-graphene-IS-zx [-0.02176209, 0.02026247, 0.00564634, 0.024480...
stress \
Cy-Pt-graphene-FS-xy [11.841762, -1.757998, -0.06442, -1.757998, -7...
Cy-Pt-graphene-FS-yz [2.478274, -0.114188, -0.140314, -0.114188, -0...
Cy-Pt-graphene-FS-zx [11.841968, -0.064421, -1.758001, -0.064421, 2...
Cy-Pt-graphene-IS-xy [14.733796, -0.047504, 0.00477, -0.047504, -10...
Cy-Pt-graphene-IS-yz [-0.164758, -0.027515, -0.010838, -0.027515, -...
Cy-Pt-graphene-IS-zx [14.733666, 0.004771, -0.047556, 0.004771, -0....
scf_time scf_steps total_time force_time stress_time \
Cy-Pt-graphene-FS-xy 1271.086 144 1523.12 None 143.080
Cy-Pt-graphene-FS-yz 1731.420 144 2020.19 None 162.857
Cy-Pt-graphene-FS-zx 1325.112 144 1579.88 None 128.330
Cy-Pt-graphene-IS-xy 1933.440 235 2172.39 None 128.559
Cy-Pt-graphene-IS-yz 2981.194 280 3322.56 None 183.681
Cy-Pt-graphene-IS-zx 2046.399 231 2300.05 None 127.007
band_gap INPUT/smearing_method INPUT/smearing_sigma \
Cy-Pt-graphene-FS-xy 0.49032 gau 0.002
Cy-Pt-graphene-FS-yz 0.49032 gau 0.002
Cy-Pt-graphene-FS-zx 0.49032 gau 0.002
Cy-Pt-graphene-IS-xy 0.20259 gau 0.002
Cy-Pt-graphene-IS-yz 0.20259 gau 0.002
Cy-Pt-graphene-IS-zx 0.20259 gau 0.002
INPUT/mixing_beta
Cy-Pt-graphene-FS-xy -10
Cy-Pt-graphene-FS-yz -10
Cy-Pt-graphene-FS-zx -10
Cy-Pt-graphene-IS-xy -10
Cy-Pt-graphene-IS-yz -10
Cy-Pt-graphene-IS-zx -10
More interesting.
Testing the example above with ABACUS v3.4.0:
run by MPI16: https://labs.dp.tech/projects/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-xyztest-mpi-340-abacustest-v0.3.40-37b1fc
run by OMP16: https://labs.dp.tech/projects/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-xyztest-omp-340-abacustest-v0.3.40-e95099
run by MPI4-OMP4: https://labs.dp.tech/projects/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-xyztest-mpiomp-340-abacustest-v0.3.40-9fcc90
Conclusion:
- The lattice integral seems to be parallelized along the z-axis by MPI, so when running with MPI (especially on a large number of cores or nodes), the vacuum SHOULD NOT be set along the z-direction.
- The OpenMP parallelization seems to be along the x-direction, so when running with OpenMP, a surface in the xy-plane performs well.
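The first point can be illustrated with a toy load-balance model. This is only a sketch under strong assumptions: grid planes are split into equal contiguous chunks along one axis, and the work of a plane is proportional to the fraction of it that contains atoms. It is not the actual ABACUS distribution scheme, and the plane counts are made up.

```python
def rank_loads(plane_weights, n_ranks):
    """Split planes into equal contiguous chunks and sum the work per rank.

    Assumes len(plane_weights) is divisible by n_ranks.
    """
    chunk = len(plane_weights) // n_ranks
    return [sum(plane_weights[r * chunk:(r + 1) * chunk]) for r in range(n_ranks)]

def imbalance(loads):
    """Busiest rank divided by the average rank (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))

n_planes, n_ranks = 96, 16

# Vacuum ALONG the split axis: 24 planes hold all the atoms, 72 are empty.
vacuum_along_split = [1.0] * 24 + [0.0] * 72
# Vacuum PERPENDICULAR to the split axis: every plane is 25% slab, 75% vacuum.
vacuum_across_split = [0.25] * n_planes

print(imbalance(rank_loads(vacuum_along_split, n_ranks)))   # 4 ranks do all the work
print(imbalance(rank_loads(vacuum_across_split, n_ranks)))  # all ranks share equally
```

In this model, the total work is the same in both cases, but with the vacuum along the split axis most ranks sit idle, which would be consistent with the MPI16 timings.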
I also tested another system, from https://github.com/deepmodeling/abacus-develop/issues/2889. With the surface in the xy-plane:
Use Systematically Improvable Atomic bases
---------------------------------------------------------
ELEMENT ORBITALS NBASE NATOM XC
H 2s1p-6au 5 2
C 2s2p1d-7au 13 37
O 2s2p1d-7au 13 1
Fe 4s2p2d1f-8au 27 80
---------------------------------------------------------
Initial plane wave basis and FFT box
---------------------------------------------------------
DONE(0.252691 SEC) : INIT PLANEWAVE
-------------------------------------------
STEP OF ION RELAXATION : 1
-------------------------------------------
START CHARGE : atomic
DONE(2.6286 SEC) : INIT SCF
ITER TMAG AMAG ETOT(eV) EDIFF(eV) DRHO TIME(s)
GE1 1.30e+02 1.36e+02 -2.636373e+05 0.000000e+00 8.669e-02 1.996e+01
GE2 8.03e+01 8.55e+01 -2.636549e+05 -1.761770e+01 1.552e-01 1.742e+01
GE3 1.31e+02 1.39e+02 -2.637876e+05 -1.327641e+02 6.050e-02 1.749e+01
GE4 1.39e+02 1.47e+02 -2.638028e+05 -1.518177e+01 5.103e-02 1.740e+01
GE5 1.39e+02 1.48e+02 -2.638058e+05 -3.011335e+00 3.696e-02 1.743e+01
GE6 1.39e+02 1.48e+02 -2.638040e+05 1.794449e+00 3.384e-02 1.739e+01
GE7 1.41e+02 1.50e+02 -2.638065e+05 -2.440206e+00 2.423e-02 1.737e+01
GE8 1.43e+02 1.53e+02 -2.638074e+05 -9.270501e-01 1.923e-02 1.739e+01
GE9 1.43e+02 1.53e+02 -2.638079e+05 -4.920444e-01 1.698e-02 1.736e+01
GE10 1.43e+02 1.53e+02 -2.638076e+05 2.842442e-01 1.704e-02 1.735e+01
GE11 1.44e+02 1.55e+02 -2.638080e+05 -3.599622e-01 1.517e-02 1.736e+01
GE12 1.44e+02 1.55e+02 -2.638081e+05 -1.185213e-01 1.484e-02 1.731e+01
GE13 1.44e+02 1.55e+02 -2.638086e+05 -5.031615e-01 1.170e-02 1.736e+01
GE14 1.45e+02 1.57e+02 -2.638088e+05 -1.602977e-01 8.759e-03 1.732e+01
GE15 1.45e+02 1.57e+02 -2.638088e+05 -8.408801e-02 7.838e-03 1.737e+01
GE16 1.45e+02 1.58e+02 -2.638089e+05 -4.257580e-02 6.828e-03 1.735e+01
GE17 1.45e+02 1.58e+02 -2.638089e+05 -2.479346e-02 6.255e-03 1.734e+01
GE18 1.45e+02 1.58e+02 -2.638089e+05 -2.366145e-02 5.812e-03 1.732e+01
GE19 1.45e+02 1.58e+02 -2.638090e+05 -1.738967e-02 5.561e-03 1.735e+01
GE20 1.45e+02 1.58e+02 -2.638090e+05 -3.612013e-02 5.325e-03 1.735e+01
GE21 1.45e+02 1.58e+02 -2.638090e+05 -1.479388e-02 4.814e-03 1.732e+01
GE22 1.45e+02 1.58e+02 -2.638090e+05 -1.846934e-02 4.648e-03 1.732e+01
GE23 1.45e+02 1.58e+02 -2.638090e+05 -8.299799e-03 4.368e-03 1.736e+01
GE24 1.45e+02 1.58e+02 -2.638090e+05 -1.306855e-02 4.190e-03 1.731e+01
GE25 1.45e+02 1.58e+02 -2.638091e+05 -1.000764e-02 3.852e-03 1.732e+01
GE26 1.45e+02 1.58e+02 -2.638091e+05 -1.011636e-02 3.637e-03 1.735e+01
GE27 1.45e+02 1.58e+02 -2.638091e+05 -7.052322e-03 3.351e-03 1.733e+01
And with the surface in the zx-plane (rotated with ATOMKIT function 405):
Use Systematically Improvable Atomic bases
---------------------------------------------------------
ELEMENT ORBITALS NBASE NATOM XC
H 2s1p-6au 5 2
C 2s2p1d-7au 13 37
O 2s2p1d-7au 13 1
Fe 4s2p2d1f-8au 27 80
---------------------------------------------------------
Initial plane wave basis and FFT box
---------------------------------------------------------
DONE(0.279017 SEC) : INIT PLANEWAVE
-------------------------------------------
STEP OF ION RELAXATION : 1
-------------------------------------------
START CHARGE : atomic
DONE(3.18291 SEC) : INIT SCF
ITER TMAG AMAG ETOT(eV) EDIFF(eV) DRHO TIME(s)
GE1 1.43e+02 1.50e+02 -2.638006e+05 0.000000e+00 6.895e-02 1.414e+01
GE2 1.38e+02 1.45e+02 -2.637958e+05 4.742638e+00 5.222e-02 1.167e+01
GE3 1.39e+02 1.48e+02 -2.637953e+05 5.667900e-01 5.344e-02 1.167e+01
GE4 1.40e+02 1.49e+02 -2.638018e+05 -6.494956e+00 3.249e-02 1.165e+01
GE5 1.38e+02 1.48e+02 -2.637987e+05 3.043678e+00 3.629e-02 1.163e+01
GE6 1.43e+02 1.53e+02 -2.638059e+05 -7.210835e+00 2.185e-02 1.164e+01
GE7 1.44e+02 1.54e+02 -2.638079e+05 -2.014826e+00 1.777e-02 1.164e+01
GE8 1.44e+02 1.55e+02 -2.638082e+05 -2.418029e-01 1.586e-02 1.163e+01
GE9 1.45e+02 1.55e+02 -2.638084e+05 -2.235844e-01 1.469e-02 1.167e+01
GE10 1.45e+02 1.57e+02 -2.638086e+05 -2.368500e-01 1.147e-02 1.163e+01
GE11 1.46e+02 1.57e+02 -2.638087e+05 -8.720691e-02 1.008e-02 1.163e+01
GE12 1.46e+02 1.58e+02 -2.638088e+05 -1.049885e-01 8.905e-03 1.164e+01
GE13 1.46e+02 1.58e+02 -2.638089e+05 -6.210490e-02 7.262e-03 1.164e+01
GE14 1.46e+02 1.58e+02 -2.638089e+05 -5.225872e-02 6.358e-03 1.162e+01
GE15 1.46e+02 1.58e+02 -2.638090e+05 -5.103418e-02 6.065e-03 1.163e+01
GE16 1.46e+02 1.59e+02 -2.638091e+05 -5.165342e-02 5.507e-03 1.162e+01
GE17 1.46e+02 1.59e+02 -2.638091e+05 -3.447109e-02 5.241e-03 1.160e+01
GE18 1.46e+02 1.58e+02 -2.638091e+05 -4.689723e-02 4.846e-03 1.161e+01
GE19 1.45e+02 1.58e+02 -2.638092e+05 -2.564418e-02 4.447e-03 1.167e+01
GE20 1.45e+02 1.58e+02 -2.638092e+05 -1.696622e-02 4.179e-03 1.161e+01
GE21 1.45e+02 1.58e+02 -2.638092e+05 -1.030517e-02 3.993e-03 1.160e+01
GE22 1.45e+02 1.58e+02 -2.638092e+05 -2.090657e-02 3.867e-03 1.159e+01
GE23 1.45e+02 1.58e+02 -2.638092e+05 -1.887464e-02 3.575e-03 1.159e+01
GE24 1.45e+02 1.58e+02 -2.638092e+05 -9.327050e-03 3.309e-03 1.161e+01
GE25 1.44e+02 1.57e+02 -2.638092e+05 -1.332586e-02 3.219e-03 1.159e+01
GE26 1.44e+02 1.57e+02 -2.638093e+05 -7.326914e-03 3.018e-03 1.157e+01
GE27 1.44e+02 1.57e+02 -2.638093e+05 -1.357772e-02 2.934e-03 1.157e+01
Both were run with ABACUS 3.4.0, compiled with the icx-mkl toolchain, using `OMP_NUM_THREADS=4 mpirun -np 16 abacus`.
Summary: the surface is better placed in the zx-plane, with the vacuum along the y-axis (about 11.6 s vs 17.3 s per SCF iteration).
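For anyone repeating this comparison, here is a minimal sketch for extracting the mean time per SCF iteration from screen output in the format shown above. It assumes iteration lines start with `GE<n>` and that the last column is `TIME(s)`, as in the header; the function name is hypothetical. The sample text is the first three iterations of each log above.

```python
import re

SCF_LINE = re.compile(r"^\s*GE\d+\s")  # iteration rows such as "GE12 ..."

def mean_iter_time(log_text, skip_first=True):
    """Average the last column (TIME(s)) over SCF iteration lines.

    The first iteration is skipped by default because it is
    noticeably longer than the others in both logs above.
    """
    times = [float(line.split()[-1])
             for line in log_text.splitlines() if SCF_LINE.match(line)]
    if skip_first and len(times) > 1:
        times = times[1:]
    return sum(times) / len(times)

xy_log = """\
GE1 1.30e+02 1.36e+02 -2.636373e+05 0.000000e+00 8.669e-02 1.996e+01
GE2 8.03e+01 8.55e+01 -2.636549e+05 -1.761770e+01 1.552e-01 1.742e+01
GE3 1.31e+02 1.39e+02 -2.637876e+05 -1.327641e+02 6.050e-02 1.749e+01
"""
zx_log = """\
GE1 1.43e+02 1.50e+02 -2.638006e+05 0.000000e+00 6.895e-02 1.414e+01
GE2 1.38e+02 1.45e+02 -2.637958e+05 4.742638e+00 5.222e-02 1.167e+01
GE3 1.39e+02 1.48e+02 -2.637953e+05 5.667900e-01 5.344e-02 1.167e+01
"""

print(mean_iter_time(xy_log) / mean_iter_time(zx_log))  # xy-to-zx time ratio
```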
@ieiue Any updates? I think a systematic test is needed.
Latest report on this issue: https://bohrium.dp.tech/jobs/app-detail/29905?type=App