abacus-develop
Fifty integrated test cases failed on Intel env
Describe Current Status and Possible Solution
The following 50 test cases failed on an Intel machine.
1: 107_PW_outWfcR
1: 107_PW_W90
1: 109_PW_CR_fix_abc
1: 111_PW_elec_add
1: 116_PW_scan_Si2
1: 116_PW_scan_Si2_nspin2
1: 127_PW_15_PK_AF
1: 150_PW_15_CR_VDW3
1: 184_PW_BNDKPAR_SDFT_ALL
1: 184_PW_BNDKPAR_SDFT_MALL
1: 201_NO_KP_DJ_CF_CS_GaAs
1: 201_NO_KP_DJ_Si
1: 208_NO_KP_CS_CR
1: 213_NO_mulliken
1: 215_NO_sol_H2
1: 216_NO_scan_Si2
1: 250_NO_KP_CR_VDW2
1: 250_NO_KP_CR_VDW3
1: 250_NO_KP_CR_VDW3ABC
1: 250_NO_KP_CR_VDW3BJ
1: 281_NO_KP_HSE
1: 282_NO_KP_HSE_complex
1: 283_NO_KP_HF
1: 284_NO_KP_PBE0
1: 285_NO_KP_RE_HSE
1: 286_NO_KP_CR_HSE
1: 283_NO_restart
1: 307_NO_GO_OH
1: 381_NO_GO_S1_HSE
1: 382_NO_GO_S2_HSE
1: 383_NO_GO_SO_HSE
1: 384_NO_GO_S1_HSE_loop0_PU
1: 385_NO_GO_RE_S1_HSE
1: 386_NO_GO_MD_S1_HSE
1: 601_NO_TDDFT_CO_occ
1: 801_PW_LT_sc
1: 802_PW_LT_fcc
1: 803_PW_LT_bcc
1: 804_PW_LT_hexagonal
1: 805_PW_LT_trigonal
1: 806_PW_LT_st
1: 807_PW_LT_bct
1: 808_PW_LT_so
1: 809_PW_LT_baco
1: 810_PW_LT_fco
1: 811_PW_LT_bco
1: 812_PW_LT_sm
1: 813_PW_LT_bacm
1: 814_PW_LT_triclinic
1: 824_NO_LT_fco
Additional Context
No response
Details can be found in the following text file. Each case needs to be checked individually to identify its issue. unit_test_intel.txt
@hongriTianqi I studied the first case, "107_PW_outWfcR", in detail and found that it is probably mainly a format / post-processing issue.
An integrated test invoked by Autotest.sh (1) calls abacus to run a job, (2) uses the bash script tools/catch_properties.sh to extract and calculate some values, and (3) compares those values with the ones in result.ref.
In "107_PW_outWfcR", I find that abacus finishes the job normally; the problem is in step 2, where tools/catch_properties.sh reads OUT.Autotest/running_scf.log to get the total number of FFT grid points (line 256):
The variable "allgrid" is zero after line 256, which makes the subsequent calculation in line 259 yield inf. This is definitely not what we want, but the script faithfully does what the command asks. The following figure shows the place in running_scf.log from which the number is calculated:
As we can see, by using "=" and "," as delimiters, field 2 is "[ 24" instead of "24", thereby causing $2*$3*$4 to be 0.
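The field splitting described above can be reproduced in isolation. This is a minimal sketch: the two sample lines are assumptions modeled on the description (the exact text in running_scf.log may differ), and `grid_product` is a hypothetical helper, not the actual command in tools/catch_properties.sh.

```shell
# Hypothetical sample lines; the exact text in running_scf.log may differ.
new_line=' fft grid for wave functions = [ 24, 24, 24 ]'
old_line=' fft grid for wave functions = 24, 24, 24'

# Split on "=" and ",", then multiply fields 2 to 4, as described above.
grid_product() {
    printf '%s\n' "$1" | awk -F'[=,]' '{ print $2 * $3 * $4 }'
}

grid_product "$new_line"   # field 2 is " [ 24", which awk converts to 0
grid_product "$old_line"   # field 2 is " 24", so the product is 24*24*24
```

awk converts a string to a number by reading its leading numeric prefix, so a field that starts with "[" evaluates to 0 and zeroes the whole product.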
I noticed that the format of running_scf.log was recently changed in https://github.com/deepmodeling/abacus-develop/pull/2605. Prior to this change, the same information was displayed as
In this situation, $2*$3*$4 would yield the right number.
This test case can be fixed either by reverting the output format of "fft grid for wave functions" or by changing the awk command in post-processing. However, I am not able to make the decision, because either option could cause other tests to fail. The decision should be left to people who can assess the consequences of such a modification. @dyzheng @hongriTianqi
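For the second option, one possible shape of a format-tolerant awk command is sketched below. It is only an illustration under stated assumptions: `grid_product_tolerant` is a hypothetical helper, the sample lines are guesses at the log format, and the real command on line 256 of tools/catch_properties.sh may need a different change.

```shell
# Hypothetical helper: strip "[" and "]" before multiplying, so that the
# bracketed and the unbracketed format both parse.
grid_product_tolerant() {
    printf '%s\n' "$1" | awk -F'[=,]' '{
        gsub(/\[|\]/, "")      # modifies $0, which re-splits the fields
        print $2 * $3 * $4
    }'
}

# Sample lines are assumptions; the real log text may differ.
grid_product_tolerant ' fft grid for wave functions = [ 24, 24, 24 ]'
grid_product_tolerant ' fft grid for wave functions = 24, 24, 24'
```

Because the brackets are removed before the fields are read, the same command gives the same product for both formats, which avoids having to revert the log output.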
- [x] Understand the problem or question described by the user.
- [x] Check if the issue is a known problem or has been addressed in the documentation.
- [x] Test the issue or problem on a similar system or environment, if possible.
- [ ] Identify the root cause or provide clarification on the user's question.
- [ ] Provide a step-by-step guide, including any necessary resources, to resolve the issue or answer the question.
- [ ] If the issue is related to documentation, update the documentation to prevent future confusion (optional).
- [ ] If the issue is related to code, consider implementing a fix or improvement (optional).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Ensure the user's issue is resolved or their question is answered and close the ticket.
Update on 2023/11/01: there are now 40 failed test cases on the Intel machine:
107_PW_outWfcR
[WARNING ] variance_wfc_r_0_0 cal=inf ref=0.31340000 deviation=0.31340000
107_PW_W90
Compare Error: line 4, column 4
diamond.amn: 0.103213607284
OUT.autotest/diamond.amn: -0.103213607298
[WARNING ] CompareAMN_pass cal=1.00000000 ref=0.00000000 deviation=-1.00000000
109_PW_CR_fix_abc
totalstressref cal=358.02119700 ref=358.01774800 deviation=-0.00344900
111_PW_elec_add
totalstressref cal=2329.02942100 ref=2329.02954500 deviation=0.00012400
116_PW_scan_Si2
[WARNING ] etotref cal=-204.07304372 ref=-204.11142252 deviation=-0.03837880
116_PW_scan_Si2_nspin2
[WARNING ] etotref cal=-204.07300485 ref=-204.11317477 deviation=-0.04016992
127_PW_15_PK_AF
[WARNING ] etotref cal=-6141.07713468 ref=-6141.07775057 deviation=-0.00061589
150_PW_15_CR_VDW3
[WARNING ] totalforceref cal=0.86635600 ref=0.85304200 deviation=-0.01331400
184_PW_BNDKPAR_SDFT_ALL
[WARNING ] totalforceref cal=197.98036000 ref=197.98018600 deviation=-0.00017400
184_PW_BNDKPAR_SDFT_MALL
[WARNING ] totalforceref cal=197.98010600 ref=197.97993000 deviation=-0.00017600
201_NO_KP_DJ_CF_CS_GaAs
[WARNING ] totalforceref cal=144.90600000 ref=144.90617600 deviation=0.00017600
201_NO_KP_DJ_Si
[WARNING ] etotref cal=-227.56204871 ref=-227.60068424 deviation=-0.03863553
208_NO_KP_CS_CR
[WARNING ] totalstressref cal=340.58392200 ref=340.58455200 deviation=0.00063000
213_NO_mulliken
[WARNING ] totalforceref cal=4.74220400 ref=4.74258800 deviation=0.00038400
215_NO_sol_H2
[WARNING ] etotref cal=-32.74207985 ref=-32.74077066 deviation=0.00130919
216_NO_scan_Si2
[WARNING ] etotref cal=-203.91086440 ref=-203.96022423 deviation=-0.04935983
250_NO_KP_CR_VDW2
[WARNING ] etotref cal=-4262.64283604 ref=-4262.70382264 deviation=-0.06098660
250_NO_KP_CR_VDW3
[WARNING ] etotref cal=-4262.55496142 ref=-4262.61594802 deviation=-0.06098660
250_NO_KP_CR_VDW3ABC
[WARNING ] etotref cal=-447.44343208 ref=-447.51198935 deviation=-0.06855727
250_NO_KP_CR_VDW3BJ
[WARNING ] etotref cal=-4262.73795480 ref=-4262.79894140 deviation=-0.06098660
281_NO_KP_HSE
[WARNING ] totalstressref cal=1076.56558700 ref=1076.56529600 deviation=-0.00029100
283_NO_restart
[WARNING ] totalforceref cal=0.46163100 ref=0.46119300 deviation=-0.00043800
307_NO_GO_OH
[WARNING ] etotref cal=-204.59803974 ref=-204.59557974 deviation=0.00246000
601_NO_TDDFT_CO
[WARNING ] totalstressref cal=26.91672400 ref=26.91655700 deviation=-0.00016700
601_NO_TDDFT_CO_occ
[WARNING ] etotref cal=-602.93267956 ref=-602.93251189 deviation=0.00016767
801_PW_LT_sc
[WARNING ] totalstressref cal=32.28647100 ref=32.28615700 deviation=-0.00031400
802_PW_LT_fcc
[WARNING ] totalstressref cal=298.14568400 ref=298.15201200 deviation=0.00632800
803_PW_LT_bcc
[WARNING ] totalstressref cal=84.02774600 ref=84.02884300 deviation=0.00109700
804_PW_LT_hexagonal
[WARNING ] totalforceref cal=6.62548000 ref=6.62535800 deviation=-0.00012200
805_PW_LT_trigonal
[WARNING ] totalstressref cal=49.77693000 ref=49.77765100 deviation=0.00072100
806_PW_LT_st
[WARNING ] totalstressref cal=16.93875500 ref=16.93934600 deviation=0.00059100
807_PW_LT_bct
[WARNING ] totalstressref cal=33.62608300 ref=33.62753100 deviation=0.00144800
808_PW_LT_so
[WARNING ] totalforceref cal=6.79467800 ref=6.79456600 deviation=-0.00011200
809_PW_LT_baco
[WARNING ] totalforceref cal=6.79467800 ref=6.79456600 deviation=-0.00011200
810_PW_LT_fco
[WARNING ] etotref cal=-30.33829951 ref=-30.44940902 deviation=-0.11110951
811_PW_LT_bco
[WARNING ] totalforceref cal=6.54382800 ref=6.54372400 deviation=-0.00010400
812_PW_LT_sm
[WARNING ] totalstressref cal=10.81017700 ref=10.80996800 deviation=-0.00020900
813_PW_LT_bacm
[WARNING ] totalstressref cal=21.19969600 ref=21.19886200 deviation=-0.00083400
814_PW_LT_triclinic
[WARNING ] totalstressref cal=11.47930500 ref=11.47893300 deviation=-0.00037200
824_NO_LT_fco
[WARNING ] etotref cal=-31.39417040 ref=-31.69859888 deviation=-0.30442848
Aotogetwarn.txt I have attached a bash script that collects the warnings automatically. Using it takes two steps. First, download the integrated test log file from GitHub; the log files of recent PRs can be downloaded. Then run the bash script to extract the warnings.
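For readers without the attachment, the extraction step can be approximated as follows. This is a rough sketch and not the attached script: `get_warnings` is a hypothetical helper, and it assumes the log contains lines in the "[WARNING ] key cal=... ref=... deviation=..." form listed above.

```shell
# Hypothetical helper: print "<key> <cal> <ref> <deviation>" for every
# "[WARNING" line in the log file given as $1.
get_warnings() {
    grep -F '[WARNING' "$1" | awk '{
        key = ""; cal = ""; ref = ""; dev = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^cal=/)            { cal = substr($i, 5); key = $(i - 1) }
            else if ($i ~ /^ref=/)       { ref = substr($i, 5) }
            else if ($i ~ /^deviation=/) { dev = substr($i, 11) }
        }
        print key, cal, ref, dev
    }'
}

# Usage with a small sample log built from one entry of the list above.
log=$(mktemp)
printf '%s\n' '116_PW_scan_Si2' \
    '[WARNING ] etotref cal=-204.07304372 ref=-204.11142252 deviation=-0.03837880' > "$log"
get_warnings "$log"
rm -f "$log"
```

The key is taken as the field immediately before `cal=`, so the sketch also works when each log line carries a leading prefix such as "1: ".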
@pxlxingliang could you update the result?
see #4985