Issue related to "BLOCK_TO_CHUNK_SEND_PTERS" in phys_grid.F90 of EAM
Of the 57 e3sm_developer test cases, 21 passed and 36 failed.
Of the 36 failed tests, 9 failed with this issue. The affected test cases are:
ERP_Ln18.ne4_oQU240.F2010.crusher_crayclang
ERS.ne4_oQU240.F2010.crusher_crayclang.eam-hommexx
ERS_Ld3.ne4pg2_oQU480.F2010.crusher_crayclang.eam-thetahy_sl_pg2
ERS_Ld3.ne4pg2_oQU480.F2010.crusher_crayclang.eam-thetahy_sl_pg2_ftype0
SMS_Ln5.ne4pg2_oQU480.F2010.crusher_crayclang
SMS_Ln5.ne4pg2_oQU480.F2010.crusher_crayclang.eam-thetahy_pg2
SMS_Ln5.ne4pg2_oQU480.F2010.crusher_crayclang.eam-thetahy_sl_pg2
SMS_Ln5.ne4pg2_oQU480.F2010.crusher_crayclang.eam-thetahy_sl_pg2_ftype0
SMS_Ln9.ne4_oQU240.F2010.crusher_crayclang.eam-outfrq9s
The error is raised by the following check in “E3SM/components/eam/src/physics/cam/phys_grid.F90”:
if ((btofc_blk_offset(blockid)%ncols > fdim) .or. &
    (btofc_blk_offset(blockid)%nlvls > ldim)) then
   write(iulog,*) "BLOCK_TO_CHUNK_SEND_PTERS: pter array dimensions ", &
        "not large enough: (",fdim,",",ldim,") not >= (", &
        btofc_blk_offset(blockid)%ncols,",", &
        btofc_blk_offset(blockid)%nlvls,")"
   call endrun()
endif
When the error occurs, it is the first condition, "btofc_blk_offset(blockid)%ncols > fdim", that trips the check.
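For anyone not familiar with this check, here is a minimal standalone sketch of what it compares. It is not the EAM code: the `blk_offset_t` type, the `dims_ok` function, and all numeric values are hypothetical and chosen only for illustration. The idea is that the column and level counts recorded for a block must fit within the `(fdim, ldim)` dimensions of the pointer array supplied by the caller, so a wrong value stored in `ncols` is enough to abort the run.

```fortran
program pter_dim_check_sketch
  implicit none

  ! Hypothetical stand-in for the per-block offset record in phys_grid.F90.
  type :: blk_offset_t
    integer :: ncols = 0   ! columns recorded for the block
    integer :: nlvls = 0   ! levels recorded for the block
  end type blk_offset_t

  type(blk_offset_t) :: blk
  integer :: fdim, ldim

  ! Illustrative values only; not taken from any failing run.
  fdim = 16
  ldim = 72
  blk%ncols = 16
  blk%nlvls = 72
  print *, 'consistent record passes: ', dims_ok(blk, fdim, ldim)

  ! A wrong column count stored in the record trips the first
  ! half of the check, even though the level count is still fine.
  blk%ncols = 17
  print *, 'wrong ncols passes:       ', dims_ok(blk, fdim, ldim)

contains

  ! True when the caller's (fdim, ldim) array can hold the block's data.
  logical function dims_ok(b, fdim, ldim)
    type(blk_offset_t), intent(in) :: b
    integer, intent(in) :: fdim, ldim
    dims_ok = (b%ncols <= fdim) .and. (b%nlvls <= ldim)
  end function dims_ok

end program pter_dim_check_sketch
```

The sketch only shows the comparison itself; it does not attempt to reproduce whatever causes the recorded value to be wrong in the failing tests.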
@grnydawn , as I mentioned in the call, I'm reasonably certain this is a CCE 14.0.0 compiler bug that I've encountered before. If so, the actual location impacted by the bug is here:
https://github.com/E3SM-Project/E3SM/blob/fa1d32eefbd80ef8fb78d15d793f2ae5f3403ea0/components/eam/src/physics/cam/phys_grid.F90#L1099-L1105
A workaround is to turn off inlining for the specific impacted call using a compiler directive:
if (.not. associated(btofc_blk_offset(blockids(jb))%pter)) then
   blksiz = get_block_gcol_cnt_d(blockids(jb))
   !DIR$ NOINLINE
   numlvl = get_block_lvl_cnt_d(blockids(jb),bcids(jb))
   !DIR$ INLINE
   btofc_blk_offset(blockids(jb))%ncols = blksiz
   btofc_blk_offset(blockids(jb))%nlvls = numlvl
   allocate( btofc_blk_offset(blockids(jb))%pter(blksiz,numlvl) )
endif
Could you see if that helps?
@abbotts, the workaround works as you described. I was able to build and run two of the affected test cases without error. Thanks!
Great! I've verified the fix in a development compiler. We should see it in a release compiler very soon.
Status: Known workaround, waiting for an upcoming release of the Cray compiler with the fix.
This should be fixed in CCE 14.0.2.
@abbotts @sarats CCE 14.0.2 seems to fix this issue. I tested one of the failed test cases, and it built and ran successfully. I will test more of the affected cases. Thanks!
CCE 14.0.2 fixed this issue.