Fortran runtime error: Index '1' of dimension 2 of array 'this' outside of expected range in SMS_D.f19_g16.I1850ELM.machine_compiler.elm-betr with invalid check
Since we closed https://github.com/E3SM-Project/E3SM/issues/5539, I'm opening another issue here for the same error. We are trying to add the invalid check (-ffpe-trap=invalid) to the Fortran compiler flags.
With SMS_D.f19_g16.I1850ELM.pm-cpu_gnu.elm-betr:
3: At line 124 of file /global/cfs/cdirs/e3sm/ndk/repos/ndk_mf_gnu-add-invalid-to-DEBUG/components/elm/src/external_models/sbetr/src/betr/betr_core/TracerStateType.F90
3: Fortran runtime error: Index '1' of dimension 2 of array 'this' outside of expected range (140737046949536:40202912)
/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/mfgnuinvalid/SMS_D.f19_g16.I1850ELM.pm-cpu_gnu.elm-betr.gh5539
To add invalid check:
login04% git diff cime_config/machines/cmake_macros/gnu.cmake
diff --git a/cime_config/machines/cmake_macros/gnu.cmake b/cime_config/machines/cmake_macros/gnu.cmake
index eae59e3e4b..a8fce54cbf 100644
--- a/cime_config/machines/cmake_macros/gnu.cmake
+++ b/cime_config/machines/cmake_macros/gnu.cmake
@@ -19,7 +19,8 @@ endif()
if (DEBUG)
string(APPEND CFLAGS " -g -Wall -fbacktrace -fcheck=bounds -ffpe-trap=invalid,zero,overflow")
string(APPEND CXXFLAGS " -g -Wall -fbacktrace")
- string(APPEND FFLAGS " -g -Wall -fbacktrace -fcheck=bounds -ffpe-trap=zero,overflow")
+ string(APPEND FFLAGS " -g -Wall -fbacktrace -fcheck=bounds,pointer -ffpe-trap=invalid,zero,overflow")
@ndkeen I fixed the issue with branch jinyuntang/fix5832, could you do a test? The problem is an array size inconsistency between elm and sbetr. A small update of sbetr fixed the problem as far as I can tell from my test.
When I add the invalid flag to recent master and run the test, I now see a different error message than the one reported above.
95: Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
95:
95: Backtrace for this error:
95: #0 0x14f0c72dedbf in ???
95: #1 0x1f604a7 in __tracerparamsmod_MOD_calc_aerecond
95: at /global/cfs/cdirs/e3sm/ndk/repos/me24-aug15/components/elm/src/external_models/sbetr/src/betr/betr_para/TracerParamsMod.F90:1271
95: #2 0x1f4c777 in __betrbgcmod_MOD_stage_tracer_transport
95: at /global/cfs/cdirs/e3sm/ndk/repos/me24-aug15/components/elm/src/external_models/sbetr/src/betr/betr_main/BetrBGCMod.F90:203
95: #3 0x1e420e8 in __betrtype_MOD_step_without_drainage
95: at /global/cfs/cdirs/e3sm/ndk/repos/me24-aug15/components/elm/src/external_models/sbetr/src/driver/shared/BeTRType.F90:375
95: #4 0x1b25651 in __betrsimulationelm_MOD_elmstepwithoutdrainage
95: at /global/cfs/cdirs/e3sm/ndk/repos/me24-aug15/components/elm/src/external_models/sbetr/src/driver/elm/BeTRSimulationELM.F90:314
95: #5 0x6862a8 in __elm_driver_MOD_elm_drv
95: at /global/cfs/cdirs/e3sm/ndk/repos/me24-aug15/components/elm/src/main/elm_driver.F90:1178
95: #6 0x6509c7 in __lnd_comp_mct_MOD_lnd_run_mct
95: at /global/cfs/cdirs/e3sm/ndk/repos/me24-aug15/components/elm/src/cpl/lnd_comp_mct.F90:514
If I check out your branch and add the invalid flag, I do not see a crash. However, I can't tell from the branch what changes you made.
@ndkeen the problem is due to a more recent update of maxpft from a small number to a larger number, 50, causing a mismatch between sbetr and elm. If you find that my fix solves the problem, I will update sbetr, update e3sm, and create a pull request based on this.
Note that above I show how to add the invalid check, so you can try it yourself. Then go ahead and make a PR.
When I tested the branch "jinyuntang/fix5832", I included the invalid check, but I did not include that change in the push to the branch. For the PR, do I also have to include the invalid-check change made to "cime_config/machines/cmake_macros/gnu.cmake"?
-- Jinyun (He/him) Staff Scientist Lawrence Berkeley National Laboratory 1 Cyclotron Rd., MS 74R316C Berkeley, CA 94720 tel: 510 486-5792, fax: 510 486-7070
Great! Then it sounds like you have fixed this issue. You would not want to include that change in your PR -- we would like to add it, but are still trying to fix issues that were uncovered with it (like this one).
Great! I will create a PR then.
With an Oct 27th checkout, I still see this error.
I'm still seeing the same error with Jan 18th master and Jan 23rd master.
@ndkeen Is there any change I need to make to run the test? I recall that last time you instructed me to make some changes. Now, after trying ./create_test SMS_D.f19_g16.I1850ELM.pm-cpu_gnu.elm-betr, I get the following error: "FAIL SMS_D.f19_g16.I1850ELM.pm-cpu_gnu.elm-betr (phase CREATE_NEWCASE)". I have no clue what is going on. Thanks.
Yes, that is the correct command. I don't have enough info there to know what's wrong, but if I were to guess: Are you trying that on perlmutter? If on another machine, you need that machine's name instead of pm-cpu. Are you running from cime/scripts? I guess so, as it would otherwise say create_test not found.
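As a rough illustration of the "machine name instead of pm-cpu" point: the test name is a dot-separated string whose fourth field is machine_compiler (as in the issue title), so retargeting the test means swapping only that field. The sketch below assumes that layout; retarget_test_name is a hypothetical helper, not part of CIME, and mymachine_gnu is a placeholder to replace with a machine/compiler pair that your checkout actually supports.

```shell
#!/bin/sh
# Hypothetical helper (not part of CIME): replace the machine_compiler field
# of a test name of the assumed form
#   TESTTYPE.GRID.COMPSET.MACHINE_COMPILER[.TESTMODS]
retarget_test_name() {
    rt_name=$1      # full test name
    rt_new_mc=$2    # replacement machine_compiler field

    # Peel off the first three dot-separated fields.
    rt_testtype=${rt_name%%.*}; rt_rest=${rt_name#*.}
    rt_grid=${rt_rest%%.*};     rt_rest=${rt_rest#*.}
    rt_compset=${rt_rest%%.*};  rt_rest=${rt_rest#*.}

    # What remains is MACHINE_COMPILER, optionally followed by .TESTMODS.
    case $rt_rest in
        *.*) rt_testmods=${rt_rest#*.} ;;
        *)   rt_testmods= ;;
    esac

    rt_out="${rt_testtype}.${rt_grid}.${rt_compset}.${rt_new_mc}"
    if [ -n "$rt_testmods" ]; then
        rt_out="${rt_out}.${rt_testmods}"
    fi
    printf '%s\n' "$rt_out"
}

# "mymachine_gnu" is a placeholder, not a real machine entry.
retarget_test_name "SMS_D.f19_g16.I1850ELM.pm-cpu_gnu.elm-betr" "mymachine_gnu"
```

The printed name can then be passed to ./create_test from cime/scripts in place of the pm-cpu_gnu one.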
When I try this test on master:
create_test SMS_D.f19_g16.I1850ELM.pm-cpu_gnu.elm-betr
I still see the same error as noted above
Note that the change I mention above (regarding compiler flags) should no longer be needed, as master has had this change for quite a while.
@ndkeen It appears I had to update the submodules. After that, it is now working. I will report back with the result once it is done.
Ah, yep, that's another common mistake I should have mentioned
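For anyone hitting the same CREATE_NEWCASE failure after switching branches, a minimal sketch of the submodule update is below. It uses the standard git submodule update command; refresh_submodules is a hypothetical wrapper, and the repository path passed to it is an assumption about where your E3SM checkout lives.

```shell
#!/bin/sh
# Hypothetical helper (not part of E3SM or CIME): bring every submodule of a
# checkout in line with the commits recorded by the currently checked-out branch.
# Usage: refresh_submodules /path/to/your/E3SM/checkout
refresh_submodules() {
    rs_repo_dir=${1:-.}
    if [ ! -d "$rs_repo_dir" ]; then
        echo "refresh_submodules: no such directory: $rs_repo_dir" >&2
        return 1
    fi
    # --init clones submodules that are not initialized yet;
    # --recursive also descends into nested submodules.
    git -C "$rs_repo_dir" submodule update --init --recursive
}
```

Running this after every checkout or pull of master, before create_test, avoids building against stale submodule sources.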
@ndkeen, just letting you know that the tests passed.