WRF
Optimized gcc 12 build of wrf.exe crashes at domain initialization
I have built WRF v4.4.1 with gfortran/gcc 12 in sm mode (option 33) with basic nesting (option 1).
Before compiling, I edited configure.wrf to enable link-time optimization by adding -flto=auto to FCOPTIM and CFLAGS_LOCAL, because this led to a performance gain in earlier builds.
The compilation succeeds, but the resulting wrf.exe crashes during the initialization of domain 2 with the following stack trace:
free(): invalid pointer
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7f2c5dd91a12 in ???
#1 0x7f2c5dd90ba5 in ???
#2 0x7f2c5d91ba6f in ???
#3 0x7f2c5d96bc4c in ???
#4 0x7f2c5d91b9c5 in ???
#5 0x7f2c5d9057f3 in ???
#6 0x7f2c5d95fd9d in ???
#7 0x7f2c5d97595b in ???
#8 0x7f2c5d97779b in ???
#9 0x7f2c5d97a102 in ???
#10 0xcd86f4 in rsl_free
at ../external/RSL_LITE/rsl_malloc.c:264
#11 0x139b7de in destroy_list
at ../external/RSL_LITE/rsl_bcast.c:742
#12 0x139b7de in rsl_lite_allgather_msgs
at ../external/RSL_LITE/rsl_bcast.c:625
#13 0x139d79f in rsl_lite_bcast_msgs_.constprop.0
at ../external/RSL_LITE/rsl_bcast.c:524
#14 0x4378df in interp_domain_em_part1_
at ../frame/module_dm.f90:14834
#15 0x737c9c in med_interp_domain_
at ../share/mediation_interp_domain.f90:247
#16 0x602251 in med_nest_initial_
at ../share/mediation_integrate.f90:352
#17 0x1484ecd in ???
#18 0x4373d1 in __module_wrf_top_MOD_wrf_run
at ../main/module_wrf_top.f90:327
#19 0x4052ec in wrf
at /root/rasp/build/WRF/main/wrf.f90:30
#20 0x4052ec in main
at /root/rasp/build/WRF/main/wrf.f90:6
Aborted (core dumped)
To Reproduce
- Compiler version: gcc 12.2.1, GNU Fortran (GCC) 12.2.1
- Run ./configure and choose option 33 and option 1
- Edit configure.wrf: add -flto=auto to FCOPTIM and CFLAGS_LOCAL and ensure that the optimization level is set to at least -O2. For debugging purposes, add -g3.
- Run ./compile -j 8 em_real
- wrf.exe builds successfully but will crash in runs with at least one nested domain.
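
The configure.wrf edit in the steps above can be scripted. The following is a minimal sketch, assuming FCOPTIM and CFLAGS_LOCAL appear as single-line `NAME = value` assignments. The helper name `append_flag` and the contents of the stand-in file are hypothetical and not copied from a real configure.wrf.

```shell
# Hypothetical helper: append FLAG to the "VAR = ..." assignment in FILE.
append_flag() {
    file=$1 var=$2 flag=$3
    # -i.bak edits in place and keeps a backup copy; this form works with GNU and BSD sed.
    sed -i.bak -E "s|^(${var}[[:space:]]*=.*)\$|\\1 ${flag}|" "$file"
}

# Demo on a stand-in fragment (assumed values, for illustration only).
demo=$(mktemp)
printf 'FCOPTIM         =       -O2 -ftree-vectorize\nCFLAGS_LOCAL    =       -w -O3 -c\n' > "$demo"

append_flag "$demo" FCOPTIM -flto=auto
append_flag "$demo" CFLAGS_LOCAL -flto=auto
cat "$demo"
```

To apply it to a real build, the same two calls would be pointed at configure.wrf after running ./configure and before running ./compile. The result is worth checking by eye, because a variable that is continued across several lines would not be handled by this sketch.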
Expected behavior
WRF builds and runs successfully as with earlier versions of gfortran/gcc or when disabling link time optimization.
Note that the issue also disappears if the optimization level is lowered, e.g. to -O1.
Attachments
namelist.input and log of the WRF run up to the crash: namelist.input.txt, wrf.log
Additional context
I have not tested it, but I assume that this crash will also occur in dm/dm+sm mode and with other nesting options. The issue is probably caused by undefined behavior in the RSL_LITE code, exposed by the more aggressive optimizations in GCC 12 and later. I have prepared a patch that fixes this issue in my test case and will open a pull request.