Optimized gcc 12 build of wrf.exe crashes at domain initialization

Open sfalmo opened this issue 2 years ago • 0 comments

I built WRF v4.4.1 with gfortran/gcc 12 in sm mode (option 33) with basic nesting (option 1). Before compiling, I edited configure.wrf to enable link-time optimization by adding -flto=auto to FCOPTIM and CFLAGS_LOCAL, because this led to a performance gain in earlier builds. Compilation succeeds, but the resulting wrf.exe crashes during the initialization of domain 2 with the following stack trace:

free(): invalid pointer

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x7f2c5dd91a12 in ???
#1  0x7f2c5dd90ba5 in ???
#2  0x7f2c5d91ba6f in ???
#3  0x7f2c5d96bc4c in ???
#4  0x7f2c5d91b9c5 in ???
#5  0x7f2c5d9057f3 in ???
#6  0x7f2c5d95fd9d in ???
#7  0x7f2c5d97595b in ???
#8  0x7f2c5d97779b in ???
#9  0x7f2c5d97a102 in ???
#10  0xcd86f4 in rsl_free
	at ../external/RSL_LITE/rsl_malloc.c:264
#11  0x139b7de in destroy_list
	at ../external/RSL_LITE/rsl_bcast.c:742
#12  0x139b7de in rsl_lite_allgather_msgs
	at ../external/RSL_LITE/rsl_bcast.c:625
#13  0x139d79f in rsl_lite_bcast_msgs_.constprop.0
	at ../external/RSL_LITE/rsl_bcast.c:524
#14  0x4378df in interp_domain_em_part1_
	at ../frame/module_dm.f90:14834
#15  0x737c9c in med_interp_domain_
	at ../share/mediation_interp_domain.f90:247
#16  0x602251 in med_nest_initial_
	at ../share/mediation_integrate.f90:352
#17  0x1484ecd in ???
#18  0x4373d1 in __module_wrf_top_MOD_wrf_run
	at ../main/module_wrf_top.f90:327
#19  0x4052ec in wrf
	at /root/rasp/build/WRF/main/wrf.f90:30
#20  0x4052ec in main
	at /root/rasp/build/WRF/main/wrf.f90:6
Aborted (core dumped)

To Reproduce

  1. Compiler version: gcc 12.2.1, GNU Fortran (GCC) 12.2.1
  2. Run ./configure and choose option 33 and option 1
  3. Edit configure.wrf: add -flto=auto to FCOPTIM and CFLAGS_LOCAL, and make sure the optimization level is at least -O2. For debugging purposes, add -g3.
  4. Run ./compile -j 8 em_real
  5. wrf.exe builds successfully but will crash in runs with at least one nested domain.
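
For anyone reproducing this, the configure.wrf edit in step 3 can be scripted roughly as follows. This is only a sketch: the helper name add_flag is made up for illustration, and it assumes that FCOPTIM and CFLAGS_LOCAL are each defined on a single line of the form "NAME = value" in the generated configure.wrf. It does not touch the optimization level or add -g3.

```shell
#!/bin/sh
# add_flag FILE VAR FLAG  (hypothetical helper, not part of WRF)
# Appends FLAG to the line defining VAR in FILE.
# Assumption: VAR is defined on one line of the form "VAR = value".
add_flag() {
    file="$1"; var="$2"; flag="$3"
    # Do nothing if the flag is already present on the variable's line.
    if grep -E "^${var}[[:space:]]*=.*${flag}" "$file" >/dev/null; then
        return 0
    fi
    # Rewrite via a temporary file so this works with both GNU and BSD sed.
    sed -E "s|^(${var}[[:space:]]*=.*)\$|\\1 ${flag}|" "$file" > "$file.tmp" \
        && mv "$file.tmp" "$file"
}

# Apply to configure.wrf in the current directory, if it exists.
if [ -f configure.wrf ]; then
    add_flag configure.wrf FCOPTIM -flto=auto
    add_flag configure.wrf CFLAGS_LOCAL -flto=auto
fi
```

Run it from the WRF top-level directory after ./configure and before ./compile. Running it twice leaves the file unchanged the second time.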

Expected behavior

WRF builds and runs successfully, as it does with earlier versions of gfortran/gcc or with link-time optimization disabled. Note that the issue also disappears if the optimization level is lowered, e.g. to -O1.

Attachments

namelist.input and a log of the WRF run up to the crash: namelist.input.txt, wrf.log

Additional context

I have not tested it, but I assume that this crash will also occur in dm/dm+sm mode and with other nesting options. The issue is probably caused by undefined behavior in the RSL_LITE code combined with more aggressive optimizations since GCC 12. I have prepared a patch that fixes the issue in my test case and will open a pull request.

sfalmo, Aug 31 '22 08:08