WRF4DVAR fails with intel LLVM compilers

Open HathewayWill opened this issue 1 year ago • 31 comments

Describe the bug: Building WRFDA with the Intel LLVM compilers fails to produce all the required exe files.

libufr fails to build.

To Reproduce fails.zip

using option 40 for intel llvm dmpar

Expected behavior: expected 43 exe in /var/da and 1 exe in /var/obsproc/src

got 42 exe in /var/da and 0 exe in /var/obsproc/src

Attachments works.zip

fix: add the following flags for llvm compilers to CFLAGS

CFLAGS_LOCAL = -w -O3 -ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types#-xHost -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -no-multibyte-chars # -DRSL0_ONLY

HathewayWill avatar Dec 26 '23 14:12 HathewayWill

@HathewayWill We are aware of the compilation issue for DA and Chem code using the newest Intel compiler.

weiwangncar avatar Dec 27 '23 01:12 weiwangncar

Wasn't sure. Didn't see it in the GitHub discussion.

@weiwangncar

HathewayWill avatar Dec 27 '23 01:12 HathewayWill

@HathewayWill I have added a note in the release note.

weiwangncar avatar Dec 27 '23 01:12 weiwangncar

I'll try to find a solution. I have a lot of free time.

HathewayWill avatar Dec 27 '23 01:12 HathewayWill

@weiwangncar

Good morning,

I was able to get Intel LLVM to build WRFPLUS/WRF4DVAR successfully by adding the following commands:

        sed -i '144s|-ip|-ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types |g' $WRF_FOLDER/WRFPLUS/configure.wrf
        sed -i '145s|-ip|-ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types |g' $WRF_FOLDER/WRFPLUS/configure.wrf

        sed -i '144s|-ip|-ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types |g' $WRF_FOLDER/WRFDA/configure.wrf
        sed -i '145s|-ip|-ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types |g' $WRF_FOLDER/WRFDA/configure.wrf
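
The sed commands above depend on the flags sitting at lines 144 and 145 of configure.wrf, which may shift between releases or configure options. A minimal sketch of a variant that matches by variable name instead, assuming the line of interest is the CFLAGS_LOCAL assignment shown in the original report. The function name add_llvm_cflags is hypothetical, and GNU sed is assumed for -i:

```shell
# Hypothetical helper, not part of WRF: append the two warning-suppression
# flags to the CFLAGS_LOCAL line of a configure.wrf, matching by variable
# name rather than by line number. Assumes GNU sed (for -i).
add_llvm_cflags() {
  cfg="$1"
  flags='-Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types'
  # Skip files that already carry the flags so the edit is safe to repeat.
  if grep -q -- '-Wno-implicit-function-declaration' "$cfg"; then
    return 0
  fi
  # Only touch the line that assigns CFLAGS_LOCAL; extend the first -ip on it.
  sed -i "/^CFLAGS_LOCAL[[:space:]]*=/s|-ip|-ip ${flags}|" "$cfg"
}
```

Usage would look like `add_llvm_cflags "$WRF_FOLDER/WRFPLUS/configure.wrf"`. If lines 144 and 145 hold something other than CFLAGS_LOCAL in a given configure.wrf, the pattern needs adjusting.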

One thing I did notice: for WRFPLUS and 4DVAR there is a big memory leak somewhere during installation.

I have 64GB of RAM and 64GB of SWAP RAM and it was maxing out my physical RAM and then half of my SWAP. I didn't get to see which module was causing it but I think I saw @islas mention there was a process that needed -j 1 somewhere in the documentation for the release notes.

Hope this helps you.

HathewayWill avatar Dec 27 '23 12:12 HathewayWill

Here's the output during the memory leak. Memory Leak.log

Also affects WRF chem

HathewayWill avatar Dec 27 '23 16:12 HathewayWill

@HathewayWill Can you make a PR with your fixes for WRFDA/WRFPlus compilation with Intel-OneAPI compiler?

liujake avatar Jan 02 '24 15:01 liujake

@liujake @weiwangncar

I don't know how to do a PR so I was letting NCAR staff look at my comments and files and let them do it.

HathewayWill avatar Jan 02 '24 15:01 HathewayWill

@liujake the memory leak is another problem I don't know how to fix but it's documented in the zip file.

HathewayWill avatar Jan 02 '24 15:01 HathewayWill

@HathewayWill I'm not sure I follow how adding these flags: -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types solves the compilation issues as all they are doing is suppressing warnings, unless WRFDA or the new Intel icx standard used treats those as errors. If that is the case the fix is to actually fix that code rather than suppress them.

It sounds like these flag additions were independent of the memory leak issue, is that correct?
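
One way to check whether a given C compiler treats an implicit function declaration as a hard error rather than a warning is to compile a tiny probe file with and without the flag. Clang 16 and later reject this by default in C99 and newer modes, so a Clang-based icx may behave the same way; that is an assumption to verify, not something shown in the logs here. The function name probe_implicit_decl is hypothetical:

```shell
# Hypothetical probe, not part of WRF: report whether a compiler accepts a
# call to an undeclared function. $1 is the compiler, the rest are extra flags.
probe_implicit_decl() {
  cc="$1"; shift
  if ! command -v "$cc" >/dev/null 2>&1; then
    echo "compiler not found"
    return 0
  fi
  tmp=$(mktemp -d) || return 1
  # undeclared_fn has no prototype, so this is an implicit declaration.
  printf 'int main(void) { return undeclared_fn(); }\n' > "$tmp/probe.c"
  # Compile only (-c) so the missing symbol never reaches the linker.
  if "$cc" "$@" -c "$tmp/probe.c" -o "$tmp/probe.o" >/dev/null 2>&1; then
    echo "accepted"
  else
    echo "rejected"
  fi
  rm -rf "$tmp"
}
```

Running `probe_implicit_decl icx` and then `probe_implicit_decl icx -Wno-implicit-function-declaration` shows whether the flag changes the outcome. If the first prints "rejected" and the second "accepted", the flag is doing more than silencing output for that compiler.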

islas avatar Jan 06 '24 20:01 islas

@islas I'm currently sick with COVID and in isolation for 2 weeks. Let me get back to you when I feel better.

HathewayWill avatar Jan 06 '24 20:01 HathewayWill

@islas

Yes, the memory leak is different. I will rerun the installation without the flags added to show the errors that popped up.

HathewayWill avatar Jan 07 '24 18:01 HathewayWill

@islas

So here are two log files from WRF v4.5.2, which does not compile correctly when those flags are not included.
compile.wrf1.log compile.wrf2.log

HathewayWill avatar Jan 07 '24 19:01 HathewayWill

Thanks @HathewayWill The first log seems to fail because module_gfs_machine is being compiled in parallel with module_bl_mynn_common, not before:

2165  rm -f module_gfs_machine.G module_gfs_machine.bb
 2166  rm -f module_bl_qnsepbl.G module_bl_qnsepbl.bb
 2167  time mpiifx -o module_cam_error_function.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_cam_error_function.f90
 2168  rm -f module_bl_acm.G module_bl_acm.bb
 2169  rm -f module_bl_mrf.G module_bl_mrf.bb
 2170  time mpiifx -o complex_number_module.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  complex_number_module.f90
 2171  time mpiifx -o module_bl_ysu.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_ysu.f90
 2172  rm -f module_bl_fogdes.G module_bl_fogdes.bb
 2173  rm -f module_bl_mynn_common.G module_bl_mynn_common.bb
 2174  rm -f module_bl_myjurb.G module_bl_myjurb.bb
 2175  time mpiifx -o module_cam_shr_kind_mod.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_cam_shr_kind_mod.f90
 2176  time mpiifx -o module_bl_shinhong.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_shinhong.f90
 2177  time mpiifx -o module_gfs_machine.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_gfs_machine.f90
 2178  time mpiifx -o module_bl_qnsepbl.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_qnsepbl.f90
 2179  time mpiifx -o module_bl_acm.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_acm.f90
 2180  time mpiifx -o module_bl_mrf.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_mrf.f90
 2181  rm -f module_bl_gwdo_gsl.G module_bl_gwdo_gsl.bb
 2182  rm -f module_bl_myjpbl.G module_bl_myjpbl.bb
 2183  time mpiifx -o module_bl_mynn_common.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_mynn_common.f90
 2184  rm -f module_bl_boulac.G module_bl_boulac.bb
 2185  rm -f module_bl_gwdo.G module_bl_gwdo.bb
 2186  time mpiifx -o module_bl_fogdes.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_fogdes.f90
 2187  time mpiifx -o module_bl_myjurb.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_myjurb.f90
 2188  time mpiifx -o module_bl_gwdo_gsl.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_gwdo_gsl.f90
 2189  time mpiifx -o module_bl_myjpbl.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_myjpbl.f90
 2190  time mpiifx -o module_bl_boulac.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_boulac.f90
 2191  time mpiifx -o module_bl_gwdo.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_gwdo.f90
 2192  module_bl_mynn_common.f90(21): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [MODULE_GFS_MACHINE]
 2193    use module_gfs_machine,  only : kind_phys
 2194  ------^
 2195  module_bl_mynn_common.f90(21): error #6580: Name in only-list does not exist or is not accessible.   [KIND_PHYS]
 2196    use module_gfs_machine,  only : kind_phys
 2197  ----------------------------------^
 2198  compilation aborted for module_bl_mynn_common.f90 (code 1)

It may be a little difficult to tell, but a sure way to identify it is to look at when the real compile command happens vs when the rm command happens with respect to other files (rm is the first command in the WRF makerule for these files):

  • rm for module_gfs_machine starts at line 2165
  • rm for module_bl_mynn_common starts at line 2173
    • However compile command for module_gfs_machine happens at line 2177, meaning compilation for module_bl_mynn_common has started before module_gfs_machine has even finished compiling
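
The same comparison can be scripted. A minimal sketch, assuming the log is a plain-text capture of the make output (such as compile.wrf1.log) and that each rule begins with `rm -f <module>.G` and later issues a compile command containing `-o <module>.o`, as in the excerpt above. The function name build_order is hypothetical:

```shell
# Hypothetical helper: print the log line numbers of the first rm and the
# first compile command for one module. $1 is the log file, $2 the module.
build_order() {
  log="$1"; mod="$2"
  # -F treats the pattern as a fixed string; -- keeps "-o" from being
  # parsed as a grep option; -m1 stops at the first match.
  rm_line=$(grep -n -m1 -F -- "rm -f ${mod}.G" "$log" | cut -d: -f1)
  cc_line=$(grep -n -m1 -F -- "-o ${mod}.o " "$log" | cut -d: -f1)
  printf '%s rm=%s compile=%s\n' "$mod" "${rm_line:-none}" "${cc_line:-none}"
}
```

Calling `build_order compile.wrf1.log module_gfs_machine` and `build_order compile.wrf1.log module_bl_mynn_common` gives the two pairs of numbers to compare. If the rm line of the dependent module comes before the compile line of the module it uses, the two rules were started concurrently.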

PR #1950 aims to fix these issues. If you search depend.common, you can see that the dependency of module_bl_mynn_common on module_gfs_machine is clearly missing, but it exists in the edits for that PR.
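
A single dependency can be checked without reading the whole file. This is a sketch under the assumption that the file uses ordinary make rules of the form `target.o: prereq.o ...` with backslash continuations and one target per rule. The function name has_dependency is hypothetical, and the path main/depend.common in the usage note is assumed from the WRF source layout:

```shell
# Hypothetical helper: exit 0 if the rule for target $2 in make-style
# dependency file $1 lists prerequisite $3, otherwise exit 1.
has_dependency() {
  awk -v t="$2" -v p="$3" '
    {
      # Join backslash-continued lines into one logical rule.
      buf = buf $0
      if (buf ~ /\\[ \t]*$/) { sub(/\\[ \t]*$/, " ", buf); next }
      rule = buf; buf = ""
      colon = index(rule, ":")
      if (colon == 0) next
      lhs = substr(rule, 1, colon - 1)
      rhs = substr(rule, colon + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", lhs)
      if (lhs != t) next
      # Compare each prerequisite as an exact string.
      n = split(rhs, deps, /[ \t]+/)
      for (i = 1; i <= n; i++) if (deps[i] == p) found = 1
    }
    END { exit !found }
  ' "$1"
}
```

For example, `has_dependency main/depend.common module_bl_mynn_common.o module_gfs_machine.o && echo present || echo missing`. The match is on the exact prerequisite string, so a prerequisite written with a relative path would need to be passed in that form.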

Log 2 is a little more confusing. I'm not too sure what the issue is there and I can't definitively rule out an environment issue; however, the flags in question should not affect whether or not MPI could be found.

In the end, I am confident log 1 is not truly remedied by the flags and is most likely just adjusting the compilation race condition that exists (i.e. getting lucky) and log 2 shouldn't be affected by those flags.

islas avatar Jan 08 '24 23:01 islas

@islas

Okay, that's good information. So those flags, which were turning errors into warnings, were just a lucky guess then?

HathewayWill avatar Jan 08 '24 23:01 HathewayWill

@islas

https://github.com/wrf-model/WRF/issues/1967

could that also be part of the problem with WRF CHEM too?

HathewayWill avatar Jan 08 '24 23:01 HathewayWill

Yes. Though I haven't taken a look at the logs you posted in that issue, the problem described matches the erroneous behavior pretty well. I suspect it may help, though #1950 only affects the WRF core objects, so if dependencies under chem or da are missing those will still be issues.

islas avatar Jan 09 '24 00:01 islas

@islas so that will involve more detailed dives into the logs. Let me know which tests I can do to help because I think those log files for DA and Chem used the flags to make it work.

I can always rerun it without them

HathewayWill avatar Jan 09 '24 00:01 HathewayWill

Here are the log files @islas without any flags added for WRFDA 4DVAR

Failure_WRFDA.zip

HathewayWill avatar Jan 10 '24 21:01 HathewayWill

@islas

Any updates on these issues plaguing llvm? #1992 #1981 #1967 #1957

I think they are all related

HathewayWill avatar Mar 18 '24 09:03 HathewayWill

Are you seeing these issues on either the latest updates from develop (9e265af51ddb41cd1993d55289e13a5bcb3ae0c4) or the current release candidate (release-v4.6.0)?

These now include build dependency fixes and syntax/flag updates for the new Intel oneAPI compilers for WRF, WRFDA, and WRF-Chem

islas avatar Mar 18 '24 22:03 islas

@islas

Is there a .tar file for these? I'm not really familiar with how to pull with github.

HathewayWill avatar Mar 19 '24 00:03 HathewayWill

@HathewayWill Do this:

        git clone https://github.com/wrf-model/WRF.git
        cd WRF/
        git checkout release-v4.6.0

weiwangncar avatar Mar 19 '24 16:03 weiwangncar

Thank you @weiwangncar I'll try it today

HathewayWill avatar Mar 19 '24 23:03 HathewayWill

Good morning @weiwangncar @islas @kkeene44 @mgduda

Here are the log files for each issue and their update.

Tested on Ubuntu 22.04.4, 64GB of physical RAM 64GB of SWAP RAM, release candidate 4.6.0

#1992 WRF_4.6.0_intel_LLVM.zip (PASS)

#1981 WRFCHEM_4.6.0_intel_LLVM_memory_leak.zip (FAILS, Memory Leak)

#1967 wrfchemda_4.6_intel_llvm.zip (PASS)

#1957 WRFPLUS_4.6.0_intel_LLVM_memory_leak.zip (FAIL, Memory Leak)

The memory leaks in chem and wrfplus maxed out my 128GB of combined RAM and swap and shut down the computer. It happens at the exact same spot on each compilation, which is confusing to me.
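
To keep a runaway compile from taking the whole machine down, the build can be run under a virtual-memory cap so that the compiler fails on allocation instead. A minimal sketch using the shell's ulimit builtin; the function name run_capped and the 48 GiB figure in the usage note are illustrative assumptions, and the limit applies to each process separately:

```shell
# Hypothetical helper: run a command with a per-process virtual-memory cap.
# $1 is the cap in KiB, the remaining arguments are the command to run.
run_capped() {
  limit_kib="$1"; shift
  # The subshell keeps the lowered limit from leaking into the calling shell.
  ( ulimit -v "$limit_kib" && exec "$@" )
}
```

For example, `run_capped $((48 * 1024 * 1024)) ./compile -j 1 em_real`, where the compile target is only a placeholder. This does not reduce the memory the compiler wants; it only turns a system-wide stall into a build failure that leaves a log behind.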

HathewayWill avatar Mar 20 '24 10:03 HathewayWill

I've taken a closer look at the issue, and as far as I can tell this is not a memory leak (though it does take an exorbitant amount of memory) on the WRF-side of things. I can't say for certain whether it is an ifx memory "issue" per se, but the problem can be isolated to just the compilation of large files (>10k lines of code). I suspect the parsing and basic compiler optimizations are taking the most memory, as disabling all possible optimizations and outputting diagnostic info didn't yield anything of note.

Unfortunately, this is a fundamental "feature" of the WRF autogenerated code from the registry and will require splitting some of the larger includes into separable files. I've already attempted this with Fortran submodules with limited success, as not all WRF-supported compilers implement this well; thus a better approach for splitting the code up would need to be investigated.

For reference, attached are two outputs: one from the make build and another from the cmake build, both of which show massive memory spikes during compilation of module_domain.F, which has ~23K lines. Both builds use the Intel oneAPI compilers and -j 1, with top sorted by memory usage to show that the process consuming the memory is xfortcom (the LLVM compiler under the hood).

Screenshot from 2024-03-20 11-06-54 Screenshot from 2024-03-20 10-43-11
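
A scriptable equivalent of sorting top by memory, as a minimal sketch. It assumes a procps-style ps that accepts `-eo rss=,comm=` and reports RSS in KiB; the function name largest_rss is hypothetical:

```shell
# Hypothetical helper: read "RSS_KiB COMMAND" lines on stdin and print the
# command with the largest resident set size, converted to GiB.
largest_rss() {
  LC_ALL=C sort -k1,1nr | head -n 1 |
    LC_ALL=C awk '{ printf "%s %.1f GiB\n", $2, $1 / 1048576 }'
}

# Usage: sample once per second while the build runs in another terminal.
# while sleep 1; do ps -eo rss=,comm= | largest_rss; done
```

Logging the samples to a file makes it possible to line up the peak with the source file being compiled at that moment.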

islas avatar Mar 22 '24 19:03 islas

@weiwangncar @islas

Anything I can do on my side to help this?

HathewayWill avatar Mar 22 '24 21:03 HathewayWill

@islas

You tried setting FCFLAGS and CFLAGS with no -O3, correct?

HathewayWill avatar Mar 22 '24 22:03 HathewayWill

Correct, -O0 -no-ip and ensuring the MPI compiler wrapper does not sneak any flags in as well

islas avatar Mar 22 '24 22:03 islas

I have also tested different versions of the MPI compiler commands

  • mpiifx
  • mpiifort -fc=ifx
  • mpif90 -fc=ifx

all of them do the same thing with and without optimizations @islas
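
To see what each wrapper adds on top of the underlying compiler, the Intel MPI wrappers accept -show, which prints the full command line without running it. A small sketch that handles one wrapper at a time and reports any that are not installed; the function name show_wrapper is hypothetical:

```shell
# Hypothetical helper: print the underlying command line of an MPI compiler
# wrapper. $1 is the wrapper; remaining arguments (e.g. -fc=ifx) pass through.
show_wrapper() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@" -show
  else
    printf '%s: not found in PATH\n' "$1"
  fi
}

# Usage, one call per wrapper listed above:
# show_wrapper mpiifx
# show_wrapper mpiifort -fc=ifx
# show_wrapper mpif90 -fc=ifx
```

Comparing the three outputs shows whether any wrapper injects optimization flags that the configure.wrf settings do not.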

HathewayWill avatar Mar 22 '24 22:03 HathewayWill