OpenFAST dev branch segfaults Intel OneAPI 2023.2

Open jrood-nrel opened this issue 10 months ago • 23 comments

modules/aerodyn/src/AeroDyn_Inflow.f90: error #5623: **Internal compiler error: internal abort** Please report this error along with the circumstances in which it occurred in a Software Problem Report.  Note: File and line given may not be explicit cause of this error.

Can someone try building OpenFAST where AeroDyn_Inflow.f90 is compiled? I believe it should segfault Intel OneAPI 2023.2 using the LLVM compilers.

jrood-nrel avatar Apr 01 '24 23:04 jrood-nrel

That's an odd error. We'll definitely have to fix that.

FYI: AeroDyn_Inflow.f90 is only used with the AeroDyn_Driver and AeroDyn_Inflow_C_Bindings interface, not with OpenFAST. But it is included in the aerodynlib and gets built with AeroDyn as a result -- not sure why we did it that way.

andrew-platt avatar Apr 01 '24 23:04 andrew-platt

I don't know for certain if PR #2136 will solve this issue. @jrood-nrel, could you check if that branch fixes it?

andrew-platt avatar Apr 02 '24 00:04 andrew-platt

@andrew-platt I made the mistake of including AeroDyn_Inflow.f90 in aerodynlib as part of refactoring the CMake files for v3.5.0. At that point I didn't understand the separate use of this library. Thanks for fixing it.

deslaughter avatar Apr 02 '24 14:04 deslaughter

Sure I will try it.

jrood-nrel avatar Apr 02 '24 14:04 jrood-nrel

@jrood-nrel, I merged this into dev, so you can grab that branch instead if it is easier.

andrew-platt avatar Apr 02 '24 15:04 andrew-platt

FYI, Intel OneAPI 2024.1 also gives an internal compiler error on ModVar.f90 in the dev-unstable-pointers branch. I have traced it to two lines that are supposed to perform automatic deallocation/allocation of a type: line 568 [image] and line 651 [image].

bjonkman avatar Apr 02 '24 16:04 bjonkman

@bjonkman, Thanks for the info! @deslaughter, do we have other instances of the automatic deallocation/allocation that may not be supported by all compilers yet?

andrew-platt avatar Apr 02 '24 16:04 andrew-platt

I still see this with the dev branch. I am using the CPP bindings btw, so does that mean I always compile this file?

modules/aerodyn/src/AeroDyn_Inflow.f90: error #5623: **Internal compiler error: internal abort** Please report this error along with the circumstances in which it occurred in a Software Problem Report.  Note: File and line given may not be explicit cause of this error.

jrood-nrel avatar Apr 02 '24 16:04 jrood-nrel

There must still be an issue in AeroDyn_Inflow.f90 itself, and not just an issue with the cmake libs.

Are you compiling aerodyn_inflow_c_bindings as well as the OpenFAST CPP interface?

andrew-platt avatar Apr 02 '24 16:04 andrew-platt

Paths redacted:

-DBUILD_DOCUMENTATION:BOOL=OFF -DBUILD_TESTING:BOOL=OFF -DBUILD_SHARED_LIBS:BOOL=ON -DDOUBLE_PRECISION:BOOL=ON -DUSE_DLL_INTERFACE:BOOL=ON -DBUILD_OPENFAST_CPP_API:BOOL=ON -DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=ON -DBLAS_LIBRARIES:STRING=/path/lib/libopenblas.so -DLAPACK_LIBRARIES:STRING=/path/lib/libopenblas.so -DCMAKE_CXX_COMPILER:STRING=/path/mpicxx -DCMAKE_C_COMPILER:STRING=/path/mpicc -DCMAKE_Fortran_COMPILER:STRING=/path/mpif90 -DMPI_CXX_COMPILER:STRING=/path/mpicxx -DMPI_C_COMPILER:STRING=/path/mpicc -DMPI_Fortran_COMPILER:STRING=/path/mpif90 -DHDF5_ROOT:STRING=/path -DYAML_ROOT:STRING=/path -DHDF5_NO_FIND_PACKAGE_CONFIG_FILE:BOOL=ON -DNETCDF_ROOT:STRING=/path

jrood-nrel avatar Apr 02 '24 16:04 jrood-nrel

What is your make command?

andrew-platt avatar Apr 02 '24 16:04 andrew-platt

make

jrood-nrel avatar Apr 02 '24 16:04 jrood-nrel

Ah. That ends up building all targets including aerodyn_driver and aerodyn_inflow_c_bindings. I'm guessing you don't need all the module drivers, module wrappers, or TurbSim.

As a temporary workaround, could you build only the targets of interest with `make openfast openfastcpp`?

andrew-platt avatar Apr 02 '24 16:04 andrew-platt
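
The workaround suggested above can be sketched as a small script. This is a minimal sketch, assuming an already-configured build tree in a directory named `build` (a hypothetical path, overridable through the `BUILD_DIR` variable, which is also an assumption of this example). Only the two target names come from the thread.

```shell
# Build only the targets needed for the C++ API instead of every target.
# BUILD_DIR is a hypothetical location of an already-configured build tree.
BUILD_DIR="${BUILD_DIR:-build}"
TARGETS="openfast openfastcpp"   # target names as given in the thread

# Compose and print the command so it can be checked before it runs.
build_cmd="make -C $BUILD_DIR $TARGETS"
echo "$build_cmd"

# Run the build only when the build tree actually contains a Makefile.
if [ -f "$BUILD_DIR/Makefile" ]; then
  $build_cmd
fi
```

A plain `make` builds every target, including `aerodyn_driver` and `aerodyn_inflow_c_bindings`, which is what pulls `AeroDyn_Inflow.f90` into the build.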

From @deslaughter

Intel's LLVM-based Fortran compiler is really new. We haven't said we're going to support it yet, AFAIK.

We will work towards fully supporting Intel's LLVM in the future, but I don't know how soon that will be. So if you can work around the issue by specifying the targets, that would be preferable while we find time/resources to fully test with Intel's LLVM.

andrew-platt avatar Apr 02 '24 16:04 andrew-platt

> FYI, Intel OneAPI 2024.1 also gives an internal compiler error on ModVar.f90 in the dev-unstable-pointers branch. I have traced it to two lines that are supposed to perform automatic deallocation/allocation of a type.

@bjonkman Thanks for tracing this down. I'm really surprised this is causing an issue since it's part of the Fortran 2003 standard. I'll take a look and see if I can figure out what's happening.

> Do we have other instances of the automatic deallocation/allocation that may not be supported by all compilers yet?

I think that I was mostly using it in the new tight coupling code, though I remember seeing an instance in AeroDyn that caused problems with Flang, though I don't think it was related to ADI.

deslaughter avatar Apr 02 '24 17:04 deslaughter

So using `make openfast openfastcpp` gets past building, but `make install` still builds all the targets and fails with the same segfault. How can I make `make install` skip the other targets?

jrood-nrel avatar Apr 02 '24 17:04 jrood-nrel

> > FYI, Intel OneAPI 2024.1 also gives an internal compiler error on ModVar.f90 in the dev-unstable-pointers branch. I have traced it to two lines that are supposed to perform automatic deallocation/allocation of a type.
>
> @bjonkman Thanks for tracing this down. I'm really surprised this is causing an issue since it's part of the Fortran 2003 standard. I'll take a look and see if I can figure out what's happening.

When I replace it with a more traditional Fortran (and longer) allocation method, it works. [image]

Internal compiler errors are bugs in the compiler, which may or may not be caused by valid code. I actually misspoke on what I was using to build. The 32-bit compiler is deprecated, so it's using a slightly different version. The error is with Intel Fortran Compiler Classic 2021.12.0 [IA-32] (IFORT). The Intel Fortran Compiler 2024.1.0 [Intel 64] (IFX) builds the original code fine.

bjonkman avatar Apr 02 '24 17:04 bjonkman

@bjonkman Thanks for clarifying. I'm glad that the traditional method still works.

@jrood-nrel @andrew-platt I've traced the ADI library compiler bug to OpenMP. Without OpenMP enabled, the compile succeeds. OpenMP is being enabled by use of the C++ API. ADI_CalcOutput_IW uses `!$OMP` directives, so maybe there's something wrong with those comments. I'll dig a little more.

deslaughter avatar Apr 02 '24 18:04 deslaughter
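
To see which lines are affected, the OpenMP directives in a source file can be listed with `grep`. This is a minimal sketch: the helper name `list_omp_directives` is hypothetical, and it assumes the directives use the free-form Fortran sentinel `!$OMP`. The file path is the one named in the thread, relative to the repository root.

```shell
# List OpenMP directive lines (free-form Fortran sentinel "!$OMP") with
# their line numbers. -F treats the pattern as a fixed string, -i ignores
# case. Returns non-zero when the file contains no directives.
list_omp_directives() {
  grep -n -i -F '!$OMP' "$1"
}

# Usage from the repository root; skipped when the file is not present.
src="modules/aerodyn/src/AeroDyn_Inflow.f90"
if [ -f "$src" ]; then
  list_omp_directives "$src" || echo "no OpenMP directives found in $src"
fi
```

Because these directives are ordinary comments to a compiler running without OpenMP, the listed lines are only interpreted when OpenMP is enabled, which fits the observation that the build succeeds with OpenMP turned off.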

OK, I will try that. We already disable OpenMP in this line when building on macOS on our laptops to avoid build errors: https://github.com/OpenFAST/openfast/blob/5d17a91ca6672cddfd1d260bc8829846c256695f/CMakeLists.txt#L113

jrood-nrel avatar Apr 02 '24 18:04 jrood-nrel

@andrew-platt I think we should remove the OMP comments from the section of code that's causing the issue. https://github.com/OpenFAST/openfast/blob/5d17a91ca6672cddfd1d260bc8829846c256695f/modules/aerodyn/src/AeroDyn_Inflow.f90#L491 Unless there are a huge number of points, I don't expect splitting the loop over multiple threads to significantly increase performance.

deslaughter avatar Apr 02 '24 18:04 deslaughter

Disabling OpenMP solves the segfault for us. We will just always disable OpenMP. Thanks for the help.

jrood-nrel avatar Apr 02 '24 20:04 jrood-nrel

I agree on removing OMP from AeroDyn_Inflow.f90. This may incur a performance penalty in the FVW module when we have ~100k points, but we intend to change how the data is accessed there with the introduction of the FlowField data structure in dev-unstable-pointers.

See #2140

andrew-platt avatar Apr 02 '24 20:04 andrew-platt

In theory, #2140 should fix the segfault with the OneAPI compiler.

andrew-platt avatar Apr 02 '24 23:04 andrew-platt