OpenFAST segfault in CPU builds

Open tasmith4 opened this issue 3 years ago • 10 comments

Some tests are failing in CPU builds with segfaults that appear to be from OpenFAST. Presumably these are all the same bug, although we can split up this issue if we find out that's wrong.

Unit tests:

  • ActuatorBulkDiskFastTest.NGP_sweptPointsPopulatedVaried
  • ActuatorBulkFastTests.NGP_initializeActuatorBulk (note this test also fails in a CUDA build)

Reg tests:

  • nrel5MWactuatorDisk
  • nrel5MWactuatorLine
  • nrel5MWactuatorLineAnisoGauss
  • nrel5MWactuatorLineFllc
  • nrel5MWadvActLine

tasmith4 avatar Jun 02 '22 17:06 tasmith4

@psakievich were you looking at this?

tasmith4 avatar Jun 02 '22 17:06 tasmith4

This appears to just be an issue with [email protected]. @rafmudaf have you had a chance to look at this yet?

psakievich avatar Jun 02 '22 17:06 psakievich

@psakievich No, I haven't seen this. Can you point me to the test logs?

rafmudaf avatar Jun 03 '22 17:06 rafmudaf

@rafmudaf see here: https://my.cdash.org/viewTest.php?onlyfailed&buildid=2173766

This is the relevant section from one of the failing tests -- unitTest1 (all are similar):

[----------] 3 tests from ActuatorFunctorFastTests
[ RUN      ] ActuatorFunctorFastTests.NGP_runAssignVelAndComputeForces

 **************************************************************************************************
 OpenFAST

 Copyright (C) 2022 National Renewable Energy Laboratory
 Copyright (C) 2022 Envision Energy USA LTD

 This program is licensed under Apache License Version 2.0 and comes with ABSOLUTELY NO WARRANTY.
 See the "LICENSE" file distributed with this software for details.
 **************************************************************************************************

 OpenFAST--128-NOTFOUND
 Compile Info:
  - Compiler: GCC version 9.3.0
  - Architecture: 64 bit
  - Precision: double
  - OpenMP: No
  - Date: Jun  1 2022
  - Time: 14:38:44
 Execution Info:
  - Date: 06/03/2022
  - Time: 00:58:17-0600

 OpenFAST input file heading:
     FAST Certification Test #01: NREL 5.0 MW Baseline Wind Turbine (Onshore)

 Running ElastoDyn.
 Nodal outputs section of ElastoDyn input file not found or improperly formatted.
 Running AeroDyn.
 Warning: Turning off Unsteady Aerodynamics because UA parameters are not included in airfoil
 (airfoil has likely has constant polars). (node 1, blade 1)
 Warning: Turning off Unsteady Aerodynamics because UA parameters are not included in airfoil
 (airfoil has likely has constant polars). (node 1, blade 2)
 Warning: Turning off Unsteady Aerodynamics because UA parameters are not included in airfoil
 (airfoil has likely has constant polars). (node 1, blade 3)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 12201 RUNNING AT rhodes.hpc.nrel.gov
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
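
For triaging many logs like the one above, here is a minimal sketch of a scanner that reports which GoogleTest case was running when the segfault message appeared. It assumes the log keeps the `[ RUN      ]` / `[       OK ]` / `[  FAILED  ]` markers and the "Segmentation fault (signal 11)" or "EXIT CODE: 11" strings seen in this output. The function name `find_segfaulting_test` is hypothetical and not part of nalu-wind or OpenFAST.

```python
import re

# GoogleTest progress markers, as they appear in the log above.
RUN_RE = re.compile(r"^\[ RUN\s+\] (\S+)")
DONE_RE = re.compile(r"^\[\s+(?:OK|FAILED)\s+\] (\S+)")
# Crash signatures copied from the MPI launcher output above.
SEGV_RE = re.compile(r"Segmentation fault \(signal 11\)|EXIT CODE: 11")


def find_segfaulting_test(log_text):
    """Return the test that was still running when a segfault was reported.

    Returns None if no segfault signature is found, or if no test was
    in progress when the signature appeared.
    """
    current = None
    for line in log_text.splitlines():
        started = RUN_RE.match(line)
        if started:
            current = started.group(1)
            continue
        finished = DONE_RE.match(line)
        if finished and finished.group(1) == current:
            current = None
            continue
        if SEGV_RE.search(line):
            return current
    return None
```

Run against the excerpt above, this would point at ActuatorFunctorFastTests.NGP_runAssignVelAndComputeForces, since that is the last test started before the crash message.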

tasmith4 avatar Jun 03 '22 17:06 tasmith4

@rafmudaf do you have further thoughts on this?

tasmith4 avatar Jun 14 '22 19:06 tasmith4

I built OpenFAST on my laptop using apple-clang and gcc@11 and did not see the segfaults. So the input deck is likely not the cause, contrary to what I suspected. I will try to update the input decks anyway, though.

psakievich avatar Aug 12 '22 19:08 psakievich

See https://github.com/OpenFAST/openfast/pull/1227

psakievich avatar Aug 24 '22 01:08 psakievich

Now that that OpenFAST PR and #1023 have both merged, we seem to be down to segfaults only with Clang on the SNL dashboard. @psakievich these segfaults don't show up on the NREL dashboard. The main differences between the two are:

  • SNL build is release, NREL build is debug
  • SNL uses Clang 12.0.1, NREL uses Clang 10.0.0
  • SNL uses trilinos@develop, NREL uses trilinos@stable
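
One way to narrow down which of these three differences matters is to start from the passing configuration and change one setting at a time. The sketch below only enumerates those intermediate configurations; it does not build anything. The dictionary keys (`build_type`, `clang`, `trilinos`) and the function name `one_factor_variants` are hypothetical labels chosen for illustration, not settings from either dashboard's scripts.

```python
# The two dashboard configurations as described in the list above.
nrel = {"build_type": "debug", "clang": "10.0.0", "trilinos": "stable"}
snl = {"build_type": "release", "clang": "12.0.1", "trilinos": "develop"}


def one_factor_variants(baseline, target):
    """Yield (changed_key, config) pairs.

    Each config is a copy of `baseline` with exactly one setting
    replaced by the value from `target`.
    """
    for key in baseline:
        if baseline[key] != target[key]:
            variant = dict(baseline)
            variant[key] = target[key]
            yield key, variant


for changed, config in one_factor_variants(nrel, snl):
    print(changed, config)
```

If only one of the three variants segfaults, that setting is the likely trigger. If none do, the failure probably needs a combination of settings, or comes from something not in this list.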

tasmith4 avatar Aug 31 '22 13:08 tasmith4

Okay, I will try to get back to this soon.

psakievich avatar Aug 31 '22 17:08 psakievich

@tasmith4 I'm unable to reproduce this locally on ascicgpu22. I'm wondering if the [email protected] build is segfaulting because that openfast build didn't get updated after we added the patch?

I ran the unit tests and some of the regression tests with a debug build and have not hit any segfaults. It seems rather suspicious to me that it is failing so consistently and across every OpenFAST test, while no other compiler on either of the dashboards sees the segfault.

psakievich avatar Sep 09 '22 18:09 psakievich