Albany
Seg Fault when trying to use new MueLu preconditioners for Enthalpy problems
I get a segmentation fault when using the new MueLu settings provided by Ray Tuminaro for the Humboldt problem.
I tried different settings; below is the error for P1semiR1transP2const:
2: ************************************************************************
2: -- Nonlinear Solver Step 0 --
2: ||F|| = 7.439e+02 step = 0.000e+00 dx = 0.000e+00
2: ************************************************************************
2:
2: Phalanx writing graphviz file for graph of FM0Jacobian (detail = 2)
2: Process using 'dot -Tpng -O phalanxGraphFM0Jacobian
2: ************* Phalanx Setup **************
2: ************ Evaluation Types ************
2: FM0Jacobian
2: DFM0Residual
2: FM0Residual
2:
2: ******************************************
2: Phalanx writing graphviz file for graph of DFM0Jacobian (detail = 2)
2: Process using 'dot -Tpng -O phalanxGraphDFM0Jacobian
2: ************* Phalanx Setup **************
2: ************ Evaluation Types ************
2: DFM0Jacobian
2: FM0Jacobian
2: DFM0Residual
2: FM0Residual
2:
2: ******************************************
2: --------------------------------------------------------------------------
2: Primary job terminated normally, but 1 process returned
2: a non-zero exit code. Per user-direction, the job has been aborted.
2: --------------------------------------------------------------------------
2: --------------------------------------------------------------------------
2: mpiexec noticed that process rank 3 with PID 0 on node s1026095 exited on signal 11 (Segmentation fault).
I didn't get much info running dbg.
To reproduce the error, build branch https://github.com/sandialabs/Albany/tree/enthalpy_muelu and run the Enthalpy tests:
ctest -R Enthalpy_Humboldt_MueLu
@mperego I have an Albany executable on Perlmutter, but that's probably not the easiest platform to debug on. Is there another machine you'd suggest I build on?
Thanks @jhux2. You could use blake. We have scripts for building Trilinos and Albany. I think you can use the gcc modules script blake_gcc_modules_submit.sh and the cmake scripts do-cmake-trilinos-gcc-serial and do-cmake-albany-serial. I got the error with the gcc compiler. @jewatkins do you have better advice?
blake is probably the best option right now. The gcc build is a debug build so it will run slow but it might give you more information. You can use the binary directly: /home/projects/albany/nightlyCDashAlbanyBlake/build-gcc/AlbBuildSerialGccNoWarn/src/Albany or use the trilinos install /home/projects/albany/nightlyCDashTrilinosBlake/build-gcc/TrilinosSerialInstallGccNoWarn/
I've run the Humboldt test that's on the main branch, just as a sanity check. This uses the executable that @jewatkins pointed to. Right after the stacked timer output, which I assume comes at the end of the simulation, there are a few errors. Are these to be expected?
| Albany Fill: State Residual: 0.00712972 - 0.011611% [1]
| | Phalanx::SortAndOrderEvaluators: 8.958e-06 - 0.125643% [5]
| | Remainder: 0.00712076 - 99.8744%
| Albany: Output to File: 0.298793 - 0.486596% [1]
| Remainder: 0.178301 - 0.29037%
***
*** Warning! The following Teuchos::RCPNode objects were created but have
*** not been destroyed yet. A memory checking tool may complain that these
*** objects are not destroyed correctly.
Yes, it looks like it: https://sems-cdash-son.sandia.gov/cdash/test/3060119 We should probably look into why that's happening. The final result looks correct though.
@jhux2 any updates on this?
@mperego Sorry, I've not looked at this in a while. I'll pick this back up.
@mperego I updated your branch with master and am seeing the following error. Has parsing of ice_thickness changed somehow?
180: ***************************************************************
180: ** ______ __ ______ ______ __ __ __ __ **
180: ** /\ __ \ /\ \ /\ == \ /\ __ \ /\ "-.\ \ /\ \_\ \ **
180: ** \ \ __ \\ \ \____\ \ __< \ \ __ \\ \ \-. \\ \____ \ **
180: ** \ \_\ \_\\ \_____\\ \_____\\ \_\ \_\\ \_\\"\_\\/\_____\ **
180: ** \/_/\/_/ \/_____/ \/_____/ \/_/\/_/ \/_/ \/_/ \/_____/ **
180: ** **
180: ***************************************************************
180: ** Trilinos git commit id - 62bb6ac4a8e
180: ** Albany git branch ------ enthalpy_muelu
180: ** Albany git commit id --- 75e0b13ba
180: ** Albany cxx compiler ---- GNU 10.1.0
180: ** Albany FadType --------- DFad
180: ** Albany TanFadType ------ DFad
180: ** Albany HessianVecFad -- DFad
180: ** Simulation start time -- 2023-02-06 at 14:10:52
180: ***************************************************************
180:
180: p=1: *** Caught standard std::exception of type 'Teuchos::Exceptions::InvalidParameterName' :
180:
180: Error, the parameter {name="Required Fields",type="Array(string)",value="{ice_thickness}"}
180: in the parameter (sub)list "Albany Parameters->Problem"
180: was not found in the list of valid parameters!
180:
180: The valid parameters and types are:
180: {
180: "Name" : string =
180: "Number of Spatial Processors" : int = -1
180: "Phalanx Graph Visualization Detail" : int = 0
180: "Use Physics-Based Preconditioner" : bool = 0
180: "Physics-Based Preconditioner" : string = None
180: "Initial Condition" : ParameterList = ...
180: "Initial Condition Dot" : ParameterList = ...
180: "Initial Condition DotDot" : ParameterList = ...
180: "Source Functions" : ParameterList = ...
180: "Absorption" : ParameterList = ...
180: "Response Functions" : ParameterList = ...
180: "Parameters" : ParameterList = ...
180: "Random Parameters" : ParameterList = ...
180: "Linear Combination Parameters" : ParameterList = ...
180: "LogNormal Parameter" : ParameterList = ...
180: "Teko" : ParameterList = ...
180: "Hessian" : ParameterList = ...
180: "XFEM" : ParameterList = ...
180: "Dirichlet BCs" : ParameterList = ...
180: "Neumann BCs" : ParameterList = ...
180: "Adaptation" : ParameterList = ...
180: "Overwrite Nominal Values With Final Point" : bool = 0
180: "Number Of Time Derivatives" : int = 1
180: "Use MDField Memoization" : bool = 0
180: "Use MDField Memoization For Parameters" : bool = 0
180: "Ignore Residual In Jacobian" : bool = 0
180: "Perturb Dirichlet" : double = 0
180: "Solution Method" : string = Steady
180: "Homotopy Restart Step" : double = 1
180: "Second Order" : string = No
180: "Print Response Expansion" : bool = 1
180: "Compute Sensitivities" : bool = 1
180: "Constitutive Model NOX Status Test" : Teuchos::RCP<NOX::StatusTest::Generic> = Teuchos::RCP<NOX::StatusTest::Generic>{ptr=0,node=0,strong_count=0,weak_count=0}
180: "LandIce Physical Parameters" : ParameterList = ...
180: "LandIce Enthalpy" : ParameterList = ...
180: "LandIce Viscosity" : ParameterList = ...
180: "Stereographic Map" : ParameterList = ...
180: "Basal Side Name" : string =
180: "Needs Dissipation" : bool = 1
180: "Needs Basal Friction" : bool = 1
180: }
180:
180:
180: Throw number = 1
180:
@jhux2, we cleaned up the code a bit. Please remove these lines:
Required Fields: [ice_thickness]
Required Basal Fields: [ice_thickness]
Element Shape: Wedge
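For anyone with several input files to update, here is a minimal Python sketch of that cleanup. It is an illustration, not an Albany tool: the function name `strip_obsolete_keys` and the sample text are made up, and it filters plain text line by line rather than parsing YAML, so it only handles entries that fit on one line. As noted later in this thread, a branch that predates #888 still needs `Element Shape: Wedge`, so adjust the key list to match your branch.

```python
import re

# Keys taken from the comment above. Adjust to match your branch.
OBSOLETE_KEYS = ("Required Fields", "Required Basal Fields", "Element Shape")


def strip_obsolete_keys(yaml_text, keys=OBSOLETE_KEYS):
    """Return yaml_text without single-line entries whose key is in `keys`.

    This is a plain-text filter (assumption: each entry sits on one line);
    it does not parse or validate the YAML.
    """
    alternatives = "|".join(re.escape(k) for k in keys)
    pattern = re.compile(r"^\s*(?:%s)\s*:" % alternatives)
    kept = [
        line
        for line in yaml_text.splitlines(keepends=True)
        if not pattern.match(line)
    ]
    return "".join(kept)


# Hypothetical input fragment, for illustration only.
sample = (
    "Problem:\n"
    "  Required Fields: [ice_thickness]\n"
    "  Needs Dissipation: true\n"
    "Discretization:\n"
    "  Required Basal Fields: [ice_thickness]\n"
    "  Element Shape: Wedge\n"
)

cleaned = strip_obsolete_keys(sample)
print(cleaned)
```

To keep `Element Shape`, call `strip_obsolete_keys(text, keys=("Required Fields", "Required Basal Fields"))`.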
Thanks, @mperego. Another error, I guess masked by the first:
Start 180: landIce_Enthalpy_Humboldt_MueLu_P1semiR1transP2const
180: Test command: /projects/sems/install/rhel7-x86_64/sems/v2/tpl/openmpi/4.0.5/gcc/10.1.0/base/e64jpaw/bin/mpiexec "-np" "4" "/scratch/jhu/fanssie/build-albany-relwithdebinfo/src/Albany" "input_enthalpy_humboldt_muelu_P1semiR1transP2const.yaml"
180: Working Directory: /scratch/jhu/fanssie/build-albany-relwithdebinfo/tests/landIce/Enthalpy
180: Test timeout computed to be: 1500
180: ***************************************************************
180: ** ______ __ ______ ______ __ __ __ __ **
180: ** /\ __ \ /\ \ /\ == \ /\ __ \ /\ "-.\ \ /\ \_\ \ **
180: ** \ \ __ \\ \ \____\ \ __< \ \ __ \\ \ \-. \\ \____ \ **
180: ** \ \_\ \_\\ \_____\\ \_____\\ \_\ \_\\ \_\\"\_\\/\_____\ **
180: ** \/_/\/_/ \/_____/ \/_____/ \/_/\/_/ \/_/ \/_/ \/_____/ **
180: ** **
180: ***************************************************************
180: ** Trilinos git commit id - 62bb6ac4a8e
180: ** Albany git branch ------ enthalpy_muelu
180: ** Albany git commit id --- 75e0b13ba
180: ** Albany cxx compiler ---- GNU 10.1.0
180: ** Albany FadType --------- DFad
180: ** Albany TanFadType ------ DFad
180: ** Albany HessianVecFad -- DFad
180: ** Simulation start time -- 2023-02-06 at 14:31:21
180: ***************************************************************
180: Albany_IOSS: Loading STKMesh from Exodus file ../AsciiMeshes/Humboldt/humboldt_2d.exo
180:
180: IOSS: Using decomposition method 'RIB' for 2,611 elements on 4 mpi ranks.
180:
180: p=3: *** Caught standard std::exception of type 'Teuchos::Exceptions::InvalidParameterValue' :
180:
180: /ascldap/users/jhu/fanssie/sources/Albany/src/disc/stk/Albany_ExtrudedSTKMeshStruct.cpp:136:
180:
180: Throw number = 1
180:
180: Throw test that evaluated to true: basalside_elem_name != elem2d_name
180:
180:
180: Error in ExtrudedSTKMeshStruct: Expecting topology name of elements of 2d mesh to be Quadrilateral_4 but it is Triangle_3
@jhux2 I guess you merged with master before #888 got merged. If so, you need to put back
Element Shape: Wedge
Let me know if this is not the issue
@mperego That seems to have fixed it, I'm now back to the original error you reported. Thanks.
@mperego Here's a quick update. MueLu's setup is recursing until it exhausts stack memory, and one of the processes seg faults. I'm sifting through factory dependency information at the moment to see what's going wrong.
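One way a dependency-driven setup can recurse without bound is a cycle in the dependency graph. Whether that is the cause here has not been established in this thread. The sketch below only shows how such a cycle could be searched for once the dependency information is dumped into a dictionary; it does not use MueLu's API, and the names `find_cycle`, `FactoryA`, `FactoryB`, and so on are hypothetical.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if acyclic.

    `deps` maps each name to the names it requests data from.
    The search keeps its own stack, so it cannot overflow the call stack
    even when the graph is deep.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    for root in deps:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GRAY
        path = [root]
        iters = [iter(deps.get(root, ()))]
        while iters:
            for child in iters[-1]:
                state = color.get(child, WHITE)
                if state == GRAY:
                    # child is already on the current path: cycle found
                    return path[path.index(child):] + [child]
                if state == WHITE:
                    color[child] = GRAY
                    path.append(child)
                    iters.append(iter(deps.get(child, ())))
                    break
            else:
                # all children of the top node are done
                color[path.pop()] = BLACK
                iters.pop()
    return None


# Hypothetical dependency data, for illustration only.
cyclic = {
    "FactoryA": ["FactoryB"],
    "FactoryB": ["FactoryC"],
    "FactoryC": ["FactoryA"],
    "FactoryD": [],
}
acyclic = {"FactoryA": ["FactoryB"], "FactoryB": []}

print(find_cycle(cyclic))
print(find_cycle(acyclic))
```

A returned list such as A, B, C, A names the factories that keep requesting each other, which narrows down where to look in the dependency output.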
@jhux2 thanks for looking into that! It doesn't sound fun..
@jhux2 are there any updates on this issue?
Hi @jhux2, there have been some changes in Albany that need to be merged into this branch. A few additional changes are needed in the input files as well. Let me know when you plan to look into this and I'll do the merge and fix the input files.