pylith
pylith copied to clipboard
Resolve issues related to variable point-block Jacobi for fault preconditioner
- [x] GAMG setting (
--petsc.mg_levels_3_pc_type=vpbjacobi
) is basis function dependent; need to be able to set this automatically - [x] Turning on refinement results in SNES iterations for a linear problem; error in Jacobian?
- [x] Need to unpermute values for output
- [x] Update parallel fault PC settings to use parmetis with edge weighting [needed until we have parallel mesh loading]
- [x] DOCS: Update example output
- [x] DOCS: Update list of fault PC settings [parmetis with edge weighting; temporary]
- [x] FIX: Error when running
examples/reverse-2d/step06_twofaults_elastic.cfg
with--nodes=2
(Error with DS:Field 1 is not in [0, 1)
)
Description
I pulled PETSc and now SNES fails to converge.
How to reproduce
PETSc branch: knepley/fix-plex-io-section-perm
PyLith fork: https://github.com/baagaard-usgs/pylith.git
PyLith branch: improve-elasticity-fault-pc
cd $BUILD/tests/fullscale/linearelasticity/faults-2d
make export-data
pylith shearnoslip.cfg shearnoslip_quad.cfg ~/src/cig/pylith/share/settings/solver_faultpc.cfg --petsc.ksp_monitor_true_residual --petsc.snes_monitor
pylith shearnoslip.cfg shearnoslip_quad.cfg ~/src/cig/pylith/share/settings/solver_faultpc.cfg --petsc.ksp_monitor_true_residual --petsc.snes_monitor
0 SNES Function norm 4.265560370677e-03
0 KSP preconditioned resid norm 1.908036347910e-02 true resid norm 4.265560370677e-03 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 4.967562491477e-03 true resid norm 3.077158244315e-03 ||r(i)||/||b|| 7.213960129290e-01
2 KSP preconditioned resid norm 3.598311696003e-03 true resid norm 2.619025748517e-03 ||r(i)||/||b|| 6.139933609945e-01
3 KSP preconditioned resid norm 1.430059417360e-03 true resid norm 1.704501179051e-03 ||r(i)||/||b|| 3.995960743560e-01
4 KSP preconditioned resid norm 7.351269360768e-04 true resid norm 1.225965715858e-03 ||r(i)||/||b|| 2.874102367150e-01
5 KSP preconditioned resid norm 7.230122730906e-04 true resid norm 1.290372237623e-03 ||r(i)||/||b|| 3.025094302951e-01
6 KSP preconditioned resid norm 4.159075140772e-04 true resid norm 6.686181343840e-04 ||r(i)||/||b|| 1.567480181456e-01
7 KSP preconditioned resid norm 1.367757561316e-04 true resid norm 4.039493407656e-04 ||r(i)||/||b|| 9.470018137417e-02
8 KSP preconditioned resid norm 7.158193483169e-05 true resid norm 1.827279480449e-04 ||r(i)||/||b|| 4.283797019989e-02
9 KSP preconditioned resid norm 3.905357446809e-05 true resid norm 7.282293839440e-05 ||r(i)||/||b|| 1.707230283154e-02
10 KSP preconditioned resid norm 9.668164625721e-06 true resid norm 1.060892638182e-05 ||r(i)||/||b|| 2.487111999340e-03
11 KSP preconditioned resid norm 1.785936355636e-06 true resid norm 1.877012745564e-06 ||r(i)||/||b|| 4.400389591171e-04
12 KSP preconditioned resid norm 3.965116756948e-07 true resid norm 5.489373224082e-07 ||r(i)||/||b|| 1.286905528713e-04
13 KSP preconditioned resid norm 9.375386755773e-08 true resid norm 1.242938071586e-07 ||r(i)||/||b|| 2.913891642773e-05
14 KSP preconditioned resid norm 1.901533854471e-08 true resid norm 1.973504391365e-08 ||r(i)||/||b|| 4.626600539829e-06
15 KSP preconditioned resid norm 2.383816936460e-09 true resid norm 3.629644757516e-09 ||r(i)||/||b|| 8.509186231351e-07
16 KSP preconditioned resid norm 4.598716317560e-10 true resid norm 5.357437037358e-10 ||r(i)||/||b|| 1.255974965021e-07
17 KSP preconditioned resid norm 6.822558674688e-11 true resid norm 1.035253125880e-10 ||r(i)||/||b|| 2.427003807043e-08
18 KSP preconditioned resid norm 1.714498392166e-11 true resid norm 2.063303492863e-11 ||r(i)||/||b|| 4.837121769619e-09
19 KSP preconditioned resid norm 2.318581686522e-12 true resid norm 2.023190303223e-12 ||r(i)||/||b|| 4.743082098032e-10
20 KSP preconditioned resid norm 2.531897993993e-13 true resid norm 3.444310532404e-13 ||r(i)||/||b|| 8.074696483214e-11
21 KSP preconditioned resid norm 4.083948365503e-14 true resid norm 4.900041069684e-14 ||r(i)||/||b|| 1.148744981637e-11
22 KSP preconditioned resid norm 5.832103009695e-15 true resid norm 7.129245206412e-15 ||r(i)||/||b|| 1.671350206510e-12
23 KSP preconditioned resid norm 7.275292514621e-16 true resid norm 1.083637092917e-15 ||r(i)||/||b|| 2.540433140664e-13
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: SNESSolve has not converged due to DIVERGED_LINE_SEARCH. Suggest running with -snes_linesearch_monitor
@baagaard-usgs I think this is fixed. I have force pushed the PETSc branch. The problem was that the residual is cloned from the solution, and clone did not copy the rordering flags. Ugh. Putting in a new flag is hard. Let me know if this works for you.
The residual and Jacobian are now consistent. The output is still scrambled.
How do I see the output from the minmesh example I am running?
pylith pylithapp_m
inmesh.cfg minmesh_noslip.cfg minmesh_noslip_quad.cfg ../../../../../pylith-git/share/settings/solver_faultpc.cfg --pets
c.snes_monitor --petsc.ksp_monitor_true_residual --petsc.snes_view --petsc.dm_ignore_perm_output
The Lagrange multipliers should be nearly 0 and the displacements should +1 and -1 on each side of the fault.
h5dump output/minmesh_noslip_quad-domain.h5
Only the displacement outputs for me. These look like they are in the original order
GROUP "vertex_fields" {
DATASET "displacement" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 1, 8, 2 ) / ( H5S_UNLIMITED, 8, 2 ) }
DATA {
(0,0,0): 0, 1,
(0,1,0): 8.87995e-17, -1.52413e-16,
(0,2,0): 7.86351e-17, 1.72848e-16,
(0,3,0): -8.41938e-17, -1.52837e-16,
(0,4,0): -9.77463e-17, 1.47278e-16,
(0,5,0): 0, 1,
(0,6,0): 0, -1,
(0,7,0): 0, -1
}
ATTRIBUTE "timestepping" {
DATATYPE H5T_STD_I32LE
DATASPACE SCALAR
DATA {
(0): 1
}
}
ATTRIBUTE "vector_field_type" {
DATATYPE H5T_STRING {
STRSIZE 7;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "vector"
}
}
}
}
The displacement solution field should be
u(x=-2000) = (0, 1) u(x=0) = (0, 0) u(x=+2000) = (0, -1)
Coordinates of vertices Displacement
(0,0): -2000, 0, (0,0,0): 0, 1, Correct
(1,0): 0, 0, (0,1,0): -1.3022e-17, 2.31458e-16, Correct
(2,0): 0, 2000, (0,2,0): -1.75285e-17, -5.84085e-17, Correct
(3,0): -2000, 2000, (0,3,0): 1.83402e-17, 1.51906e-16, Should be (0, 1)
(4,0): 2000, 0, (0,4,0): 3.69749e-17, -1.26146e-16, Should be (0, -1)
(5,0): 2000, 2000, (0,5,0): 0, 1, Should be (0, -1)
(6,0): 0, 0, (0,6,0): 0, -1, Should be (0, 0)
(7,0): 0, 2000 (0,7,0): 0, -1 Should be (0, 0)
I have a fix. However, when I rebuilt PyLith everything started to fail, and I cannot understand why. Here is the beginning of the error messages
>> ./pylithapp.cfg:62:
-- pyre.inventory(error)
-- timedependent.solution_observers.points <- 'pylith.meshio.OutputSolnPoints'
-- unrecognized property 'timedependent.solution_observers.points'
>> ./pylithapp.cfg:95:
-- pyre.inventory(error)
-- pylithapp.timedependent.materials.mat_xmid.description <- 'Elastic material between xneg and xmid faults'
-- unknown component 'pylithapp.timedependent.materials.mat_xmid'
Okay, I resolved that. My fix appears to work, but I cannot push to your fork. Here is my diff:
diff --git a/libsrc/pylith/meshio/OutputSubfield.cc b/libsrc/pylith/meshio/OutputSubfield.cc
index acacf4ebb..574a12a18 100644
--- a/libsrc/pylith/meshio/OutputSubfield.cc
+++ b/libsrc/pylith/meshio/OutputSubfield.cc
@@ -22,6 +22,8 @@
#include <typeinfo> // USES typeid()
+#include "petscdm.h" // USES PetscDM
+
// ------------------------------------------------------------------------------------------------
// Constructor
pylith::meshio::OutputSubfield::OutputSubfield(void) :
@@ -72,6 +74,8 @@ pylith::meshio::OutputSubfield::create(const pylith::topology::Field& field,
PetscErrorCode err;
err = DMClone(mesh.getDM(), &subfield->_dm);PYLITH_CHECK_ERROR(err);
+ err = DMReorderSectionSetDefault(subfield->_dm, DM_REORDER_DEFAULT_FALSE);PYLITH_CHECK_ERROR(err);
+ err = DMReorderSectionSetType(subfield->_dm, NULL);PYLITH_CHECK_ERROR(err);
err = PetscObjectSetName((PetscObject)subfield->_dm, name);PYLITH_CHECK_ERROR(err);
PetscFE fe = pylith::topology::FieldOps::createFE(subfield->_discretization, subfield->_dm,
Do you have to give me permission on your fork?
Yes, I believe this is the fix. The fault full-scale tests pass with this fix. I will do more testing to verify completeness.
What is the prospect of getting the mg levels set according to the problem? If we are able to do that, we might have this issue completely wrapped up.
mg_levels_1_pc_type = vpbjacobi
I get a zero pivot in row 0 when I use a basis order of 2. I thought you had fixed the reordering for basis order 2.
PETSc branch: knepley/fix-plex-io-section-perm
PyLith fork: https://github.com/baagaard-usgs/pylith.git
PyLith branch: improve-elasticity-fault-pc
How to reproduce
cd examples/strikeslip-2d
pylith step04_varslip.cfg ../../share/settings/solver_faultpc.cfg --petsc.mg_levels_2_pc_type=vpbjacobi
Error message
[0]PETSC ERROR: Zero pivot in LU factorization: https://petsc.org/release/faq/#zeropivot
[0]PETSC ERROR: Zero pivot, row 0
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR: Option left: name:-fieldsplit_displacement_ksp_type value: preonly source: code
[0]PETSC ERROR: Option left: name:-fieldsplit_displacement_pc_type value: lu source: code
[0]PETSC ERROR: Option left: name:-fieldsplit_lagrange_multiplier_fault_ksp_type value: preonly source: code
[0]PETSC ERROR: Option left: name:-fieldsplit_lagrange_multiplier_fault_pc_type value: lu source: code
[0]PETSC ERROR: Option left: name:-ksp_converged_reason (no value) source: code
[0]PETSC ERROR: Option left: name:-pc_fieldsplit_schur_factorization_type value: lower source: code
[0]PETSC ERROR: Option left: name:-pc_fieldsplit_schur_precondition value: selfp source: code
[0]PETSC ERROR: Option left: name:-pc_fieldsplit_schur_scale value: 1.0 source: code
[0]PETSC ERROR: Option left: name:-pc_fieldsplit_type value: schur source: code
[0]PETSC ERROR: Option left: name:-snes_converged_reason (no value) source: code
-- snip --
[0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_2() at /Users/baagaard/software/unix/petsc-dev/src/mat/impls/baij/seq/dgefa2.c:52
[0]PETSC ERROR: #2 MatInvertVariableBlockDiagonal_SeqAIJ() at /Users/baagaard/software/unix/petsc-dev/src/mat/impls/aij/seq/aij.c:1840
[0]PETSC ERROR: #3 MatInvertVariableBlockDiagonal() at /Users/baagaard/software/unix/petsc-dev/src/mat/interface/matrix.c:10657
[0]PETSC ERROR: #4 PCSetUp_VPBJacobi_Host() at /Users/baagaard/software/unix/petsc-dev/src/ksp/pc/impls/vpbjacobi/vpbjacobi.c:232
[0]PETSC ERROR: #5 PCSetUp_VPBJacobi() at /Users/baagaard/software/unix/petsc-dev/src/ksp/pc/impls/vpbjacobi/vpbjacobi.c:263
[0]PETSC ERROR: #6 PCSetUp() at /Users/baagaard/software/unix/petsc-dev/src/ksp/pc/interface/precon.c:1079
[0]PETSC ERROR: #7 KSPSetUp() at /Users/baagaard/software/unix/petsc-dev/src/ksp/ksp/interface/itfunc.c:415
[0]PETSC ERROR: #8 KSPSolve_Private() at /Users/baagaard/software/unix/petsc-dev/src/ksp/ksp/interface/itfunc.c:832
@knepley mg_fine* seems to work. I am still getting this error above with basis order 2.
The basis order 2 fix is in knepley/fix-plex-reorder-hybrid Can you approve https://gitlab.com/petsc/petsc/-/merge_requests/7258 ? Then I can merge that, merge the above fix, and recreate knepley/pylith
I confimed that combining PETSc branches knepley/fix-plex-reorder-hybrid
and knepley/fix-plex-io-section-perm
resolves the issue with basis order 2.
Will close this issue once PETSc branchknepley/pylith
is updated.
@knepley It looks like we have an issue with using the new fault preconditioner settings in parallel in 3D.
Error message
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: The line numbers in the error traceback are not always exact.
[0]PETSC ERROR: #1 PetscTrFreeDefault() at /Users/baagaard/software/unix/petsc-dev/src/sys/memory/mtr.c:262
[0]PETSC ERROR: #2 DMCreateMatrix_Plex() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plex.c:2869
[0]PETSC ERROR: #3 DMCreateMatrix() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:1478
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Nonconforming object sizes
[1]PETSC ERROR: Sum of local block sizes 405 does not equal local size of matrix 450
[1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[1]PETSC ERROR: Option left: name:-malloc_dump (no value) source: code
[1]PETSC ERROR: Option left: name:-mg_fine_pc_type value: vpbjacobi source: code
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.20.2-242-gc574b33dbb6 GIT Date: 2023-12-05 20:46:15 +0000
[1]PETSC ERROR: /Users/baagaard/software/unix/py3.12-venv/pylith-debug/bin/mpinemesis on a arch-clang-15.0_debug named IGSKCI164LM006 by baagaard Tue Feb 20 13:59:48 2024
[1]PETSC ERROR: Configure options --PETSC_ARCH=arch-clang-15.0_debug --with-debugging=1 --with-clanguage=c --with-mpi-compilers=1 --with-shared-[0]PETSC ERROR: #4 SNESSetUpMatrices() at /Users/baagaard/soft
ware/unix/petsc-dev/src/snes/interface/snes.c:768
[0]PETSC ERROR: #5 SNESSetUp_NEWTONLS() at /Users/baagaard/software/unix/petsc-dev/src/snes/impls/ls/ls.c:289
[0]PETSC ERROR: #6 SNESSetUp() at /Users/baagaard/software/unix/petsc-dev/src/snes/interface/snes.c:3297
[0]PETSC ERROR: #7 SNESSolve() at /Users/baagaard/software/unix/petsc-dev/src/snes/interface/snes.c:4718
[0]PETSC ERROR: #8 TSTheta_SNESSolve() at /Users/baagaard/software/unix/petsc-dev/src/ts/impls/implicit/theta/theta.c:174
[0]PETSC ERROR: #9 TSStep_Theta() at /Users/baagaard/software/unix/petsc-dev/src/ts/impls/implicit/theta/theta.c:224
[0]PETSC ERROR: #10 TSStep() at /Users/baagaard/software/unix/petsc-dev/src/ts/interface/ts.c:3382
[0]PETSC ERROR: #11 TSSolve() at /Users/baagaard/software/unix/petsc-dev/src/ts/interface/ts.c:4016
Steps to reproduce
PETSc branch: main
PyLith fork: https://github.com/baagaard-usgs/pylith.git
PyLith branch: improve-elasticity-fault-pc
cd $BUILD/tests/fullscale/linearelasticityfaults-3d
make export-data
pylith twoblocks.cfg twoblocks_hex.cfg --nodes=2
This is not limited to 3D. It also shows up in 2D, but depends on the number of processes.
The new knepley/pylith
should fix this if you run using --petsc.petscpartitioner_type=parmetis --petsc.petscpartitioner_use_edge_weights
. This causes partitioning to avoid breaking the domain along a cohesive cell. When this happens, we cannot solve the two sides as a block, and it really hurts the solver.
@knepley Two faults that intersection (one through-going and one that ends at the intersection) seems to cause an error when permuting the section.
[0]PETSC ERROR: Arguments are incompatible
[0]PETSC ERROR: Number of points in permutation 9622 does not match chart size 9623
--- snip --
[0]PETSC ERROR: Petsc Development GIT revision: v3.20.2-242-gc574b33dbb6 GIT Date: 2023-12-05 20:46:15 +0000
[0]PETSC ERROR: /Users/baagaard/software/unix/py3.12-venv/pylith-debug/bin/mpinemesis on a arch-clang-15.0_debug named IGSKCI164LM006 by baagaard Thu Mar 21 13:58:21 2024
-- snip --
[0]PETSC ERROR: #1 DMCreateSectionPermutation_Plex_Cohesive() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plexreorder.c:504
[0]PETSC ERROR: #2 DMCreateSectionPermutation_Plex() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plexreorder.c:526
[0]PETSC ERROR: #3 DMCreateSectionPermutation() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:4392
[0]PETSC ERROR: #4 DMCreateLocalSection_Plex() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plexsection.c:611
[0]PETSC ERROR: #5 DMGetLocalSection() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:4292
[0]PETSC ERROR: #6 DMGetSection() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:4243
[0]PETSC ERROR: #7 void pylith::topology::Field::allocate()() at /Users/baagaard/src/cig/pylith/libsrc/pylith/topology/Field.cc:303
How to reproduce
PETSc branch: knepley/feature-partitioner-edge-weights
PyLIth fork: https://github.com/baagaard-usgs/pylith.git
PyLith branch: improve-elasticity-fault-pc
cd examples/reverse-2d
pylith step06_twofaults_elastic.cfg
@baagaard-usgs I believe this is fixed in https://gitlab.com/petsc/petsc/-/merge_requests/7417
Fix verified.