pylith icon indicating copy to clipboard operation
pylith copied to clipboard

Resolve issues related to variable point-block Jacobi for fault preconditioner

Open baagaard-usgs opened this issue 1 year ago • 20 comments

  • [x] GAMG setting (--petsc.mg_levels_3_pc_type=vpbjacobi) is basis function dependent; need to be able to set this automatically
  • [x] Turning on refinement results in SNES iterations for a linear problem; error in Jacobian?
  • [x] Need to unpermute values for output
  • [x] Update parallel fault PC settings to use parmetis with edge weighting [needed until we have parallel mesh loading]
  • [x] DOCS: Update example output
  • [x] DOCS: Update list of fault PC settings [parmetis with edge weighting; temporary]
  • [x] FIX: Error when running examples/reverse-2d/step06_twofaults_elastic.cfg with --nodes=2 (Error with DS: Field 1 is not in [0, 1))

baagaard-usgs avatar Jan 24 '24 15:01 baagaard-usgs

Description

I pulled PETSc and now SNES fails to converge.

How to reproduce

PETSc branch: knepley/fix-plex-io-section-perm PyLith fork: https://github.com/baagaard-usgs/pylith.git PyLith branch: improve-elasticity-fault-pc

cd $BUILD/tests/fullscale/linearelasticity/faults-2d
make export-data
pylith shearnoslip.cfg shearnoslip_quad.cfg ~/src/cig/pylith/share/settings/solver_faultpc.cfg --petsc.ksp_monitor_true_residual --petsc.snes_monitor
pylith shearnoslip.cfg shearnoslip_quad.cfg ~/src/cig/pylith/share/settings/solver_faultpc.cfg --petsc.ksp_monitor_true_residual --petsc.snes_monitor
    0 SNES Function norm 4.265560370677e-03
      0 KSP preconditioned resid norm 1.908036347910e-02 true resid norm 4.265560370677e-03 ||r(i)||/||b|| 1.000000000000e+00
      1 KSP preconditioned resid norm 4.967562491477e-03 true resid norm 3.077158244315e-03 ||r(i)||/||b|| 7.213960129290e-01
      2 KSP preconditioned resid norm 3.598311696003e-03 true resid norm 2.619025748517e-03 ||r(i)||/||b|| 6.139933609945e-01
      3 KSP preconditioned resid norm 1.430059417360e-03 true resid norm 1.704501179051e-03 ||r(i)||/||b|| 3.995960743560e-01
      4 KSP preconditioned resid norm 7.351269360768e-04 true resid norm 1.225965715858e-03 ||r(i)||/||b|| 2.874102367150e-01
      5 KSP preconditioned resid norm 7.230122730906e-04 true resid norm 1.290372237623e-03 ||r(i)||/||b|| 3.025094302951e-01
      6 KSP preconditioned resid norm 4.159075140772e-04 true resid norm 6.686181343840e-04 ||r(i)||/||b|| 1.567480181456e-01
      7 KSP preconditioned resid norm 1.367757561316e-04 true resid norm 4.039493407656e-04 ||r(i)||/||b|| 9.470018137417e-02
      8 KSP preconditioned resid norm 7.158193483169e-05 true resid norm 1.827279480449e-04 ||r(i)||/||b|| 4.283797019989e-02
      9 KSP preconditioned resid norm 3.905357446809e-05 true resid norm 7.282293839440e-05 ||r(i)||/||b|| 1.707230283154e-02
     10 KSP preconditioned resid norm 9.668164625721e-06 true resid norm 1.060892638182e-05 ||r(i)||/||b|| 2.487111999340e-03
     11 KSP preconditioned resid norm 1.785936355636e-06 true resid norm 1.877012745564e-06 ||r(i)||/||b|| 4.400389591171e-04
     12 KSP preconditioned resid norm 3.965116756948e-07 true resid norm 5.489373224082e-07 ||r(i)||/||b|| 1.286905528713e-04
     13 KSP preconditioned resid norm 9.375386755773e-08 true resid norm 1.242938071586e-07 ||r(i)||/||b|| 2.913891642773e-05
     14 KSP preconditioned resid norm 1.901533854471e-08 true resid norm 1.973504391365e-08 ||r(i)||/||b|| 4.626600539829e-06
     15 KSP preconditioned resid norm 2.383816936460e-09 true resid norm 3.629644757516e-09 ||r(i)||/||b|| 8.509186231351e-07
     16 KSP preconditioned resid norm 4.598716317560e-10 true resid norm 5.357437037358e-10 ||r(i)||/||b|| 1.255974965021e-07
     17 KSP preconditioned resid norm 6.822558674688e-11 true resid norm 1.035253125880e-10 ||r(i)||/||b|| 2.427003807043e-08
     18 KSP preconditioned resid norm 1.714498392166e-11 true resid norm 2.063303492863e-11 ||r(i)||/||b|| 4.837121769619e-09
     19 KSP preconditioned resid norm 2.318581686522e-12 true resid norm 2.023190303223e-12 ||r(i)||/||b|| 4.743082098032e-10
     20 KSP preconditioned resid norm 2.531897993993e-13 true resid norm 3.444310532404e-13 ||r(i)||/||b|| 8.074696483214e-11
     21 KSP preconditioned resid norm 4.083948365503e-14 true resid norm 4.900041069684e-14 ||r(i)||/||b|| 1.148744981637e-11
     22 KSP preconditioned resid norm 5.832103009695e-15 true resid norm 7.129245206412e-15 ||r(i)||/||b|| 1.671350206510e-12
     23 KSP preconditioned resid norm 7.275292514621e-16 true resid norm 1.083637092917e-15 ||r(i)||/||b|| 2.540433140664e-13
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: SNESSolve has not converged due to DIVERGED_LINE_SEARCH. Suggest running with -snes_linesearch_monitor

baagaard-usgs avatar Feb 04 '24 18:02 baagaard-usgs

@baagaard-usgs I think this is fixed. I have force pushed the PETSc branch. The problem was that the residual is cloned from the solution, and clone did not copy the rordering flags. Ugh. Putting in a new flag is hard. Let me know if this works for you.

knepley avatar Feb 11 '24 16:02 knepley

The residual and Jacobian are now consistent. The output is still scrambled.

baagaard-usgs avatar Feb 11 '24 20:02 baagaard-usgs

How do I see the output from the minmesh example I am running?

pylith pylithapp_m
inmesh.cfg minmesh_noslip.cfg minmesh_noslip_quad.cfg ../../../../../pylith-git/share/settings/solver_faultpc.cfg --pets
c.snes_monitor --petsc.ksp_monitor_true_residual --petsc.snes_view --petsc.dm_ignore_perm_output

knepley avatar Feb 11 '24 20:02 knepley

The Lagrange multipliers should be nearly 0 and the displacements should +1 and -1 on each side of the fault.

h5dump output/minmesh_noslip_quad-domain.h5

baagaard-usgs avatar Feb 11 '24 20:02 baagaard-usgs

Only the displacement outputs for me. These look like they are in the original order

   GROUP "vertex_fields" {
      DATASET "displacement" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 1, 8, 2 ) / ( H5S_UNLIMITED, 8, 2 ) }
         DATA {
         (0,0,0): 0, 1,
         (0,1,0): 8.87995e-17, -1.52413e-16,
         (0,2,0): 7.86351e-17, 1.72848e-16,
         (0,3,0): -8.41938e-17, -1.52837e-16,
         (0,4,0): -9.77463e-17, 1.47278e-16,
         (0,5,0): 0, 1,

         (0,6,0): 0, -1,
         (0,7,0): 0, -1
         }
         ATTRIBUTE "timestepping" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SCALAR
            DATA {
            (0): 1
            }
         }
         ATTRIBUTE "vector_field_type" {
            DATATYPE  H5T_STRING {
               STRSIZE 7;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "vector"
            }
         }
      }
   }

knepley avatar Feb 11 '24 20:02 knepley

The displacement solution field should be

u(x=-2000) = (0, 1) u(x=0) = (0, 0) u(x=+2000) = (0, -1)

Coordinates of vertices               Displacement
(0,0): -2000, 0,                             (0,0,0): 0, 1,                                                  Correct
(1,0): 0, 0,                                      (0,1,0): -1.3022e-17, 2.31458e-16,            Correct
(2,0): 0, 2000,                               (0,2,0): -1.75285e-17, -5.84085e-17,       Correct
(3,0): -2000, 2000,                      (0,3,0): 1.83402e-17, 1.51906e-16,           Should be (0, 1)
(4,0): 2000, 0,                               (0,4,0): 3.69749e-17, -1.26146e-16,        Should be (0, -1)
(5,0): 2000, 2000,                        (0,5,0): 0, 1,                                                Should be (0, -1)
(6,0): 0, 0,                                      (0,6,0): 0, -1,                                              Should be (0, 0)
(7,0): 0, 2000                                 (0,7,0): 0, -1                                               Should be (0, 0)
         
         
         
         
         
         
         
         

baagaard-usgs avatar Feb 12 '24 03:02 baagaard-usgs

I have a fix. However, when I rebuilt PyLith everything started to fail, and I cannot understand why. Here is the beginning of the error messages

 >> ./pylithapp.cfg:62:
 -- pyre.inventory(error)
 -- timedependent.solution_observers.points <- 'pylith.meshio.OutputSolnPoints'
 -- unrecognized property 'timedependent.solution_observers.points'
 >> ./pylithapp.cfg:95:
 -- pyre.inventory(error)
 -- pylithapp.timedependent.materials.mat_xmid.description <- 'Elastic material between xneg and xmid faults'
 -- unknown component 'pylithapp.timedependent.materials.mat_xmid'

knepley avatar Feb 12 '24 14:02 knepley

Okay, I resolved that. My fix appears to work, but I cannot push to your fork. Here is my diff:

diff --git a/libsrc/pylith/meshio/OutputSubfield.cc b/libsrc/pylith/meshio/OutputSubfield.cc
index acacf4ebb..574a12a18 100644
--- a/libsrc/pylith/meshio/OutputSubfield.cc
+++ b/libsrc/pylith/meshio/OutputSubfield.cc

@@ -22,6 +22,8 @@

 #include <typeinfo> // USES typeid()

+#include "petscdm.h" // USES PetscDM
+
 // ------------------------------------------------------------------------------------------------
 // Constructor
 pylith::meshio::OutputSubfield::OutputSubfield(void) :
@@ -72,6 +74,8 @@ pylith::meshio::OutputSubfield::create(const pylith::topology::Field& field,

     PetscErrorCode err;
     err = DMClone(mesh.getDM(), &subfield->_dm);PYLITH_CHECK_ERROR(err);
+    err = DMReorderSectionSetDefault(subfield->_dm, DM_REORDER_DEFAULT_FALSE);PYLITH_CHECK_ERROR(err);
+    err = DMReorderSectionSetType(subfield->_dm, NULL);PYLITH_CHECK_ERROR(err);
     err = PetscObjectSetName((PetscObject)subfield->_dm, name);PYLITH_CHECK_ERROR(err);

     PetscFE fe = pylith::topology::FieldOps::createFE(subfield->_discretization, subfield->_dm,

Do you have to give me permission on your fork?

knepley avatar Feb 12 '24 14:02 knepley

Yes, I believe this is the fix. The fault full-scale tests pass with this fix. I will do more testing to verify completeness.

What is the prospect of getting the mg levels set according to the problem? If we are able to do that, we might have this issue completely wrapped up.

mg_levels_1_pc_type = vpbjacobi

baagaard-usgs avatar Feb 13 '24 03:02 baagaard-usgs

I get a zero pivot in row 0 when I use a basis order of 2. I thought you had fixed the reordering for basis order 2.

PETSc branch: knepley/fix-plex-io-section-perm PyLith fork: https://github.com/baagaard-usgs/pylith.git PyLith branch: improve-elasticity-fault-pc

How to reproduce

cd examples/strikeslip-2d
pylith step04_varslip.cfg ../../share/settings/solver_faultpc.cfg --petsc.mg_levels_2_pc_type=vpbjacobi

Error message

[0]PETSC ERROR: Zero pivot in LU factorization: https://petsc.org/release/faq/#zeropivot
[0]PETSC ERROR: Zero pivot, row 0
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR:   Option left: name:-fieldsplit_displacement_ksp_type value: preonly source: code
[0]PETSC ERROR:   Option left: name:-fieldsplit_displacement_pc_type value: lu source: code
[0]PETSC ERROR:   Option left: name:-fieldsplit_lagrange_multiplier_fault_ksp_type value: preonly source: code
[0]PETSC ERROR:   Option left: name:-fieldsplit_lagrange_multiplier_fault_pc_type value: lu source: code
[0]PETSC ERROR:   Option left: name:-ksp_converged_reason (no value) source: code
[0]PETSC ERROR:   Option left: name:-pc_fieldsplit_schur_factorization_type value: lower source: code
[0]PETSC ERROR:   Option left: name:-pc_fieldsplit_schur_precondition value: selfp source: code
[0]PETSC ERROR:   Option left: name:-pc_fieldsplit_schur_scale value: 1.0 source: code
[0]PETSC ERROR:   Option left: name:-pc_fieldsplit_type value: schur source: code
[0]PETSC ERROR:   Option left: name:-snes_converged_reason (no value) source: code
-- snip --
[0]PETSC ERROR: #1 PetscKernel_A_gets_inverse_A_2() at /Users/baagaard/software/unix/petsc-dev/src/mat/impls/baij/seq/dgefa2.c:52
[0]PETSC ERROR: #2 MatInvertVariableBlockDiagonal_SeqAIJ() at /Users/baagaard/software/unix/petsc-dev/src/mat/impls/aij/seq/aij.c:1840
[0]PETSC ERROR: #3 MatInvertVariableBlockDiagonal() at /Users/baagaard/software/unix/petsc-dev/src/mat/interface/matrix.c:10657
[0]PETSC ERROR: #4 PCSetUp_VPBJacobi_Host() at /Users/baagaard/software/unix/petsc-dev/src/ksp/pc/impls/vpbjacobi/vpbjacobi.c:232
[0]PETSC ERROR: #5 PCSetUp_VPBJacobi() at /Users/baagaard/software/unix/petsc-dev/src/ksp/pc/impls/vpbjacobi/vpbjacobi.c:263
[0]PETSC ERROR: #6 PCSetUp() at /Users/baagaard/software/unix/petsc-dev/src/ksp/pc/interface/precon.c:1079
[0]PETSC ERROR: #7 KSPSetUp() at /Users/baagaard/software/unix/petsc-dev/src/ksp/ksp/interface/itfunc.c:415
[0]PETSC ERROR: #8 KSPSolve_Private() at /Users/baagaard/software/unix/petsc-dev/src/ksp/ksp/interface/itfunc.c:832

baagaard-usgs avatar Feb 14 '24 16:02 baagaard-usgs

@knepley mg_fine* seems to work. I am still getting this error above with basis order 2.

baagaard-usgs avatar Feb 15 '24 19:02 baagaard-usgs

The basis order 2 fix is in knepley/fix-plex-reorder-hybrid Can you approve https://gitlab.com/petsc/petsc/-/merge_requests/7258 ? Then I can merge that, merge the above fix, and recreate knepley/pylith

knepley avatar Feb 16 '24 12:02 knepley

I confimed that combining PETSc branches knepley/fix-plex-reorder-hybrid and knepley/fix-plex-io-section-perm resolves the issue with basis order 2.

Will close this issue once PETSc branchknepley/pylith is updated.

baagaard-usgs avatar Feb 16 '24 15:02 baagaard-usgs

@knepley It looks like we have an issue with using the new fault preconditioner settings in parallel in 3D.

Error message

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[0]PETSC ERROR: The line numbers in the error traceback are not always exact.
[0]PETSC ERROR: #1 PetscTrFreeDefault() at /Users/baagaard/software/unix/petsc-dev/src/sys/memory/mtr.c:262
[0]PETSC ERROR: #2 DMCreateMatrix_Plex() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plex.c:2869
[0]PETSC ERROR: #3 DMCreateMatrix() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:1478
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Nonconforming object sizes
[1]PETSC ERROR: Sum of local block sizes 405 does not equal local size of matrix 450
[1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[1]PETSC ERROR:   Option left: name:-malloc_dump (no value) source: code
[1]PETSC ERROR:   Option left: name:-mg_fine_pc_type value: vpbjacobi source: code
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.20.2-242-gc574b33dbb6  GIT Date: 2023-12-05 20:46:15 +0000
[1]PETSC ERROR: /Users/baagaard/software/unix/py3.12-venv/pylith-debug/bin/mpinemesis on a arch-clang-15.0_debug named IGSKCI164LM006 by baagaard Tue Feb 20 13:59:48 2024
[1]PETSC ERROR: Configure options --PETSC_ARCH=arch-clang-15.0_debug --with-debugging=1 --with-clanguage=c --with-mpi-compilers=1 --with-shared-[0]PETSC ERROR: #4 SNESSetUpMatrices() at /Users/baagaard/soft
ware/unix/petsc-dev/src/snes/interface/snes.c:768
[0]PETSC ERROR: #5 SNESSetUp_NEWTONLS() at /Users/baagaard/software/unix/petsc-dev/src/snes/impls/ls/ls.c:289
[0]PETSC ERROR: #6 SNESSetUp() at /Users/baagaard/software/unix/petsc-dev/src/snes/interface/snes.c:3297
[0]PETSC ERROR: #7 SNESSolve() at /Users/baagaard/software/unix/petsc-dev/src/snes/interface/snes.c:4718
[0]PETSC ERROR: #8 TSTheta_SNESSolve() at /Users/baagaard/software/unix/petsc-dev/src/ts/impls/implicit/theta/theta.c:174
[0]PETSC ERROR: #9 TSStep_Theta() at /Users/baagaard/software/unix/petsc-dev/src/ts/impls/implicit/theta/theta.c:224
[0]PETSC ERROR: #10 TSStep() at /Users/baagaard/software/unix/petsc-dev/src/ts/interface/ts.c:3382
[0]PETSC ERROR: #11 TSSolve() at /Users/baagaard/software/unix/petsc-dev/src/ts/interface/ts.c:4016

Steps to reproduce

PETSc branch: main PyLith fork: https://github.com/baagaard-usgs/pylith.git PyLith branch: improve-elasticity-fault-pc

cd $BUILD/tests/fullscale/linearelasticityfaults-3d
make export-data
pylith twoblocks.cfg twoblocks_hex.cfg --nodes=2

baagaard-usgs avatar Feb 20 '24 22:02 baagaard-usgs

This is not limited to 3D. It also shows up in 2D, but depends on the number of processes.

baagaard-usgs avatar Feb 28 '24 22:02 baagaard-usgs

The new knepley/pylith should fix this if you run using --petsc.petscpartitioner_type=parmetis --petsc.petscpartitioner_use_edge_weights. This causes partitioning to avoid breaking the domain along a cohesive cell. When this happens, we cannot solve the two sides as a block, and it really hurts the solver.

knepley avatar Mar 14 '24 10:03 knepley

@knepley Two faults that intersection (one through-going and one that ends at the intersection) seems to cause an error when permuting the section.

[0]PETSC ERROR: Arguments are incompatible
[0]PETSC ERROR: Number of points in permutation 9622 does not match chart size 9623
--- snip --
[0]PETSC ERROR: Petsc Development GIT revision: v3.20.2-242-gc574b33dbb6  GIT Date: 2023-12-05 20:46:15 +0000
[0]PETSC ERROR: /Users/baagaard/software/unix/py3.12-venv/pylith-debug/bin/mpinemesis on a arch-clang-15.0_debug named IGSKCI164LM006 by baagaard Thu Mar 21 13:58:21 2024
-- snip --
[0]PETSC ERROR: #1 DMCreateSectionPermutation_Plex_Cohesive() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plexreorder.c:504
[0]PETSC ERROR: #2 DMCreateSectionPermutation_Plex() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plexreorder.c:526
[0]PETSC ERROR: #3 DMCreateSectionPermutation() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:4392
[0]PETSC ERROR: #4 DMCreateLocalSection_Plex() at /Users/baagaard/software/unix/petsc-dev/src/dm/impls/plex/plexsection.c:611
[0]PETSC ERROR: #5 DMGetLocalSection() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:4292
[0]PETSC ERROR: #6 DMGetSection() at /Users/baagaard/software/unix/petsc-dev/src/dm/interface/dm.c:4243
[0]PETSC ERROR: #7 void pylith::topology::Field::allocate()() at /Users/baagaard/src/cig/pylith/libsrc/pylith/topology/Field.cc:303

How to reproduce

PETSc branch: knepley/feature-partitioner-edge-weights PyLIth fork: https://github.com/baagaard-usgs/pylith.git PyLith branch: improve-elasticity-fault-pc

cd examples/reverse-2d
pylith step06_twofaults_elastic.cfg

baagaard-usgs avatar Mar 21 '24 20:03 baagaard-usgs

@baagaard-usgs I believe this is fixed in https://gitlab.com/petsc/petsc/-/merge_requests/7417

knepley avatar Mar 27 '24 16:03 knepley

Fix verified.

baagaard-usgs avatar Mar 27 '24 18:03 baagaard-usgs