BOUT-dev icon indicating copy to clipboard operation
BOUT-dev copied to clipboard

Bug in LaplacePCR

Open ZedThree opened this issue 3 years ago • 9 comments

In #2336 I've added an alternative to #2333: https://github.com/boutproject/BOUT-dev/blob/93454ecaded94667725d48c3cb9b991c23f239bc/src/invert/laplace/invert_laplace.cxx#L106-L126

Basically check that we can use PCR as the default Laplacian, otherwise fall back to cyclic. This essentially makes PCR the new default in most cases.

Unfortunately (or possibly fortunately?) this throws up a few issues with PCR in the following tests:

  • test-gyro, which is a regression test, comparing the inversion to a benchmark result:
Running Gyro-average inversion test
   1 processors (nxpe = 1)....
      Checking variable pade1 ... Fail, maximum difference = 0.021458830684423447
      Checking variable pade2 ... Fail, maximum difference = 0.0043205489637330174
   2 processors (nxpe = 1)....
      Checking variable pade1 ... Fail, maximum difference = 0.021458830684423447
      Checking variable pade2 ... Fail, maximum difference = 0.0043205489637330174
   4 processors (nxpe = 2)....
      Checking variable pade1 ... Fail, maximum difference = 0.021458830684423447
      Checking variable pade2 ... Fail, maximum difference = 0.0043205489637330174
  • test-petsc-laplace-MAST-grid, which applies the forward operator and inverts the result, checking round-trip accuracy, tests 4 and 8 use the default solver (i.e. PCR), all others use PETSc (errors are the same across processor count and grid):
      Checking max_error1 ... Pass
      Checking max_error2 ... Pass
      Checking max_error3 ... Pass
      Checking max_error4 ... Fail, maximum error is = 0.9939465685503767
      Checking max_error5 ... Pass
      Checking max_error6 ... Pass
      Checking max_error7 ... Pass
      Checking max_error8 ... Fail, maximum error is = 1.0057517623023355
  • test-petsc-laplace, which is similar to the previous test, but the PETSc solvers use the default Laplacian as a preconditioner:
KSPConvergedReason is -5
BoutException occured in invert->solve(b1): petsc_laplace: inversion failed to converge.

Test 1: PETSc 2nd order
Magnitude of maximum absolute error is -1
KSPConvergedReason is -5
BoutException occured in invert->solve(b1): petsc_laplace: inversion failed to converge.

Test 2: PETSc 4th order
Magnitude of maximum absolute error is -1
KSPConvergedReason is -5
BoutException occured in invert->solve(b3): petsc_laplace: inversion failed to converge.

Test 3: with coefficients constant in z, PETSc 2nd order
Magnitude of maximum absolute error is -1

Test 4: with coefficients constant in z, default solver
Magnitude of maximum absolute error is 0.999817
KSPConvergedReason is -5
BoutException occured in invert->solve(b5): petsc_laplace: inversion failed to converge.

Test 5: different profiles, PETSc 2nd order
Magnitude of maximum absolute error is -1
KSPConvergedReason is -5
BoutException occured in invert->solve(b6): Laplacian inversion failed to converge (probably)

Test 6: different profiles, PETSc 4th order
Magnitude of maximum absolute error is -1
KSPConvergedReason is -5
BoutException occured in invert->solve(b7): petsc_laplace: inversion failed to converge.

Test 7: different profiles, with coefficients constant in z, PETSc 2nd order
Magnitude of maximum absolute error is -1

Test 8: different profiles, with coefficients constant in z, default solver
Magnitude of maximum absolute error is 1.00577

Given that test-laplace is checked with both cyclic and pcr there must be something else going on. The common factor seems to be the outer boundary flags:

  • test-laplace checks only INVERT_SET in the outer boundary
  • test-gyro uses INVERT_BNDRY_ONE + INVERT_RHS
  • test-petsc_laplace uses INVERT_AC_GRAD and INVERT_AC_GRAD | INVERT_DC_GRAD
  • test-petsc_laplace_MAST-grid uses INVERT_AC_GRAD

My suspicions are that LaplacePCR ::eliminate_boundary_rows only accounts for one boundary, when there could be up to MXG boundary rows that need eliminating, depending on the inversion flags.

ZedThree avatar Jun 23 '21 09:06 ZedThree

Thanks Peter! The purpose of eliminate boundary rows is just to break the linear dependence between the boundary rows and the interior rows, so it should only require the one boundary row. Though it does assume that the structure is

                            boundary. | interior
boundary ->               0  x  x  0  |. 0. 0. 0  
boundary ->               0. 0. x. x. |  0  0. 0
boundary ->.              0. 0. 0  x. |  x. 0. 0
interior ->               0. 0. 0. x. |. x. x. 0
interior ->.              0. 0. 0. 0. |  x. x. x

becomes

                            boundary. | interior
boundary ->               0  x  x  0  |. 0. 0. 0  
boundary ->               0. 0. x. x. |  0  0. 0
boundary ->.              0. 0. 0  x. |  x. 0. 0
interior ->               0. 0. 0. 0. |. Y. x. 0
interior ->.              0. 0. 0. 0. |  x. x. x

Are there cases where this won't be the structure?

There might also be places where this assumes there are two guard cells, so I'll check that too.

JosephThomasParker avatar Jun 23 '21 09:06 JosephThomasParker

Ah I see! I was thinking that might be suspicious just because I was tracing through where the a/b/c from Laplacian::tridagMatrix are used. That deals with INVERT_AC_GRAD for both PCR and cyclic, so presumably the issue is how they are used.

So if it's not eliminate_boundary_rows, it must be somewhere else in cr_pcr_solver?

Is the issue that aa, bb, cc only contain one boundary/guard cell? For cyclic, a, b, c don't use any guard cells, but do include all the boundaries.

ZedThree avatar Jun 23 '21 10:06 ZedThree

Might it be related to the failure in #2207 ?

That is also failing if rebased to next ...

dschwoerer avatar Jun 23 '21 11:06 dschwoerer

Just to quickly note a few things: There is a problem in eliminate_boundary_rows: in the test-gyro case the wrong rows are eliminated (the second and penultimate internal rows, instead of the first and final internal rows). This is confusing as the indexing is determined by localmesh->xstart. Does the meaning of xstart change depending on the boundary conditions?

But in any case, fixing the indexing still gets the wrong results, with the error in \|Ax - b\| only really being significant in the few rows near the boundaries.

Still working on this...

JosephThomasParker avatar Jul 07 '21 16:07 JosephThomasParker

Ok, found the problem: the code requires the number of internal rows to be a power of 2, and checks this by doing:

  // Number of x points must be a power of 2
  if (!is_pow2(localmesh->GlobalNxNoBoundaries)) {
    throw BoutException("LaplacePCR error: GlobalNx must be a power of 2");
  }

where GlobalNxNoBoundaries = nx - 2*MXG; is a power of two in both test-laplace and test-gyro. However, in test-gyro, the flag INVERT_BNDRY_ONE is set, which means only one boundary cell is used, even though MXG=2. So in test-gyro we're actually doing the inversion on 68 - 2*1 = 66 points, not 68 - 2*2 = 64 points, so that's why the solver fails.

I'll think about whether it's possible to tweak the row elimination to get this to work. The "real" problem though is the inconsistency over how many guard cells we are using.

JosephThomasParker avatar Jul 08 '21 10:07 JosephThomasParker

Is this a "fundamental" problem that PCF is incompatible with INVERT_BNDRY_ONE? Or it would work, if the number of interior points were different such that we'd still use a power of two?

If the latter, then we could adjust the precondition to:

const int global_nx = localmesh->GlobalNxNoBoundaries;
const int global_nx_maybe_with_boundary = using_invert_bndry_one ? global_nx + 1 : global_nx;

if (not is_pow2(global_nx_maybe_with_boundary)) {
  ...

(terrible names not withstanding). Then in #2336 we'd neatly fall back to cyclic if that weren't met.

If we can handle this in the row elimination and don't need to adjust the precondition, that's probably the better solution though?

ZedThree avatar Jul 08 '21 10:07 ZedThree

I'd be tempted to suggest just deprecating/removing INVERT_BNDRY_ONE. I can't see when it would ever be useful, and I'm not sure why it's there at all - maybe @bendudson knows?

johnomotani avatar Jul 08 '21 10:07 johnomotani

Don't know why INVERT_BNDRY_ONE exists really. I suspect it was an attempt to deal with cases where the boundary to be applied isn't clear, such as with gyro-averaging: What boundary condition should be put on gyro-averaged density, to be consistent with density? By moving the boundary away from the domain, it may give that bit of additional freedom at the actual boundary. I'm not sure if it actually makes any difference though.

bendudson avatar Jul 08 '21 11:07 bendudson

The plan is now to make solve throw if INVERT_AC_GRAD is set. This means we won't use PCR as the default solver.

ZedThree avatar Sep 14 '21 16:09 ZedThree