quda icon indicating copy to clipboard operation
quda copied to clipboard

Refactor/expansion of MG setup methods

Open weinbe2 opened this issue 3 years ago • 6 comments

This PR refactors and expands the number of methods by which near-null vectors can be generated in QUDA. Due to the nature of the refactor and cleanup, this PR is interface breaking, but in principle in a future-proof way -- near-null vector generation methods are now specified via an enum, QudaNullVectorSetupType, so it is straightforward to add more options in a non-breaking fashion. This PR also codifies the existing behavior that, if an input near-null vector is specified (via --mg-load-vec from the command line, for ex), it is loaded and all other options are ignored.

The full list of methods now includes:

  • Inverse iterations (still the default)
  • Chebyshev filter a la arXiv:2103.05034, P. Boyle and A. Yamaguchi
  • Eigenvectors (as before)
  • Free field vectors (as before, and only for sanity tests)
  • Restricting as many near null vectors as possible from the finer level, then generating more from another method (that is --- if level 1 is Nc = 24, and level 2 is Nc = 32, then for level 2 all 24 near-null vectors from level 1 are restricted then 8 more are generated)

This PR also supports "polishing" near-null vectors generated by other methods with more iterations of inverse iterations.

The incomplete test vector support in QUDA has been mostly removed as it requires a fuller refactor that is outside of the scope of this PR, though it is on the to-do list in the future as it is a demonstrably successful approach.

Command line arguments

The core command line argument, which has been repurposed, is

  • --mg-setup-type [level] [method], where [method] can be inverse-iterations (default), chebyshev-filter, eigenvectors, test-vectors, restrict-fine, and free-field (where test-vectors gracefully errorQudas out)

Inverse iterations

No options for inverse iterations have been changed; --mg-setup-tol, --mg-setup-maxiter, etc, all behave as expected.

Eigenvectors

No options for eigenvectors have been changed.

Chebyshev Filter

The Chebyshev filter has a flexible set of parameters describing generating a set of near-null vectors related to the initial low-pass filter, the number of starting vectors, and subsequent generation from a low-passed starting vector.

  • --mg-setup-filter-startup-vectors - the number of random starting vectors, default 1. As some examples, if the number of near-null vectors for level 1 is 24, --mg-setup-filter-startup-vectors 1 1 corresponds to one starting vector with 24 near-nulls generated; [...] 1 3 corresponds to three starting vectors with 3 near-nulls generated from each, etc. In cases like [...] 1 5, where 5 doesn't divide into 24, 5 near-null vectors are generated from the first four starting vectors, and 4 from the last --- 4 * 4 + 4 = 24.
  • --mg-setup-filter-startup-iterations - number of iterations for the initial low-pass filter, default 1000.
  • --mg-setup-filter-startup-rescale-frequency - an empirical feature; since the norm of a vector could overflow (or individual values thereof), the vector can be renormalized with some frequency in a way that preserves the Chebyshev recursion; empirical default 50.
  • --mg-setup-filter-lambda-max - upper bound for the Chebyshev filters, default power iterations to guess an upper bound
  • --mg-setup-filter-lambda-min - lower bound to use for the initial low pass filter, modes smaller than that value are enhanced. Default 1.
  • --mg-setup-filter-iterations-between-vectors - number of iterations between subsequent near-null vectors after the initial low-pass filter. As an example, if startup iterations is 1000, and the number of iterations between vectors is 150, a near-null vector is generated after 1000 (initial) matrix applications, then at 1150, 1300, 1450... . Default 150.

Restriction

There are no special flags for restriction in and of itself, but:

  • --mg-setup-restrict-remaining-type [level] [method] can be used to specify which method to use to generate the "remaining" near null vectors if fine nvec != coarse nvec. Parameters for the remainder method are taken from the flags for each method.

"Polishing" near-null vectors

"Polishing" near-null vectors with inverse iterations is enabled by specifying a non-zero number of polish iterations via

  • --mg-setup-maxiter-inverse-iterations-polish [level] [numbers] , where the default [numbers] is 0, corresponding to no polishing. Parameters for polishing are taken from the flags for inverse iteration setup.

Reference commands

A base command where we use inverse iterations for level 1 and a custom method for level 2 (specified by SETUP_FLAGS_LEVEL2) is:

CONFIG=[...]
SETUP_FLAGS_LEVEL2=[...]
./invert_test \
  --prec double --prec-sloppy single --prec-null half --prec-precondition half \
  --recon 12 --recon-sloppy 8 --recon-precondition 8 \
  --dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge $CONFIG \
  --dslash-type wilson --kappa 0.1394265 --tol 1e-10 --nsrc 1 --niter 20 \
  --verbosity verbose --solve-type direct-pc --solution-type mat --inv-type gcr \
  --inv-multigrid true --mg-levels 3 \
  --mg-block-size 0 4 4 4 4 --mg-nvec 0 24 \
  --mg-block-size 1 2 2 2 2 --mg-nvec 1 32 \
  --mg-setup-tol 0 5e-7 --mg-setup-inv 0 cgnr $SETUP_FLAGS_LEVEL2 \
  --mg-use-mma true \
  --mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 direct-pc --mg-nu-pre 0 0 --mg-nu-post 0 4 \
  --mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 direct-pc --mg-nu-pre 1 0 --mg-nu-post 1 4 \
  --mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 direct-pc --mg-coarse-solver-tol 1 0.35 --mg-coarse-solver-maxiter 1 10 \
  --mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.01 --mg-coarse-solver-maxiter 2 20 \
  --mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose

Where we will fill in SETUP_FLAGS_LEVEL2 for different options.

Inverse Iterations

A standard setup is

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 inverse-iterations --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

Where the --mg-setup-type flag is optional as inverse iterations are the default

Eigenvectors

A reference setup without polynomial acceleration is

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 eigenvectors --mg-eig-type 1 trlm --mg-eig-use-dagger 1 false --mg-eig-use-normop 1 true --mg-eig-use-poly-acc 1 false --mg-eig-n-ev 1 36 --mg-eig-n-kr 1 64 --mg-eig-tol 1 1e-5"

Chebyshev filter

A reference setup where 4 base vectors are used -> 8 near-null vectors are generated from each base vector, the minimum of the low pass filter is 1.0, a 500 iteration low pass filter with rescaling every 50 iterations is used, and there are 100 iterations between subsequent near-nulls is:

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 chebyshev-filter --mg-setup-filter-startup-vectors 1 4 --mg-setup-filter-startup-iterations 1 500 --mg-setup-filter-startup-rescale-frequency 1 50 --mg-setup-filter-lambda-min 1 1.0 --mg-setup-filter-iterations-between-vectors 1 50"

Restriction

A reference setup with restriction, then using inverse iterations for the remaining 8 vectors (32 on level 2 minus 24 on level 1) is:

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 restrict-fine --mg-setup-restrict-remaining-type 1 inverse-iterations --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

--mg-setup-restrict-remaining-type can be changed appropriately, grabbing other reference flags as appropriate.

Polishing with inverse iterations

As an example, the parameters for a Chebyshev filter can be included, and then they can be polished for 50 iterations via adding:

SETUP_FLAGS_LEVEL2="${SETUP_FLAGS_LEVEL2} --mg-setup-maxiter-inverse-iterations-refinement 1 50 --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

Outstanding work

  • [ ] Exposing these options in the MILC interface
  • [x] Copy plus pasting all of this to a wiki page: https://github.com/lattice/quda/wiki/Multigrid-Solver
  • [ ] Interface testing/updating with Chroma, ...?
  • [ ] clang-format

weinbe2 avatar May 27 '22 17:05 weinbe2

Completed an initial visual review of this PR. Looks like a good contribution. Left a few comments, and I'll test it on some clover multigrid shortly.

maddyscientist avatar Jun 02 '22 00:06 maddyscientist

@weinbe2 sorry that I haven't gotten back to this yet. I started taking a look to see what I need to adjust in tmLQCD to support the changes here when #1287 got in the way... I'll try to spin this up in the next few days and provide feedback, if any.

kostrzewa avatar Jun 15 '22 13:06 kostrzewa

@weinbe2 sorry that I haven't gotten back to this yet. I started taking a look to see what I need to adjust in tmLQCD to support the changes here when #1287 got in the way... I'll try to spin this up in the next few days and provide feedback, if any.

No problem! There are certainly plenty of higher-priority things bouncing around, on all of our plates. Whenever you can get to it is fine and appreciated.

weinbe2 avatar Jun 15 '22 20:06 weinbe2

FYI, one of the recent merges of develop slightly broke this branch, will fix it this week.

weinbe2 avatar Jul 06 '22 19:07 weinbe2

@weinbe2 Seems to work fine for me with with the changes to our interface that I implemented in https://github.com/etmc/tmLQCD/pull/548 to preserve the status quo. I will keep track of this PR and make any adjustments that may become necessary due to ongoing changes.

kostrzewa avatar Sep 21 '22 15:09 kostrzewa

For future reference, do not merge, it appears an issue creeped in not in develop. For my own convenience, a command that caused an issue was

LOCAL_DIM="32 32 32 16"
GRIDSIZE="1 1 1 2"
OUTER_OPERATOR=direct-pc
GAUGE_PATH=./l32t32b7p0
MASS=0.00945
TADPOLE_COEFF=1.0
TOL=1e-5
FIRST_AGGREGATE="4 4 4 4"
SECOND_AGGREGATE="4 4 4 4"
COARSEST_NVEC=128
COARSEST_NEV=160
COARSEST_NKR=192
COARSEST_POLY_DEG=32
COARSEST_AMIN=0.85
COARSEST_AMAX=0.0

export QUDA_RESOURCE_PATH=.
export QUDA_TUNE_VERSION_CHECK=0

mpirun -np 2 --allow-run-as-root \
                        ./staggered_invert_test --inv-multigrid true --dim ${LOCAL_DIM} --gridsize ${GRIDSIZE} --verbosity verbose --nsrc 1 \
                        --dslash-type asqtad --solve-type ${OUTER_OPERATOR} --solution-type mat --compute-fat-long true \
                        --load-gauge ${GAUGE_PATH} --mass ${MASS} --tadpole-coeff ${TADPOLE_COEFF} --tol ${TOL} \
                        --prec single --prec-sloppy single --prec-precondition single --prec-null single --mg-use-mma false \
                        --recon 18 --recon-sloppy 18 --recon-precondition 18 \
                        --mg-verbosity 2 verbose --mg-verbosity 3 verbose \
                        --mg-save-vec 1 blah --mg-save-vec 2 blah \
                        --mg-levels 4 --mg-coarse-solve-type 0 ${OUTER_OPERATOR} --mg-staggered-coarsen-type kd-optimized \
                        --mg-block-size 0 1 1 1 1 --mg-nvec 0 3 \
                        --mg-smoother-solve-type 0 ${OUTER_OPERATOR} --mg-smoother 0 ca-gcr --mg-nu-pre 0 0 --mg-nu-post 0 8 --mg-smoother-tol 0 1e-10  \
                        --mg-coarse-solve-type 1 direct --mg-coarse-solver-tol 1 5e-2 \
                        --mg-coarse-solver-maxiter 1 16 --mg-coarse-solver 1 gcr \
                        --mg-setup-inv 1 cgnr --mg-setup-maxiter 1 1000 --mg-setup-tol 1 1e-5 \
                        --mg-block-size 1 ${FIRST_AGGREGATE} --mg-nvec 1 64 --mg-n-block-ortho 1 2 \
                        --mg-smoother-solve-type 1 direct --mg-smoother 1 ca-gcr --mg-nu-pre 1 0 --mg-nu-post 1 2 --mg-smoother-tol 1 1e-10 \
                        --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.25 \
                        --mg-coarse-solver-maxiter 2 16   --mg-coarse-solver 2 gcr \
                        --mg-setup-inv 2 cgnr --mg-setup-maxiter 2 1000 --mg-setup-tol 2 1e-5 \
                        --mg-block-size 2 ${SECOND_AGGREGATE} --mg-nvec 2 96 --mg-n-block-ortho 2 2 \
                        --mg-smoother-solve-type 2 direct-pc --mg-smoother 2 ca-gcr --mg-nu-pre 2 0 --mg-nu-post 2 2 --mg-smoother-tol 2 1e-10 \
                        --mg-coarse-solver 3 ca-gcr --mg-coarse-solver-ca-basis-size 3 16 --mg-coarse-solver-maxiter 3 16 \
                        --mg-nvec 3 ${COARSEST_NVEC} --mg-eig-n-ev 3 ${COARSEST_NEV} --mg-eig-n-kr 3 ${COARSEST_NKR} --mg-eig-tol 3 1e-4 \
                        --mg-eig 3 true --mg-eig-type 3 trlm --mg-eig-use-dagger 3 false --mg-eig-use-normop 3 true \
                        --mg-eig-use-poly-acc 3 true --mg-eig-poly-deg 3 ${COARSEST_POLY_DEG} --mg-eig-amin 3 ${COARSEST_AMIN} --mg-eig-amax 3 ${COARSEST_AMAX} \
                        --mg-eig-max-restarts 3 1000

and the error that popped up is

MG level 2 (GPU): Xinv = 1.574662e+04
MG level 2 (GPU): Yhat[0] = 6.719429e+01 (1.147926e-01 < 4.053768e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[1] = 6.493005e+01 (1.500221e-01 < 5.319621e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[2] = 6.315730e+01 (1.395274e-01 < 5.026488e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[3] = 2.209952e+02 (3.106113e-01 < 8.176686e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[4] = 4.554047e+01 (1.680291e-01 < 6.054319e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[5] = 4.207957e+01 (1.500221e-01 < 5.088072e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[6] = 4.217662e+01 (1.335199e-01 < 4.538731e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[7] = 1.093643e+02 (3.106113e-01 < 7.678801e-02 x 3.218868e+00)
MG level 2 (GPU): ....done computing Yhat field
MG level 2 (GPU): Finished creating the preconditioned coarse op
MG level 2 (GPU): Coarse Dirac operator done
MG level 2 (GPU): Creating smoother
MG level 2 (GPU): Creating a CA-GCR solver
MG level 2 (GPU): Smoother done
MG level 2 (GPU): Checking 0 = (1 - P P^\dagger) v_k for 96 vectors
MG level 2 (GPU): Vector 0: norms v_k = 1.000000e+00 P^\dagger v_k = 6.917310e-01 (1 - P P^\dagger) v_k = 1.202923e+00, L2 relative deviation = 1.096779e+00
MG level 2 (GPU): ERROR: L2 relative deviation for k=0 failed, 1.096779e+00 > 1.000000e-03 (rank 0, host luna-0196, multigrid.cpp:920 in void quda::MG::verify(bool)())
MG level 2 (GPU): ERROR: L2 relative deviation for k=0 failed, 1.096779e+00 > 1.000000e-03 (rank 1, host luna-0196, multigrid.cpp:920 in void quda::MG::verify(bool)())
MG level 2 (GPU):        last kernel called was (name=N4quda4blas5Norm2IdfEE,volume=8x8x8x4x1,aux=GPU-offline,nParity=2,vol=2048,parity=2,precision=4,order=2,Ns=2,Nc=64)
MG level 2 (GPU):        last kernel called was (name=N4quda4blas5Norm2IdfEE,volume=8x8x8x4x1,aux=GPU-offline,nParity=2,vol=2048,parity=2,precision=4,order=2,Ns=2,Nc=64)
QMP m0,n2@luna-0196 error: abort: 1
QMP m1,n2@luna-0196 error: abort: 1

sm_80, fast build, QUDA_PRECISION=12

weinbe2 avatar Oct 25 '22 17:10 weinbe2