quda Refactor/expansion of MG setup methods

This PR refactors and expands the number of methods by which near-null vectors can be generated in QUDA. Due to the nature of the refactor and cleanup, this PR is interface breaking, but in principle in a future-proof way -- near-null vector generation methods are now specified via an enum, QudaNullVectorSetupType, so it is straightforward to add more options in a non-breaking fashion. This PR also codifies the existing behavior that, if an input near-null vector is specified (via --mg-load-vec from the command line, for ex), it is loaded and all other options are ignored.

The full list of methods now includes:

Inverse iterations (still the default)
Chebyshev filter a la arXiv:2103.05034, P. Boyle and A. Yamaguchi
Eigenvectors (as before)
Free field vectors (as before, and only for sanity tests)
Restricting as many near null vectors as possible from the finer level, then generating more from another method (that is --- if level 1 is Nc = 24, and level 2 is Nc = 32, then for level 2 all 24 near-null vectors from level 1 are restricted then 8 more are generated)

This PR also supports "polishing" near-null vectors generated by other methods with more iterations of inverse iterations.

The incomplete test vector support in QUDA has been mostly removed as it requires a fuller refactor that is outside of the scope of this PR, though it is on the to-do list in the future as it is a demonstrably successful approach.

Command line arguments

The core command line argument, which has been repurposed, is

--mg-setup-type [level] [method], where [method] can be inverse-iterations (default), chebyshev-filter, eigenvectors, test-vectors, restrict-fine, and free-field (where test-vectors gracefully errorQudas out)

Inverse iterations

No options for inverse iterations have been changed; --mg-setup-tol, --mg-setup-maxiter, etc, all behave as expected.

Eigenvectors

No options for eigenvectors have been changed.

Chebyshev Filter

The Chebyshev filter has a flexible set of parameters describing generating a set of near-null vectors related to the initial low-pass filter, the number of starting vectors, and subsequent generation from a low-passed starting vector.

--mg-setup-filter-startup-vectors - the number of random starting vectors, default 1. As some examples, if the number of near-null vectors for level 1 is 24, --mg-setup-filter-startup-vectors 1 1 corresponds to one starting vector with 24 near-nulls generated; [...] 1 3 corresponds to three starting vectors with 3 near-nulls generated from each, etc. In cases like [...] 1 5, where 5 doesn't divide into 24, 5 near-null vectors are generated from the first four starting vectors, and 4 from the last --- 4 * 4 + 4 = 24.
--mg-setup-filter-startup-iterations - number of iterations for the initial low-pass filter, default 1000.
--mg-setup-filter-startup-rescale-frequency - an empirical feature; since the norm of a vector could overflow (or individual values thereof), the vector can be renormalized with some frequency in a way that preserves the Chebyshev recursion; empirical default 50.
--mg-setup-filter-lambda-max - upper bound for the Chebyshev filters, default power iterations to guess an upper bound
--mg-setup-filter-lambda-min - lower bound to use for the initial low pass filter, modes smaller than that value are enhanced. Default 1.
--mg-setup-filter-iterations-between-vectors - number of iterations between subsequent near-null vectors after the initial low-pass filter. As an example, if startup iterations is 1000, and the number of iterations between vectors is 150, a near-null vector is generated after 1000 (initial) matrix applications, then at 1150, 1300, 1450... . Default 150.

Restriction

There are no special flags for restriction in and of itself, but:

--mg-setup-restrict-remaining-type [level] [method] can be used to specify which method to use to generate the "remaining" near null vectors if fine nvec != coarse nvec. Parameters for the remainder method are taken from the flags for each method.

"Polishing" near-null vectors

"Polishing" near-null vectors with inverse iterations is enabled by specifying a non-zero number of polish iterations via

--mg-setup-maxiter-inverse-iterations-polish [level] [numbers] , where the default [numbers] is 0, corresponding to no polishing. Parameters for polishing are taken from the flags for inverse iteration setup.

Reference commands

A base command where we use inverse iterations for level 1 and a custom method for level 2 (specified by SETUP_FLAGS_LEVEL2) is:

CONFIG=[...]
SETUP_FLAGS_LEVEL2=[...]
./invert_test \
  --prec double --prec-sloppy single --prec-null half --prec-precondition half \
  --recon 12 --recon-sloppy 8 --recon-precondition 8 \
  --dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge $CONFIG \
  --dslash-type wilson --kappa 0.1394265 --tol 1e-10 --nsrc 1 --niter 20 \
  --verbosity verbose --solve-type direct-pc --solution-type mat --inv-type gcr \
  --inv-multigrid true --mg-levels 3 \
  --mg-block-size 0 4 4 4 4 --mg-nvec 0 24 \
  --mg-block-size 1 2 2 2 2 --mg-nvec 1 32 \
  --mg-setup-tol 0 5e-7 --mg-setup-inv 0 cgnr $SETUP_FLAGS_LEVEL2 \
  --mg-use-mma true \
  --mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 direct-pc --mg-nu-pre 0 0 --mg-nu-post 0 4 \
  --mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 direct-pc --mg-nu-pre 1 0 --mg-nu-post 1 4 \
  --mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 direct-pc --mg-coarse-solver-tol 1 0.35 --mg-coarse-solver-maxiter 1 10 \
  --mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.01 --mg-coarse-solver-maxiter 2 20 \
  --mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose

Where we will fill in SETUP_FLAGS_LEVEL2 for different options.

Inverse Iterations

A standard setup is

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 inverse-iterations --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

Where the --mg-setup-type flag is optional as inverse iterations are the default

Eigenvectors

A reference setup without polynomial acceleration is

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 eigenvectors --mg-eig-type 1 trlm --mg-eig-use-dagger 1 false --mg-eig-use-normop 1 true --mg-eig-use-poly-acc 1 false --mg-eig-n-ev 1 36 --mg-eig-n-kr 1 64 --mg-eig-tol 1 1e-5"

Chebyshev filter

A reference setup where 4 base vectors are used -> 8 near-null vectors are generated from each base vector, the minimum of the low pass filter is 1.0, a 500 iteration low pass filter with rescaling every 50 iterations is used, and there are 100 iterations between subsequent near-nulls is:

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 chebyshev-filter --mg-setup-filter-startup-vectors 1 4 --mg-setup-filter-startup-iterations 1 500 --mg-setup-filter-startup-rescale-frequency 1 50 --mg-setup-filter-lambda-min 1 1.0 --mg-setup-filter-iterations-between-vectors 1 50"

Restriction

A reference setup with restriction, then using inverse iterations for the remaining 8 vectors (32 on level 2 minus 24 on level 1) is:

SETUP_FLAGS_LEVEL2="--mg-setup-type 1 restrict-fine --mg-setup-restrict-remaining-type 1 inverse-iterations --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

--mg-setup-restrict-remaining-type can be changed appropriately, grabbing other reference flags as appropriate.

Polishing with inverse iterations

As an example, the parameters for a Chebyshev filter can be included, and then they can be polished for 50 iterations via adding:

SETUP_FLAGS_LEVEL2="${SETUP_FLAGS_LEVEL2} --mg-setup-maxiter-inverse-iterations-refinement 1 50 --mg-setup-tol 1 5e-7 --mg-setup-inv 1 cgnr"

Outstanding work

[ ] Exposing these options in the MILC interface
[x] Copy plus pasting all of this to a wiki page: https://github.com/lattice/quda/wiki/Multigrid-Solver
[ ] Interface testing/updating with Chroma, ...?
[ ] clang-format

May 27 '22 17:05 weinbe2

Completed an initial visual review of this PR. Looks like a good contribution. Left a few comments, and I'll test it on some clover multigrid shortly.

Jun 02 '22 00:06 maddyscientist

@weinbe2 sorry that I haven't gotten back to this yet. I started taking a look to see what I need to adjust in tmLQCD to support the changes here when #1287 got in the way... I'll try to spin this up in the next few days and provide feedback, if any.

Jun 15 '22 13:06 kostrzewa

@weinbe2 sorry that I haven't gotten back to this yet. I started taking a look to see what I need to adjust in tmLQCD to support the changes here when #1287 got in the way... I'll try to spin this up in the next few days and provide feedback, if any.

No problem! There are certainly plenty of higher-priority things bouncing around, on all of our plates. Whenever you can get to it is fine and appreciated.

Jun 15 '22 20:06 weinbe2

FYI, one of the recent merges of develop slightly broke this branch, will fix it this week.

Jul 06 '22 19:07 weinbe2

@weinbe2 Seems to work fine for me with with the changes to our interface that I implemented in https://github.com/etmc/tmLQCD/pull/548 to preserve the status quo. I will keep track of this PR and make any adjustments that may become necessary due to ongoing changes.

Sep 21 '22 15:09 kostrzewa

For future reference, do not merge, it appears an issue creeped in not in develop. For my own convenience, a command that caused an issue was

LOCAL_DIM="32 32 32 16"
GRIDSIZE="1 1 1 2"
OUTER_OPERATOR=direct-pc
GAUGE_PATH=./l32t32b7p0
MASS=0.00945
TADPOLE_COEFF=1.0
TOL=1e-5
FIRST_AGGREGATE="4 4 4 4"
SECOND_AGGREGATE="4 4 4 4"
COARSEST_NVEC=128
COARSEST_NEV=160
COARSEST_NKR=192
COARSEST_POLY_DEG=32
COARSEST_AMIN=0.85
COARSEST_AMAX=0.0

export QUDA_RESOURCE_PATH=.
export QUDA_TUNE_VERSION_CHECK=0

mpirun -np 2 --allow-run-as-root \
                        ./staggered_invert_test --inv-multigrid true --dim ${LOCAL_DIM} --gridsize ${GRIDSIZE} --verbosity verbose --nsrc 1 \
                        --dslash-type asqtad --solve-type ${OUTER_OPERATOR} --solution-type mat --compute-fat-long true \
                        --load-gauge ${GAUGE_PATH} --mass ${MASS} --tadpole-coeff ${TADPOLE_COEFF} --tol ${TOL} \
                        --prec single --prec-sloppy single --prec-precondition single --prec-null single --mg-use-mma false \
                        --recon 18 --recon-sloppy 18 --recon-precondition 18 \
                        --mg-verbosity 2 verbose --mg-verbosity 3 verbose \
                        --mg-save-vec 1 blah --mg-save-vec 2 blah \
                        --mg-levels 4 --mg-coarse-solve-type 0 ${OUTER_OPERATOR} --mg-staggered-coarsen-type kd-optimized \
                        --mg-block-size 0 1 1 1 1 --mg-nvec 0 3 \
                        --mg-smoother-solve-type 0 ${OUTER_OPERATOR} --mg-smoother 0 ca-gcr --mg-nu-pre 0 0 --mg-nu-post 0 8 --mg-smoother-tol 0 1e-10  \
                        --mg-coarse-solve-type 1 direct --mg-coarse-solver-tol 1 5e-2 \
                        --mg-coarse-solver-maxiter 1 16 --mg-coarse-solver 1 gcr \
                        --mg-setup-inv 1 cgnr --mg-setup-maxiter 1 1000 --mg-setup-tol 1 1e-5 \
                        --mg-block-size 1 ${FIRST_AGGREGATE} --mg-nvec 1 64 --mg-n-block-ortho 1 2 \
                        --mg-smoother-solve-type 1 direct --mg-smoother 1 ca-gcr --mg-nu-pre 1 0 --mg-nu-post 1 2 --mg-smoother-tol 1 1e-10 \
                        --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.25 \
                        --mg-coarse-solver-maxiter 2 16   --mg-coarse-solver 2 gcr \
                        --mg-setup-inv 2 cgnr --mg-setup-maxiter 2 1000 --mg-setup-tol 2 1e-5 \
                        --mg-block-size 2 ${SECOND_AGGREGATE} --mg-nvec 2 96 --mg-n-block-ortho 2 2 \
                        --mg-smoother-solve-type 2 direct-pc --mg-smoother 2 ca-gcr --mg-nu-pre 2 0 --mg-nu-post 2 2 --mg-smoother-tol 2 1e-10 \
                        --mg-coarse-solver 3 ca-gcr --mg-coarse-solver-ca-basis-size 3 16 --mg-coarse-solver-maxiter 3 16 \
                        --mg-nvec 3 ${COARSEST_NVEC} --mg-eig-n-ev 3 ${COARSEST_NEV} --mg-eig-n-kr 3 ${COARSEST_NKR} --mg-eig-tol 3 1e-4 \
                        --mg-eig 3 true --mg-eig-type 3 trlm --mg-eig-use-dagger 3 false --mg-eig-use-normop 3 true \
                        --mg-eig-use-poly-acc 3 true --mg-eig-poly-deg 3 ${COARSEST_POLY_DEG} --mg-eig-amin 3 ${COARSEST_AMIN} --mg-eig-amax 3 ${COARSEST_AMAX} \
                        --mg-eig-max-restarts 3 1000

and the error that popped up is

MG level 2 (GPU): Xinv = 1.574662e+04
MG level 2 (GPU): Yhat[0] = 6.719429e+01 (1.147926e-01 < 4.053768e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[1] = 6.493005e+01 (1.500221e-01 < 5.319621e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[2] = 6.315730e+01 (1.395274e-01 < 5.026488e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[3] = 2.209952e+02 (3.106113e-01 < 8.176686e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[4] = 4.554047e+01 (1.680291e-01 < 6.054319e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[5] = 4.207957e+01 (1.500221e-01 < 5.088072e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[6] = 4.217662e+01 (1.335199e-01 < 4.538731e-02 x 3.218868e+00)
MG level 2 (GPU): Yhat[7] = 1.093643e+02 (3.106113e-01 < 7.678801e-02 x 3.218868e+00)
MG level 2 (GPU): ....done computing Yhat field
MG level 2 (GPU): Finished creating the preconditioned coarse op
MG level 2 (GPU): Coarse Dirac operator done
MG level 2 (GPU): Creating smoother
MG level 2 (GPU): Creating a CA-GCR solver
MG level 2 (GPU): Smoother done
MG level 2 (GPU): Checking 0 = (1 - P P^\dagger) v_k for 96 vectors
MG level 2 (GPU): Vector 0: norms v_k = 1.000000e+00 P^\dagger v_k = 6.917310e-01 (1 - P P^\dagger) v_k = 1.202923e+00, L2 relative deviation = 1.096779e+00
MG level 2 (GPU): ERROR: L2 relative deviation for k=0 failed, 1.096779e+00 > 1.000000e-03 (rank 0, host luna-0196, multigrid.cpp:920 in void quda::MG::verify(bool)())
MG level 2 (GPU): ERROR: L2 relative deviation for k=0 failed, 1.096779e+00 > 1.000000e-03 (rank 1, host luna-0196, multigrid.cpp:920 in void quda::MG::verify(bool)())
MG level 2 (GPU):        last kernel called was (name=N4quda4blas5Norm2IdfEE,volume=8x8x8x4x1,aux=GPU-offline,nParity=2,vol=2048,parity=2,precision=4,order=2,Ns=2,Nc=64)
MG level 2 (GPU):        last kernel called was (name=N4quda4blas5Norm2IdfEE,volume=8x8x8x4x1,aux=GPU-offline,nParity=2,vol=2048,parity=2,precision=4,order=2,Ns=2,Nc=64)
QMP m0,n2@luna-0196 error: abort: 1
QMP m1,n2@luna-0196 error: abort: 1

sm_80, fast build, QUDA_PRECISION=12

Oct 25 '22 17:10 weinbe2

quda quda copied to clipboard

Refactor/expansion of MG setup methods

Command line arguments

Reference commands

Outstanding work

quda
quda copied to clipboard