Gauge observables: trace of arbitrary gauge loops (useful for gauge action), Polyakov loop
This PR adds two new types of gauge measurements to QUDA: batch computation of the summed trace of arbitrary gauge loops (relevant for gauge action calculations), and calculation of the (temporal) Polyakov loop. These new capabilities are exposed through both the standard QUDA interface and the MILC interface, and are utilized in a complementary PR to MILC.
Gauge loops
Standalone code to compute products of gauge links along arbitrary paths has been moved from the gauge force routines to a helper file include/gauge_path_helper.cuh. This code is then used in the new gauge loop code and in a refactor of the gauge force routines.
A similar refactor was done in the host verification code, and verification of the gauge loop calculation has been added to the gauge force tests (since it's a very similar workflow).
Polyakov loops
The Polyakov loop code is fully new, and is implemented via a routine that parallelizes over discrete (x, y, z) coordinates, where each work item computes the product of temporal gauge links across the entire local t dimension. This code is utilized recursively when the t dimension is split across MPI ranks, where:
- A "face" of gauge link products of local size `x × y × z` is computed
- The face is saved as a gauge field of size `x × y × z × 1` (where the odd `t` dimension "works" since it is the slowest dimension)
- This data is exchanged across temporal ranks at fixed `X`, `Y`, and `Z` rank, and inserted as "fictitious" timeslices in a new gauge field of size `x × y × z × N_t`
- The gauge link product code is then re-used to compute the product in the `t` dimension again, with a final trace + reduce
To make this code relatively robust against different partitionings of the T dimension, all intermediate calculations are done in full double precision, independent of the precision of the input gauge field.
For now, this is only implemented for Polyakov loops in the t dimension, but the plumbing is in place to generalize this---it just doesn't have a compelling use case at the moment.
This code has been verified against MILC's calculation of the Polyakov loop. The temporal Polyakov loop also gets calculated in QUDA's su3_test test executable. While it would be a best practice to also implement calculating the Polyakov loop in QUDA's own host verification code, implementing the full machinery to handle a distributed calculation of the Polyakov loop on the CPU felt a bit too much like reinventing the wheel...
Miscellaneous interface notes
Routines have been added to the QUDA interface to calculate gauge loop traces as well as the Polyakov loop. In addition, this PR adds routines that calculate the plaquette and the Polyakov loop from an input CPU field as opposed to using (and thus requiring) a resident field.
MILC interface notes
As it stands, the MILC interface does not take advantage of keeping the gauge fields resident. This is a somewhat conscious decision because the gauge links may or may not have phases applied at different points in the RHMC workflow, leading to a book-keeping headache, and in any case the time saved by offloading gauge loop/etc calculations to the GPU more than amortizes the cost of redundant host -> device copies.
Performance results
This optimization corresponds to a ~5% performance boost in the standard NERSC medium RHMC workflow.
Outstanding work
- [x] Opening an analogous MILC PR: https://github.com/lattice/milc_qcd/pull/24 (updated)
- [x] ~Implementing a calculation of the Polyakov loop in spatial directions~ Punted, see #1311
- [x] Cleaning up comments, doxygen
- [ ] `clang-format`
Complementary reference PR for MILC: https://github.com/milc-qcd/milc_qcd/pull/55
With the exception of clang-format, which I can do as a last step, this is ready for review.
Completed an initial visual review of this. This looks like great work, Evan, and I'm looking forward to testing it. I've left a variety of comments in the meantime 😉
Thanks for the last few fixes. This looks good to go.
Excellent --- I'm doing a few last checks (just making sure clang-format didn't accidentally do something stupid), and when Jenkins completes we can get this merged, after which I'll do some last tests to the MILC PR.
@mathiaswagner anything to offer on this PR, or are we good to merge?
I'll have another look today, and I'd guess we can merge later.
Looks good to me but I still left a few minor comments. Thanks for adding that @weinbe2 !
All comments addressed @mathiaswagner , let me know if you have any further thoughts, otherwise I can get this merged once Jenkins finishes.
I think with all comments addressed just feel free to use my pre-approval to merge once Jenkins is happy.