Gauge observables: trace of arbitrary gauge loops (useful for gauge action), Polyakov loop
This PR adds two new types of gauge measurements to QUDA: batch computation of the summed trace of arbitrary gauge loops (relevant for gauge action calculations), and calculation of the (temporal) Polyakov loop. These new capabilities are exposed through both the standard QUDA interface and the MILC interface, and are utilized in a complementary PR to MILC.
Gauge loops
Standalone code to compute products of gauge links along arbitrary paths has been moved from the gauge force routines to a helper file include/gauge_path_helper.cuh. This code is then used in the new gauge loop code and in a refactor of the gauge force routines.
A similar refactor was done in the host verification code, and verification of the gauge loop calculation has been added to the gauge force tests (since it's a very similar workflow).
Polyakov loops
The Polyakov loop code is fully new, and is implemented via a routine that parallelizes over discrete (x, y, z) coordinates, where each work item computes the product of temporal gauge links across the entire local t dimension. This code is utilized recursively when the t dimension is split across MPI ranks, where:
- A "face" of gauge link products of local size `x × y × z` is computed
- The face is saved as a gauge field of size `x × y × z × 1` (where the odd `t` dimension "works" since it is the slowest dimension)
- This data is exchanged across temporal ranks at fixed `X`, `Y`, and `Z` rank, and inserted as "fictitious" timeslices in a new gauge field of size `x × y × z × N_t`
- The gauge link product code is then re-used to compute the product in the `t` dimension again, with a final trace + reduce
To make this code relatively robust against different partitionings of the T dimension, all intermediate calculations are done in full double precision, independent of the precision of the input gauge field.
For now, this is only implemented for Polyakov loops in the t dimension, but the plumbing is in place to generalize this---it just doesn't have a compelling use case at the moment.
This code has been verified against MILC's calculation of the Polyakov loop. The temporal Polyakov loop also gets calculated in QUDA's su3_test test executable. While it would be a best practice to also implement calculating the Polyakov loop in QUDA's own host verification code, implementing the full machinery to handle a distributed calculation of the Polyakov loop on the CPU felt a bit too much like reinventing the wheel...
Miscellaneous interface notes
Routines have been added to the QUDA interface to calculate gauge loop traces as well as the Polyakov loop. In addition, this PR adds routines that calculate the plaquette and the Polyakov loop from an input CPU field as opposed to using (and thus requiring) a resident field.
MILC interface notes
As it stands, the MILC interface does not take advantage of keeping the gauge fields resident. This is a somewhat conscious decision because the gauge links may or may not have phases applied at different points in the RHMC workflow, leading to a book-keeping headache, and in any case the time saved by offloading gauge loop/etc calculations to the GPU more than amortizes the cost of redundant host -> device copies.
Performance results
This optimization corresponds to a ~5% performance boost in the standard NERSC medium RHMC workflow.
Outstanding work
- [x] Opening an analogous MILC PR: https://github.com/lattice/milc_qcd/pull/24 (updated)
- [x] ~Implementing a calculation of the Polyakov loop in spatial directions~ Punted, see #1311
- [x] Cleaning up comments, doxygen
- [ ] `clang-format`
Complementary reference PR for MILC: https://github.com/milc-qcd/milc_qcd/pull/55
With the exception of clang-format, which I can do as a last step, this is ready for review.
Completed an initial visual review of this. This looks like great work, Evan, and I'm looking forward to testing it. I've left a variety of comments in the meantime 😉
Thanks for the last few fixes. This looks good to go.
Excellent --- I'm doing a few last checks (just making sure clang-format didn't accidentally do something stupid), and when Jenkins completes we can get this merged, after which I'll do some last tests to the MILC PR.
@mathiaswagner anything to offer on this PR, or are we good to merge?
I'll have another look today, and I'd guess we can merge later.
Looks good to me but I still left a few minor comments. Thanks for adding that @weinbe2 !
All comments addressed @mathiaswagner , let me know if you have any further thoughts, otherwise I can get this merged once Jenkins finishes.
I think with all comments addressed just feel free to use my pre-approval to merge once Jenkins is happy.