Grid
Very low acceptance for SU(2) 1 adjoint flavour RHMC
I'm seeing very low acceptance rates when running the RHMC for SU(2) with one adjoint flavour when compared to what I believe are exactly the same run parameters for HiRep. With HiRep this is around 90%, while with the same parameters in Grid it is very close to 0%.
What I've checked:
- Same fermion and gauge action (Wilson in both cases)
- Same value for beta and bare fermion mass
- Same lattice volume
- The integrator in both cases is `MinimumNorm2` (called `O2MN_multistep` in HiRep)
- MD trajectory length is the same
- Same number of MD time steps
- Two levels of integration, with a multiplier of 1 at the top (fermionic) level and a multiplier of 5 at the bottom (gauge) level
- Same numerical precision (double precision throughout)
- Even-odd preconditioning is used in both cases
- Boundary conditions are the same
- Initial conditions are the same (hot start)
- We previously found that using 2/4 as the exponent gives better performance than 1/2; I have tried adjusting `Grid/qcd/action/pseudofermion/OneFlavourEvenOddRational.h` to this effect, but saw no obvious change in acceptance (or time per trajectory)
- I've also tried adjusting the parameters of the rational approximation to increase the order and precision, but this doesn't noticeably affect the acceptance, and does make the update take longer.
- The code appears to be simulating the correct theory, as a scan of the phase diagram on a 4^4 lattice very closely reproduces the plot of arXiv:1412.5994
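For readers unfamiliar with the integrator setup in the checklist above, here is a minimal sketch of a two-level second-order minimum-norm (Omelyan) scheme on a one-dimensional toy Hamiltonian. Everything in it is an assumption made for illustration: the toy actions, the function names, and the way the inner level is nested inside the outer position update. It is not taken from Grid's `MinimumNorm2` or HiRep's `O2MN_multistep`, whose step-size conventions may differ in detail.

```python
LAMBDA = 0.1931833275037836  # Omelyan's minimum-norm parameter

def force_outer(q):
    # Stand-in for the expensive, soft (fermionic) force: S_outer = 0.1 * q^4 / 4
    return -0.1 * q**3

def force_inner(q):
    # Stand-in for the cheap, stiff (gauge) force: S_inner = 25 * q^2 / 2
    return -25.0 * q

def hamiltonian(q, p):
    return 0.5 * p * p + 0.1 * q**4 / 4.0 + 25.0 * q * q / 2.0

def mn2_inner(q, p, h, n_steps):
    """Advance by time h with n_steps minimum-norm steps, inner force only."""
    eps = h / n_steps
    for _ in range(n_steps):
        p += LAMBDA * eps * force_inner(q)
        q += 0.5 * eps * p
        p += (1.0 - 2.0 * LAMBDA) * eps * force_inner(q)
        q += 0.5 * eps * p
        p += LAMBDA * eps * force_inner(q)
    return q, p

def mn2_two_level(q, p, traj_length, n_outer, inner_multiplier):
    """Outer minimum-norm steps; each half position update is replaced
    by inner_multiplier inner steps that also apply the inner force."""
    eps = traj_length / n_outer
    for _ in range(n_outer):
        p += LAMBDA * eps * force_outer(q)
        q, p = mn2_inner(q, p, 0.5 * eps, inner_multiplier)
        p += (1.0 - 2.0 * LAMBDA) * eps * force_outer(q)
        q, p = mn2_inner(q, p, 0.5 * eps, inner_multiplier)
        p += LAMBDA * eps * force_outer(q)
    return q, p

if __name__ == "__main__":
    q0, p0 = 1.0, 0.5
    for n in (5, 10, 20):
        q1, p1 = mn2_two_level(q0, p0, 1.0, n, 5)
        print(n, hamiltonian(q1, p1) - hamiltonian(q0, p0))
```

With `inner_multiplier=5` this mirrors the "1 at the top, 5 at the bottom" arrangement described above. When comparing two codes, it is worth confirming that both define the inner step size relative to the outer one in the same way, since a factor of two there changes the effective gauge step.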
I've tested CPU and GPU builds (with and without MPI for the latter) and see the same issue in both.
Increasing the number of MD steps per trajectory increases the acceptance, but makes each trajectory take correspondingly longer.
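The trade-off between acceptance and cost can be estimated with a rough model. The sketch below assumes that the energy violation dH is Gaussian with mean equal to half its variance (which follows from the identity <exp(-dH)> = 1), so that the mean acceptance is erfc(sqrt(var/8)), and that dH scales as 1/n^2 for a second-order integrator at fixed trajectory length. Both assumptions are expected to fail far from equilibrium, and the reference numbers are hypothetical, not measurements from either code.

```python
import math

def acceptance_from_variance(var_dh):
    """Mean Metropolis acceptance if dH is Gaussian with variance var_dh
    and mean var_dh / 2."""
    return math.erfc(math.sqrt(var_dh / 8.0))

def predicted_acceptance(var_ref, n_ref, n_steps):
    """Rescale a variance measured at n_ref steps to n_steps,
    assuming dH ~ 1/n^2 so that var(dH) ~ 1/n^4."""
    var = var_ref * (n_ref / n_steps) ** 4
    return acceptance_from_variance(var)

if __name__ == "__main__":
    var_ref, n_ref = 8.0, 10  # hypothetical numbers, not measured values
    for n in (10, 15, 20, 30):
        acc = predicted_acceptance(var_ref, n_ref, n)
        print(f"n_steps={n:3d}  acceptance~{acc:.3f}  cost/accepted~{n / acc:.1f}")
```

If the measured acceptance in a thermalised run follows this curve but the first trajectories from a hot start do not, that would point towards the starting configuration rather than the integrator parameters.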
Does anyone have any idea what might be going on, and how I could fix it, please?
If it's useful, I've attached an example grid.configure.summary, the program I'm running, an example submit script to see the parameters being used, and the equivalent input file used with HiRep.
Many thanks in advance for any advice.
Two more things that I have checked:
- @LupoA pointed out that there is a normalisation factor of `HMC_MOMENTUM_DENOMINATOR` that is by default set to 2, while in HiRep this factor is not included. Setting this to 1 (via removing the `#define CPS_MD_TIME` in `Grid/qcd/action/gauge/GaugeImplTypes.h` and `Grid/qcd/action/scalar/ScalarImpl.h`) does not remove the discrepancy.
- HiRep multiplies the step size by `beta / NG`, which as far as I can see isn't done in Grid. Removing this factor further exacerbates the discrepancy.
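To illustrate why such normalisation factors matter when matching step sizes between codes, here is a toy sketch. It assumes the factor enters as the denominator D of the kinetic term, H = p^2/D + S(q), and that momenta are drawn consistently with that kinetic term; whether this maps exactly onto how `HMC_MOMENTUM_DENOMINATOR` is used inside Grid has not been checked here. Under that assumption, a leapfrog run with denominator D and step dt is identical to a run with denominator 2 and step dt * sqrt(2/D), with momenta rescaled by the same factor.

```python
import math

def force(q):
    # Toy action S = q^2 / 2 + q^4 / 8
    return -q - 0.5 * q**3

def leapfrog(q, p, dt, n_steps, denom):
    """Leapfrog for H = p^2 / denom + S(q), so that dq/dt = 2 p / denom."""
    for _ in range(n_steps):
        p += 0.5 * dt * force(q)
        q += dt * 2.0 * p / denom
        p += 0.5 * dt * force(q)
    return q, p

if __name__ == "__main__":
    dt, n, denom = 0.1, 10, 1.0
    scale = math.sqrt(2.0 / denom)
    qa, pa = leapfrog(0.3, 0.7, dt, n, denom)
    qb, pb = leapfrog(0.3, 0.7 * scale, dt * scale, n, 2.0)
    print(qa - qb, pa * scale - pb)  # both differences vanish up to rounding
```

In this toy model a mismatch in D between two codes is equivalent to a mismatch in step size and trajectory length by sqrt(2/D), so "the same parameters" may not describe the same trajectory.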
Some more testing shows that with a thermalised configuration (the same one for both codes), and controlling for all the factors above, the acceptances match much more closely between HiRep and Grid. Additionally, setting `--Thermalizations` to a number larger than zero (I've been using 20) will immediately overcome this initial barrier and allow the acceptance to stabilise at the same parameters as work for HiRep (which does not do this thermalisation step, as far as I am aware).
This raises three possibilities that I can see:
- HiRep does some thermalisation that I'm not aware of (although I have searched and haven't found evidence of this)
- Grid's integrator behaves differently on very far-from-equilibrium configurations
- Grid initialises a hot start differently from HiRep
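One way to probe the last possibility is to compare simple statistics of the initial links from each code against what a Haar-random hot start would give. The sketch below is a stand-alone illustration, not a description of how either Grid or HiRep generates its hot start: it draws SU(2) matrices uniformly from the group (a unit 4-vector from normalised Gaussians) and checks that the half-trace a0 has mean 0 and mean square 1/4. The function names are hypothetical.

```python
import math
import random

def random_su2(rng):
    """Haar-random SU(2) matrix U = a0 + i (a1 s1 + a2 s2 + a3 s3),
    with (a0, a1, a2, a3) uniform on the unit 3-sphere."""
    a = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(x * x for x in a))
    a0, a1, a2, a3 = (x / norm for x in a)
    return [[complex(a0, a3), complex(a2, a1)],
            [complex(-a2, a1), complex(a0, -a3)]]

def half_trace(u):
    return 0.5 * (u[0][0] + u[1][1]).real

def determinant(u):
    return u[0][0] * u[1][1] - u[0][1] * u[1][0]

if __name__ == "__main__":
    rng = random.Random(1234)
    traces = [half_trace(random_su2(rng)) for _ in range(20000)]
    mean = sum(traces) / len(traces)
    mean_sq = sum(t * t for t in traces) / len(traces)
    print(f"<a0> = {mean:.4f} (Haar: 0), <a0^2> = {mean_sq:.4f} (Haar: 0.25)")
```

If the links written out by one code immediately after initialisation are noticeably closer to the identity than this, its "hot" start is effectively warmer or colder than the other's, which could explain a different initial acceptance.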