VMC w/o drift is significantly slower than with drift on GPUs for spinors
I'm doing some performance checks and noticed a ~5x slowdown when drift is off for VMC calculations with spinors. This came up because my Nexus workflow generates an optimization block and sets useDrift = no. I was doing throughput checks using VMC with drift to set the appropriate number of walkers per rank, and noticed the discrepancy in block time between the VMC runs.
The difference between drift and no drift is mw_calcRatioGrad vs. mw_calcRatio, so I suspect something isn't actually offloading in the mw_calcRatio-only path.
The current workaround is to always have drift on.
Did you notice if this also applied to non-spinor calculations with your setup?
I did do one run with normal wave functions, and there was a fairly small difference between the timings, nothing like the spinor case. With normal wave functions, drift was about 6% slower than no drift, whereas with SOC, no drift was about 5x slower than drift.
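For anyone repeating the comparison, here is a minimal sketch of how the two figures above can be computed from per-block times. The helper names `slowdown` and `percent_slower` are hypothetical, and the block times are placeholders picked only to mirror the ratios quoted in this thread; they are not measured values.

```python
def slowdown(time_a: float, time_b: float) -> float:
    """Return how many times slower run A is than run B (time_a / time_b)."""
    if time_a <= 0 or time_b <= 0:
        raise ValueError("block times must be positive")
    return time_a / time_b


def percent_slower(time_a: float, time_b: float) -> float:
    """Return by what percentage run A is slower than run B."""
    return (slowdown(time_a, time_b) - 1.0) * 100.0


# Placeholder block times in seconds (illustrative only, NOT measured data).
spinor_nodrift, spinor_drift = 5.0, 1.0
normal_drift, normal_nodrift = 1.06, 1.00

print(f"spinor: no drift is {slowdown(spinor_nodrift, spinor_drift):.1f}x slower than drift")
print(f"normal: drift is {percent_slower(normal_drift, normal_nodrift):.0f}% slower than no drift")
```

Comparing time per block only makes sense if both runs use the same walker count and steps per block, so those should be held fixed between the drift and no-drift runs.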