
Support dynamic communication of atom buffers

Open • giacomofiorin opened this pull request 9 months ago • 4 comments

This PR revives an old branch containing work begun a few years ago but never finished. At the time, the NAMD GPU-resident code path and the Colvars-GROMACS interface were both under active development; as of today, neither supports the features from this branch, which currently work only on the NAMD 2.x CPU code path.

The changes in this PR allow atomic coordinates to be requested from, and released back to, the simulation engine dynamically. This makes it possible to skip the communication for atoms that are not used at a given step, either because of multiple time-stepping (MTS) or because of pair lists/neighbor lists.

This is implemented by reference-counting atoms in the proxy, together with frequency parameters (see the sketch after this list):

  • For volmap variables defined in an internal frame, a new keyword atomListFrequency defines how often the full computation is performed.
  • For coordination numbers or MTS, the existing keywords pairListFrequency and timeStepFactor are used.
  • The above keywords are combined into a single proxy-side parameter that determines how often all atoms are re-requested; with NAMD/GlobalMaster, atoms are requested at the step immediately prior to each full update and released right after.
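
As an illustration of the mechanism, here is a minimal C++ sketch of reference-counted, frequency-aware atom requests. It is not the actual Colvars proxy code: the names (AtomProxy, request_atom, release_atom, add_update_period, full_request_step) are made up, and combining the update periods through their greatest common divisor is just one plausible way of deriving the single proxy-side parameter.

#include <iostream>
#include <unordered_map>

class AtomProxy {
public:
  // Request an atom's coordinates; only the first request per atom
  // triggers communication with the engine.
  void request_atom(int id) {
    if (refcount_[id]++ == 0) engine_request(id);
  }

  // Drop one request; communication stops once no component needs
  // the atom any longer.
  void release_atom(int id) {
    auto it = refcount_.find(id);
    if (it != refcount_.end() && --(it->second) == 0) {
      refcount_.erase(it);
      engine_unrequest(id);
    }
  }

  // Fold one component's update period (pairListFrequency,
  // timeStepFactor, atomListFrequency) into a single proxy-wide
  // period; using the GCD guarantees that every component's full
  // update lands on a re-request step.
  void add_update_period(long p) {
    period_ = (period_ == 0) ? p : gcd(period_, p);
  }

  // True when all atoms must be requested: one step before the next
  // full update, mirroring the NAMD/GlobalMaster pattern above.
  bool full_request_step(long step) const {
    return period_ > 0 && (step + 1) % period_ == 0;
  }

private:
  static long gcd(long a, long b) { return b ? gcd(b, a % b) : a; }
  // Stand-ins for the engine-side calls (e.g. updating the list of
  // requested atoms); here they just log the transition.
  void engine_request(int id)   { std::cout << "request "   << id << "\n"; }
  void engine_unrequest(int id) { std::cout << "unrequest " << id << "\n"; }

  std::unordered_map<int, long> refcount_;
  long period_ = 0;
};

int main() {
  AtomProxy proxy;
  proxy.add_update_period(10);  // e.g. pairListFrequency 10
  proxy.request_atom(5);        // first user: engine communication starts
  proxy.request_atom(5);        // second user: no extra communication
  proxy.release_atom(5);        // one user left: still communicated
  proxy.release_atom(5);        // last release: communication stops
  std::cout << proxy.full_request_step(9) << "\n";  // 1: re-request before step 10
}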

Below is the peak performance for ApoA1 with a restraint on the coordination number between one side chain (4 atoms) and the ~21,000 water oxygens of the system, using a pair list frequency of 10,000 steps. The benchmark was run on older hardware (an InfiniBand cluster with Xeon E5-2650 CPUs, 32 nodes), where a multi-node job gave performance comparable to a single modern GPU.

Condition                          Performance (ns/day)
unbiased                           84.0
coordNum                           9.1
coordNum+pairList (master)         12.5
coordNum+pairList (this branch)    79.6

Opening as a draft because the implementation appears to be out of date with respect to recent changes in NAMD (the CI tests are currently hanging).

giacomofiorin commented on Apr 15 '25

@giacomofiorin I tried to rebase the code against the latest master branch. The test 000_rmsd-mts_harmonic-fixed is still expected to fail.

HanatoK commented on Apr 22 '25

> @giacomofiorin I tried to rebase the code against the latest master branch. The test 000_rmsd-mts_harmonic-fixed is still expected to fail.

If I recall correctly, it already failed with an older NAMD version. I'm considering using a shorter time step to make the test more robust.

giacomofiorin commented on Apr 30 '25

I ran the test on my laptop (AMD Ryzen 5800H with an NVIDIA RTX 3060) using the GPU-resident mode, and the slowdown is not that large.

Condition                  Performance (ns/day)
Unbiased                   42.43
coordNum (GlobalMaster)    32.0685
coordNum (CudaGM+SOA)      40.19

I am curious why it was 9.1 ns/day in your test. My Colvars test file is:

indexFile index.ndx

colvar {
  name cv
  coordNum {
    group1 {
      atomNumbers {5 17 31 55}
    }
    group2 {
      indexGroup waters
    }
    tolerance 0.001
    pairListFrequency 10
    cutoff 10.0
  }
}

harmonic {
  colvars cv
  centers 475.0
  forceConstant 0.001
}

where waters is an index group containing 21,458 atoms.
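
For context, pairListFrequency helps because pairs whose switching-function value falls below tolerance at a rebuild step are skipped until the next rebuild. The following rough C++ sketch illustrates the idea; it is not the Colvars implementation (CoordNumSketch and its members are hypothetical), and it assumes only the default rational switching function (1 - (r/r0)^6) / (1 - (r/r0)^12).

#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };

static double dist(const Vec3 &a, const Vec3 &b) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx*dx + dy*dy + dz*dz);
}

// Rational switching function with the default exponents (n = 6, m = 12):
// f(r) = (1 - (r/r0)^6) / (1 - (r/r0)^12)
static double switching(double r, double r0) {
  if (std::abs(r - r0) < 1.0e-12) return 0.5;  // limit n/m at r == r0
  const double x6 = std::pow(r / r0, 6);
  return (1.0 - x6) / (1.0 - x6 * x6);
}

struct CoordNumSketch {
  double r0, tolerance;
  int pairListFrequency;
  std::vector<std::pair<int, int>> pairs;  // pairs kept since last rebuild

  double compute(long step, const std::vector<Vec3> &g1,
                 const std::vector<Vec3> &g2) {
    if (step % pairListFrequency == 0) {
      // Rebuild step: scan all group1 x group2 pairs once, keeping only
      // those that contribute more than the tolerance.
      pairs.clear();
      for (std::size_t i = 0; i < g1.size(); ++i)
        for (std::size_t j = 0; j < g2.size(); ++j)
          if (switching(dist(g1[i], g2[j]), r0) > tolerance)
            pairs.push_back({(int)i, (int)j});
    }
    // Every other step: evaluate only the (much shorter) pair list,
    // e.g. 4 x 21,458 pairs shrink to the few waters near the side chain.
    double sum = 0.0;
    for (const auto &p : pairs)
      sum += switching(dist(g1[p.first], g2[p.second]), r0);
    return sum;
  }
};

int main() {
  const std::vector<Vec3> g1 = {{0.0, 0.0, 0.0}};
  const std::vector<Vec3> g2 = {{5.0, 0.0, 0.0}, {40.0, 0.0, 0.0}};
  CoordNumSketch cv{10.0, 0.001, 10, {}};
  double v0 = cv.compute(0, g1, g2);  // rebuild: keeps only the near pair
  double v1 = cv.compute(1, g1, g2);  // reuses the pair list
  (void)v0; (void)v1;
  return 0;
}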

HanatoK commented on May 13 '25

@HanatoK That's good to see! (Especially the CudaGM result)

The hardware configurations were completely different, so I don't know for sure where the big difference comes from. Likely factors are your CPU (newer, with a higher clock) and the fact that in GPU-resident mode the CPU is mostly dedicated to GlobalMaster+Colvars.

IMO, adopting CudaGM as the default interface for GPU-resident NAMD is a higher priority than finishing this PR. I was mostly concerned with mitigating the slowdown from message-passing communication, but it is not a big deal if we end up not supporting this optimization where it is not needed.

giacomofiorin commented on May 13 '25