openmm Allow all MC barostats to scale constrainted atom groups

This pull request addresses issue #3723. In the discussion, there are two approaches: 1) scale all atoms independently and then apply constraints; 2) scale constrained atom groups directly. This pull request implements the second method because it's more efficient and it's sufficient to meet most cases I can imagine.

An argument scaleMoleculesAsRigid=true is added to all MC barostats. When set to false, the barostat scaling will be performed on the center of constrained atom groups, instead of molecules. The default behavior is the same as the current.

It is mainly useful for systems containing only a few molecules, e.g. covalent crystals, cross-linked polymers, etc...

The current MonteCarloFlexibleBarostat already has this argument. However, it doesn't respect constraints. It is also fixed in this pull request.

Oct 01 '22 07:10 z-gong

Thanks! As we discussed previously, scaling atoms independently and then applying constraints is the preferred solution. See https://github.com/openmm/openmm/issues/3723#issuecomment-1206815578 and https://github.com/openmm/openmm/issues/3723#issuecomment-1207255363 for discussion of why it's better. It also is much simpler to implement (no need for the getConstrainedGroups() function you created). Is there some reason you preferred scaling groups rigidly instead?

Oct 02 '22 21:10 peastman

I don't have a preference for either approach. I choose to scale groups rigidly because it's straightforward to implement and could be more efficient. I'll try to implement the other approach also and see if there's a noticeable difference in performance.

Oct 03 '22 03:10 z-gong

If we scaled atoms independently and apply constraints after, what should the numberOfScaledParticles equal to? It cannot equal system.getNumParticles(), because some DOFs are frozen during the barostat scaling. It cannot equal system.getNumParticles()-system.getNumConstraints() either. Otherwise, for pure water, it will be zero.

double w = finalEnergy-initialEnergy + pressure*deltaVolume - numberOfScaledParticles*kT*log(newVolume/volume);

Oct 03 '22 04:10 z-gong

Excellent question. I believe the answer is that it shouldn't change. It should still be the number of molecules.

That term comes from the dependence of the free energy on the box volume: TS = kT*log(Omega). Omega is the density of states, whose dependence on box volume for N molecules is V^N. It doesn't matter whether a molecule is rigid or has internal degrees of freedom, so long as there's an upper bound on how far apart the atoms can get from each other. In that case, increasing the size of the simulation box doesn't increase its internal entropy. The only effect is to increase the entropy of its center of mass motion.

Does that sound right?

Oct 03 '22 19:10 peastman

whose dependence on box volume for N molecules is V^N. It doesn't matter whether a molecule is rigid or has internal degrees of freedom

This implies that the molecular translational DOFs and internal DOFs are uncoupled. It is valid for small molecules but not accurate for giant molecules. Imaging a single polymer (in amorphous liquid) linked to its periodic images, The N will be zero, which leads to kT*log(Omega)=0, which is counter-intuitive because a larger volume allows for more space for internal rotational DOFs.

Oct 08 '22 01:10 z-gong

I think what you're describing is an effect of interactions between atoms within the molecule? That's accounted for differently, through the energy term. The N*log(V) term is only to reflect the fact that a larger box allows each molecule to be placed at more locations.

Oct 09 '22 17:10 peastman

I have been thinking about this. Probably it is better to scale constrained atoms as a group, rather than scale atoms individually and apply constraints after.

Start from the partition function of NPT ensemble $$\Delta(N,P,T) = \int_0^{\infty} \mathrm{d} V \mathrm{e}^{-\beta P V} \int_{D(V)} \mathrm{d} \mathbf{r}_1 \cdots \mathrm{d} \mathbf{r}_N \mathrm{e}^{-\beta U\left(\mathbf{r}\right)}$$

For the Monte Carlo volume move, we need to cancel the dependence of atom coordinates on volume, therefore the scaled coordinates $\mathbf{s} = \mathbf{r}/L$ are introduced, to make the range of the second integration irrelevant of $V$ (Tuckerman, Statistical Mechanics Theory and Molecular Simulation, 2010)

We can scale the atoms isotropically (if there is no constraint), then there are $3 N$ independent coordinate variables (x, y and z for each atom) $$\Delta(N,P,T) =\int_0^{\infty} \mathrm{d} V \mathrm{e}^{-\beta P V} \int \mathrm{d} \mathbf{s}_1 \cdots \mathrm{d} \mathbf{s}_N \mathrm{e}^{-\beta U\left(L \mathbf{s}\right)}L^{3N}$$

which gives the weight function $$\Delta W=\Delta U+P\Delta V-Nk_{B}T \text{ln}\left(\frac{V+\Delta V}{V}\right)$$

Or we can scale several atoms as a group. Each group may contain arbitrary atoms. E.g. a molecule (the current implementation in OpenMM), or atoms constrained together (as implemented in this PR). Then the number of independent coordinate variables will be $3 G$, $G$ is the number of groups, $\mathbf{o}$ is the relative coordinates between atoms and group center. $$\Delta(N,P,T) =\int_0^{\infty} \mathrm{d} V \mathrm{e}^{-\beta P V} \int \mathrm{d} \mathbf{s}_1 \cdots \mathrm{d} \mathbf{s}_G \mathrm{e}^{-\beta U\left(L \mathbf{s} + \mathbf{o} \right)}L^{3G}$$

The weight function will be $$\Delta W=\Delta U+P\Delta V-Gk_{B}T \text{ln}\left(\frac{V+\Delta V}{V}\right)$$

However, if we scale the atoms individually and apply constraints after, the atom coordinates cannot be decoupled from the volume through a choice of $\mathbf{s}$. Then an appropriate weight function cannot be derived.

Nov 12 '22 14:11 z-gong

About the issue https://github.com/openmm/openmm/issues/3723#issuecomment-1206815578 we discussed, that constrained atoms being scaled together will make the molecule unnecessarily rigid. It may degrade the performance, but it won't cause incorrect sampling. The Monte Carlo moves only sample the cell space. The coordinate space under a selected cell will be sampled by the integrator. In the worst case, if some molecules are too rigid during an anisotropic Monte Carlo scaling, the move will simply be rejected.

Nov 12 '22 14:11 z-gong

I think you're confusing yourself by making it more complicated than it is. In the N*log(V) term, N is the number of molecules. A molecule is defined by the atoms being bound to each other so they can't move independently. Whether that's done through constraints or through the potential function doesn't matter.

Think of a single molecule in the center of a large box. Its internal entropy is independent of the box size. It's determined entirely by internal interactions within the molecule. Whether those interactions are implemented as constraints or bonds doesn't matter. If you make the box bigger, it doesn't allow the molecule to take on more conformations. The only effect is that the entire molecule can be translated to more positions.

Contrast that with a set of independent atoms or clusters, ones that are not bonded to each other. In that case, increasing the box size allows each atom or cluster to independently be translated to more positions.

Nov 14 '22 19:11 peastman

I think in general the factor of N comes from the Jacobian of the coordinate transformation from Cartesian lab coordinates to dimensionless unit cell coordinates. If the centers of mass of the molecules are rescaled, then N is the number of such centers of mass. If a molecule is kept fixed and its center of mass is not rescaled, it should not contribute to this number.

Nov 14 '22 19:11 egallicc

Think of a single molecule in the center of a large box... If you make the box bigger, it doesn't allow the molecule to take on more conformations. The only effect is that the entire molecule can be translated to more positions.

Contrast that with a set of independent atoms or clusters, ones that are not bonded to each other. In that case, increasing the box size allows each atom or cluster to independently be translated to more positions.

There's no substantial difference between a molecule and a group of nonbonded atoms. They are just particles and interactions. In both cases, all atoms can go everywhere in the box. The integral range of the partition function is the same. The difference is the probability of each configuration. Atoms belonging to the same 'molecule' tend to stay close to each other.

In the N*log(V) term, N is the number of molecules.

In the MC weight function, N is not necessarily the number of molecules. As @egallicc pointed out, it comes from the coordinate transformation from Cartesian to dimensionless unit. So it depends on how the MC moves are constructed. If atoms are scaled independently, N equals to the number of atoms.

Nov 15 '22 03:11 z-gong

I added a test case as a numerical validation of the implementation. It's an ideal gas box made of 64 triple-atom molecules. The first bond is constrained. scaleMoleculesAsRigid set to true or false both give the correct volume. The variance is also the same. The difference is that volume correlates much stronger when each constrained group is scaled as a unit.

Nov 23 '22 09:11 z-gong

Sorry I haven't responded to this properly yet. I just need to set aside some time to work through the theory. I haven't forgotten about it!

Dec 08 '22 00:12 peastman

Is it worth revisiting this now that the other MonteCarloBarostat issues have been fixed?

My understanding is that scaling constrained atom groups should work in the same way as the current scaling molecules implementation. Where instead of choosing the rigidly moved groups based on what atoms are bonded together (molecules) they are chosen based on what atoms have constraints between them (constrained groups).

As to the other possible implementation of scaling atoms individually and then applying constraints I am not sure if that type of MC move obeys detailed balance.

Aug 25 '23 12:08 sef43

openmm openmm copied to clipboard

Allow all MC barostats to scale constrainted atom groups

openmm
openmm copied to clipboard