mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

`center` attribute does not take cell dimensions into account

Open PicoCentauri opened this issue 2 years ago • 14 comments

Expected behavior

Calculating center positions like the center of mass in a periodic cell should take the periodicity into account. An edge case to test are two particles at the boundaries of the cell.

 +-----------+
 |           |
 x 1       2 |
 |           |
 +-----------+

where x here marks the center of mass of the two particles. The equations for the correct computation are known [1][2] and a reference implementation is is given below. However, I am not sure if this works also for triclinic boxes.

Actual behavior

The cell is not taken into accounted in the center method

https://github.com/MDAnalysis/mdanalysis/blob/c01d82bb86b480c8b2ae87f36497e7fc0a726c84/package/MDAnalysis/core/groups.py#L1097-L1099

since the non periodic equations are applied. This leads to the following for the above example

 +-----------+
 |           |
 | 1   x   2 |
 |           |
 +-----------+

The 'wrong' center of mass effects downstream methods like transformations.center_in_box(). The center of mass works as expected if particles are connected by bonds and unwrap=True is applied. However, bonds between all atoms in an AtomGroup should not be mandatory to calculate a center.

Code to reproduce the behavior

import MDAnalysis as mda
import numpy as np
from MDAnalysis.core._get_readers import get_reader_for

u = mda.Universe.empty(n_atoms=2,
                       n_residues=2,
                       n_segments=2,
                       atom_resindex=[0, 1],
                       residue_segindex=[0, 1])

u.add_TopologyAttr("masses", [1, 1])
u.add_TopologyAttr("bonds", ((0, 1),))
positions = np.array([[.1, 0, 0], [.9, 0, 0]])

u.trajectory = get_reader_for(positions)(positions, order='fac', n_atoms=2)

for ts in u.trajectory:
    ts.dimensions = np.array([1, 1, 1, 90, 90, 90])

# Wrong center of mass
print(u.atoms.center_of_mass(unwrap=False))

# Correct one (ONLY works if bonds are present)
print(u.atoms.center_of_mass(unwrap=True))
[0.49999999 0.         0.        ]
[-2.23517418e-08  0.00000000e+00  0.00000000e+00]

The correct center of mass is calculated by applying.

def com_pbc(ag):
    """According to Wikipedia

    https://en.wikipedia.org/wiki/Center_of_mass#Systems_with_periodic_boundary_conditions
    """
    L = ag.universe.dimensions[:3]
    theta = (ag.positions / L) * 2 * np.pi
    xi = ((np.cos(theta) * ag.masses[:, np.newaxis]).sum(axis=0)
          / ag.masses.sum())
    zeta = ((np.sin(theta) * ag.masses[:, np.newaxis]).sum(axis=0)
            / ag.masses.sum())
    theta_com = np.arctan2(-zeta, -xi) + np.pi
    com = theta_com / (2 * np.pi) * L
    return com

print(com_cluster(u.atoms))
[0.99999999 0.         0.        ]

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)") 2.2.0
  • Which version of Python (python -V)? 3.1
  • Which operating system? MacOS

PicoCentauri avatar Jul 01 '22 11:07 PicoCentauri

Do you think that your proposed solution should be the default behavior (ie, would this be a bug fix) or should it be triggered with a kwarg?

orbeckst avatar Jul 05 '22 01:07 orbeckst

Furthermore, making it work with triclinic boxes is important. From the Wikipedia page alone it looked to me that the transformation to the circle relies on an orthorhombic coordinate system.

orbeckst avatar Jul 05 '22 01:07 orbeckst

Do you think that your proposed solution should be the default behavior (ie, would this be a bug fix) or should it be triggered with a kwarg?

I like the way in MDAnalysis.lib.distances. When you provide a box PBC is taken into account. However, center is an attribute and there is no need to provide a box. The Implicit/bugfix way would treat PBC's if a box exists and use the current code if no box is present. This option is always the "correct" one. But we need clear docs to avoid confusion when which code is used.

Furthermore, making it work with triclinic boxes is important. From the Wikipedia page alone it looked to me that the transformation to the circle relies on an orthorhombic coordinate system.

I also think so. I will see if I find a treatment for triclinic boxes.

PicoCentauri avatar Jul 05 '22 07:07 PicoCentauri

Perhaps the standard trick for triclinic boxes works: transform with the inverse of the box matrix to the coordinates in the triclinic basis vectors, apply the transformation to a circle, then back transform.

orbeckst avatar Jul 05 '22 08:07 orbeckst

The question of "bug" vs "change in behavior" is important because if it's a bug then we can just fix it for 2.3.0 (or 2.2.1). If it's a change in behavior then the actual change from the current default behavior can only happen in 3.0.

orbeckst avatar Jul 05 '22 08:07 orbeckst

Perhaps the standard trick for triclinic boxes works: transform with the inverse of the box matrix to the coordinates in the triclinic basis vectors, apply the transformation to a circle, then back transform.

yes lets do it this way for now.

PicoCentauri avatar Jul 05 '22 09:07 PicoCentauri

The question of "bug" vs "change in behavior" is important because if it's a bug then we can just fix it for 2.3.0 (or 2.2.1). If it's a change in behavior then the actual change from the current default behavior can only happen in 3.0.

True. When I found this out it felt like a bug. The position of the center of mass is not where it supposed to be in a periodic system. A similar thing would be when you calculate a distance between two particles and you get distances larger than half of the box length.

PicoCentauri avatar Jul 05 '22 09:07 PicoCentauri

I've not actually seen these trig tricks before, but it might work in fractional space, we've got R_to_S and S_to_R already to check this. I think it is an API break, so would be a slow change. It's also worth remembering that periodic aware (especially triclinic) are much slower, so it's worth keeping the current "opt-in" pattern

richardjgowers avatar Jul 05 '22 09:07 richardjgowers

Okay we can do it with a keyword.

PicoCentauri avatar Jul 05 '22 09:07 PicoCentauri

I'm not sure if this requires a different keyword, or if this is a better way of implementing the unwrap=True, compound='group' combination.

richardjgowers avatar Jul 05 '22 10:07 richardjgowers

You don't have to unwrap your system for the periodic center calculations. In fact, the algorithm will wrap the system by projecting it on an angle. If I introduce a new option pbc deciding to treat center calculations with or without pbc, I only see two reasonable combinations with the other parameters:

  1. pbc=False, wrap=False: Treat the system as unperiodic
  2. pbc=False, wrap=True: Wrap coordinates but calculate distances without periodicity. Might be faster (I can test it) for many atoms and a groups that doesn't surpass the cell. However, then wrapping is also not needed...
  3. pbc=True, unwrap=False: Treat the system as periodic
  4. pbc=True, unwrap=True: Same result as 3 but more operations involved.

Maybe I miss something here but everything boils down to either set pbc=True or pbc=False.

PicoCentauri avatar Jul 05 '22 11:07 PicoCentauri

The pbc keyword is deprecated and will be removed in 3.0.

I think @richardjgowers is right in that we are talking about the question how compound = group should behave.

orbeckst avatar Jul 05 '22 16:07 orbeckst

You don't have to unwrap your system for the periodic center calculations.

Yes, that's a good thing! The meaning of unwrap=True compound="group" is, however, what you described as your expectation: that the center of mass of the whole group of atoms is calculated as if they were not broken across PB. Therefore, it looks as if your suggested code is potentially a performance improvement over the current code path that actually runs unwrapping.

It also means that we have a way to check correctness of the new approach... assuming that I understood everything correctly.

orbeckst avatar Jul 05 '22 19:07 orbeckst

Not exactly. We get the same behavior for unwrap=True if and only if all atoms in the group are connected via bonds. Applying unwrap() translates atoms of each molecule so that bonds do not split over images. Especially for compound=group this is not always given. If we consider an AtomGroup made of two dimers with different bond lengths located close to the cell boundary

 +-----------+
 |-A     A---|
 |           |
 |-B       B-|
 +-----------+

Applying unwrap=True and compound='molecules' we get two centers x

   +-----------+
   |       A--x|-A
   |           |
 B-x-B         |
   +-----------+

This behavior will not change with the current implementation and the proposed algorithm because bonds are present. However, for compound='group' the current one will result in a center x at

   +-----------+
   |       A---|-A
   |     x     |
 B-|-B         |
   +-----------+

in the middle of the box because the dimers will be translated to opposing sides of the cell. While the proposed algorithm will put the center of mass at the boundary.

 +-----------+
 |-A     A---|
 x           |
 |-B       B-|
 +-----------+

Note that for two dimers of the same bond length at the same horizontal position the current implementation (unwrap=True, compound='group') and the proposed algorithm will again give the same because dimers are translated to the same side of the cell.

PicoCentauri avatar Jul 06 '22 08:07 PicoCentauri