mdanalysis Incorrect data points occur when using lineardensity to calculate the linear density of a LAMMPS system.

Expected behavior

I've been employing the lineardensity module from MDAnalysis to determine the linear density of my system, which consists of a box of water molecules simulated with LAMMPS. The dimensions of the box are 20Å*20Å*20Å, and the water molecules within are uniformly distributed. Thus, the linear density across all three axes should be uniform.

Actual behavior

the linear density calculated by lineardensity includes some data points that exceed normal ranges, as illustrated.

These incorrect data points are related to the bin size. The figure above shows the results with a bin size set to 0.1Å. If I adjust the bin size to 0.08Å, the calculated data appear as shown in the figure.

This error occurs when calculating the linear density along the x, y, or z axes.

The link contains the files that triggered this bug. https://drive.google.com/drive/folders/1ZMFmBWchReYCCrAM3sIjMFVwOIUSNRVl?usp=drive_link

A possible reason: Although the box of the system ranges from 0 to 20Å, some atoms in the LAMMPS XTC file have out-of-bounds coordinates (<0 or > 20Å), with about 1% of the values exceeding the bounds.

Code to reproduce the behavior

import matplotlib.pyplot as plt
from MDAnalysis.analysis import lineardensity as lin

import MDAnalysis as mda

u = mda.Universe(
    rf"water2.data", 
    rf"s_w_na_out.xtc",
    atom_style='id mol type charge x y z')

ow = u.select_atoms("all")
density = lin.LinearDensity(ow, grouping='atoms', binsize=0.1)
density.run(start=0, stop=100000, step=1)

plt.plot(density.results['z']['mass_density'])
plt.show()

Current version of MDAnalysis

I'm using version 2.7.0 of MDAnalysis, and to my knowledge, this bug has been present since at least version 2.3. My Python version is 3.11.5, and the operating system is Windows 11.

Mar 01 '24 02:03 Tennisatw

Although the box of the system ranges from 0 to 20Å, some atoms in the LAMMPS XTC file have out-of-bounds coordinates (<0 or > 20Å), with about 1% of the values exceeding the bounds.

Have you tried first rewrapping all your data into the primary box?

Is this a simulation with constant volume (NVT, ie bins are guaranteed to be the same) or is it constant pressure (with variable z dimensions)?

Mar 25 '24 18:03 orbeckst

@PicoCentauri do you have any insights here?

Mar 25 '24 18:03 orbeckst

Have you tried first rewrapping all your data into the primary box?

No, actually I don't know how to do that.

Is this a simulation with constant volume (NVT, ie bins are guaranteed to be the same) or is it constant pressure (with variable z dimensions)?

It's a constant volume simulation. It's NVT.

Mar 26 '24 19:03 Tennisatw

For "unwrapping" see https://userguide.mdanalysis.org/stable/examples/transformations/center_protein_in_box.html — essentially making sure that molecules are whole. This is still a bit slow in MDA so if you can do it faster with LAMMPS then definitely do.

NVT should be fine.

Mar 29 '24 19:03 orbeckst

@PicoCentauri I am tentatively assigning this issue to you as the original author of the code. Could you please keep an eye on it and respond as necessary?

If you don't have the bandwidth to do this at the moment please say something and un-assign yourself. Thanks!

Mar 29 '24 22:03 orbeckst

Primarily we need to know if this is an actual bug. If it becomes clear that this is the case, add the defect label and un-assign yourself to indicate that anyone may work on this issue.

Mar 29 '24 22:03 orbeckst

Hi everyone,

I rewrapped my data, but the "peaks" still exist. Here is my code.

import matplotlib.pyplot as plt
from MDAnalysis.analysis import lineardensity as lin
from MDAnalysis.transformations import wrap
import MDAnalysis as mda

u = mda.Universe(
    rf"water2.data", 
    rf"s_w_na_out.xtc",
    atom_style='id mol type charge x y z',
    in_memory=True)

box = [20, 20, 20, 90, 90, 90]

u.dimensions = box
ow = u.select_atoms("all")

# pos = [ts[0][0] for ts in u.trajectory]
# print(min(pos), max(pos))
# before wrapping: -0.34000003 20.16

for ts in u.trajectory:
    ow.wrap(compound='atoms', box=box)
    
# pos = [ts[0][0] for ts in u.trajectory]
# print(min(pos), max(pos))
# after wrapping: 0.0 19.990002
    
density = lin.LinearDensity(ow, grouping='atoms', binsize=0.1, box=box)
density.run(start=0, stop=90000, step=1)

plt.plot(density.results['z']['mass_density'])
plt.show()

Figure_2

Btw, In density.run, I changed the stop value from 100,000 to 90,000 because setting it to 100,000 results in an error:

Traceback (most recent call last):
  File "d:\work\Program\MDanalysis\test.py", line 29, in <module>       
    density.run(start=0, stop=100000, step=1)
  File "C:\ProgramData\anaconda3\Lib\site-packages\MDAnalysis\analysis\base.py", line 448, in run
    self._single_frame()
  File "C:\ProgramData\anaconda3\Lib\site-packages\MDAnalysis\analysis\lineardensity.py", line 257, in _single_frame
    self._ags[0].wrap(compound=self.grouping)
  File "C:\ProgramData\anaconda3\Lib\site-packages\MDAnalysis\core\groups.py", line 1676, in wrap
    raise ValueError("No dimensions information in Universe. "
ValueError: No dimensions information in Universe.  Either use the 'box' argument or set the '.dimensions' attribute

I suspect this is another bug, but I can ignore it for now.

Mar 31 '24 05:03 Tennisatw

Hi everyone,

I've found the location of the bug. It's in lineardensity.py, within the __init__ function of the LinearDensity class (which inherits from AnalysisBase). Specifically, it's at line 218:

bins = (self.dimensions // self.binsize).astype(int)

When self.dimensions is numpy([20., 20., 20.]) and self.binsize is 0.1, due to floating-point precision errors, the bins are not 200, 200, 200, but 199, 199, 199. This causes errors in subsequent np.histogram function calculations.

I think we can replace it with something like:

bins = np.round(self.dimensions / self.binsize).astype(int)

I tried it, and it makes the linear density distribution correct at bin=0.1Å, as shown below.

Figure_1

As mentioned above, when the bin size is 0.08 Å, the generated linear density distribution is still incorrect. I believe this error originates from the LAMMPS software rather than MDAnalysis, as I couldn't reproduce the same error when analyzing trajectories generated with GROMACS.

Jun 10 '24 19:06 Tennisatw

@PicoCentauri can you have a look at the proposed fix? Is there something going wrong with bin assignment? If yes, we need to label this issue as a bug.

Jun 11 '24 02:06 orbeckst

mdanalysis mdanalysis copied to clipboard

Incorrect data points occur when using lineardensity to calculate the linear density of a LAMMPS system.

Expected behavior

Actual behavior

Code to reproduce the behavior

Current version of MDAnalysis

mdanalysis
mdanalysis copied to clipboard