mdanalysis
mdanalysis copied to clipboard
Incorrect data points occur when using lineardensity to calculate the linear density of a LAMMPS system.
Expected behavior
I've been employing the lineardensity module from MDAnalysis to determine the linear density of my system, which consists of a box of water molecules simulated with LAMMPS. The dimensions of the box are 20Å*20Å*20Å
, and the water molecules within are uniformly distributed. Thus, the linear density across all three axes should be uniform.
Actual behavior
the linear density calculated by lineardensity includes some data points that exceed normal ranges, as illustrated.
These incorrect data points are related to the bin size. The figure above shows the results with a bin size set to 0.1Å. If I adjust the bin size to 0.08Å, the calculated data appear as shown in the figure.
This error occurs when calculating the linear density along the x, y, or z axes.
The link contains the files that triggered this bug. https://drive.google.com/drive/folders/1ZMFmBWchReYCCrAM3sIjMFVwOIUSNRVl?usp=drive_link
A possible reason: Although the box of the system ranges from 0 to 20Å, some atoms in the LAMMPS XTC file have out-of-bounds coordinates (<0 or > 20Å), with about 1% of the values exceeding the bounds.
Code to reproduce the behavior
import matplotlib.pyplot as plt
from MDAnalysis.analysis import lineardensity as lin
import MDAnalysis as mda
u = mda.Universe(
rf"water2.data",
rf"s_w_na_out.xtc",
atom_style='id mol type charge x y z')
ow = u.select_atoms("all")
density = lin.LinearDensity(ow, grouping='atoms', binsize=0.1)
density.run(start=0, stop=100000, step=1)
plt.plot(density.results['z']['mass_density'])
plt.show()
Current version of MDAnalysis
I'm using version 2.7.0 of MDAnalysis, and to my knowledge, this bug has been present since at least version 2.3. My Python version is 3.11.5, and the operating system is Windows 11.
Although the box of the system ranges from 0 to 20Å, some atoms in the LAMMPS XTC file have out-of-bounds coordinates (<0 or > 20Å), with about 1% of the values exceeding the bounds.
Have you tried first rewrapping all your data into the primary box?
Is this a simulation with constant volume (NVT, ie bins are guaranteed to be the same) or is it constant pressure (with variable z dimensions)?
@PicoCentauri do you have any insights here?
Have you tried first rewrapping all your data into the primary box?
No, actually I don't know how to do that.
Is this a simulation with constant volume (NVT, ie bins are guaranteed to be the same) or is it constant pressure (with variable z dimensions)?
It's a constant volume simulation. It's NVT.
For "unwrapping" see https://userguide.mdanalysis.org/stable/examples/transformations/center_protein_in_box.html — essentially making sure that molecules are whole. This is still a bit slow in MDA so if you can do it faster with LAMMPS then definitely do.
NVT should be fine.
@PicoCentauri I am tentatively assigning this issue to you as the original author of the code. Could you please keep an eye on it and respond as necessary?
If you don't have the bandwidth to do this at the moment please say something and un-assign yourself. Thanks!
Primarily we need to know if this is an actual bug. If it becomes clear that this is the case, add the defect label and un-assign yourself to indicate that anyone may work on this issue.
Hi everyone,
I rewrapped my data, but the "peaks" still exist. Here is my code.
import matplotlib.pyplot as plt
from MDAnalysis.analysis import lineardensity as lin
from MDAnalysis.transformations import wrap
import MDAnalysis as mda
u = mda.Universe(
rf"water2.data",
rf"s_w_na_out.xtc",
atom_style='id mol type charge x y z',
in_memory=True)
box = [20, 20, 20, 90, 90, 90]
u.dimensions = box
ow = u.select_atoms("all")
# pos = [ts[0][0] for ts in u.trajectory]
# print(min(pos), max(pos))
# before wrapping: -0.34000003 20.16
for ts in u.trajectory:
ow.wrap(compound='atoms', box=box)
# pos = [ts[0][0] for ts in u.trajectory]
# print(min(pos), max(pos))
# after wrapping: 0.0 19.990002
density = lin.LinearDensity(ow, grouping='atoms', binsize=0.1, box=box)
density.run(start=0, stop=90000, step=1)
plt.plot(density.results['z']['mass_density'])
plt.show()
Btw, In density.run, I changed the stop value from 100,000 to 90,000 because setting it to 100,000 results in an error:
Traceback (most recent call last):
File "d:\work\Program\MDanalysis\test.py", line 29, in <module>
density.run(start=0, stop=100000, step=1)
File "C:\ProgramData\anaconda3\Lib\site-packages\MDAnalysis\analysis\base.py", line 448, in run
self._single_frame()
File "C:\ProgramData\anaconda3\Lib\site-packages\MDAnalysis\analysis\lineardensity.py", line 257, in _single_frame
self._ags[0].wrap(compound=self.grouping)
File "C:\ProgramData\anaconda3\Lib\site-packages\MDAnalysis\core\groups.py", line 1676, in wrap
raise ValueError("No dimensions information in Universe. "
ValueError: No dimensions information in Universe. Either use the 'box' argument or set the '.dimensions' attribute
I suspect this is another bug, but I can ignore it for now.
Hi everyone,
I've found the location of the bug. It's in lineardensity.py, within the __init__
function of the LinearDensity
class (which inherits from AnalysisBase
). Specifically, it's at line 218:
bins = (self.dimensions // self.binsize).astype(int)
When self.dimensions
is numpy([20., 20., 20.])
and self.binsize
is 0.1, due to floating-point precision errors, the bins are not 200, 200, 200
, but 199, 199, 199
. This causes errors in subsequent np.histogram
function calculations.
I think we can replace it with something like:
bins = np.round(self.dimensions / self.binsize).astype(int)
I tried it, and it makes the linear density distribution correct at bin=0.1Å, as shown below.
As mentioned above, when the bin size is 0.08 Å, the generated linear density distribution is still incorrect. I believe this error originates from the LAMMPS software rather than MDAnalysis, as I couldn't reproduce the same error when analyzing trajectories generated with GROMACS.
@PicoCentauri can you have a look at the proposed fix? Is there something going wrong with bin assignment? If yes, we need to label this issue as a bug.