Cannot access frames or slice trajectories for large or combined dcds
Expected behavior
-
Slice trajectories as described in the userguide. Eventual goal is to concatenate / combine a bunch of .dcd trajectories (load all at once) and do analysis on every 10th frame, so basically exactly #3075
-
Use existing workaround to slice individual trajectories into new dcds and then concatenate those.
Actual behavior
- When iterating over a dcd (by any more than 1 frame at a time) or when attempting to access frames 939+ (have tried many dcds):
OSError Traceback (most recent call last)
Cell In[3], line 1
----> 1 for ts in u.trajectory[::10]:
2 print(ts.frame)
File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\coordinates\base.py:211](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/coordinates/base.py#line=210), in FrameIteratorSliced.__iter__(self)
209 def __iter__(self):
210 for i in range(self.start, self.stop, self.step):
--> 211 yield self.trajectory[i]
212 self.trajectory.rewind()
File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\coordinates\base.py:833](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/coordinates/base.py#line=832), in ProtoReader.__getitem__(self, frame)
831 if isinstance(frame, numbers.Integral):
832 frame = self._apply_limits(frame)
--> 833 return self._read_frame_with_aux(frame)
834 elif isinstance(frame, (list, np.ndarray)):
835 if len(frame) != 0 and isinstance(frame[0], (bool, np.bool_)):
836 # Avoid having list of bools
File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\coordinates\base.py:865](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/coordinates/base.py#line=864), in ProtoReader._read_frame_with_aux(self, frame)
863 def _read_frame_with_aux(self, frame):
864 """Move to *frame*, updating ts with trajectory, transformations and auxiliary data."""
--> 865 ts = self._read_frame(frame) # pylint: disable=assignment-from-no-return
866 for aux in self.aux_list:
867 ts = self._auxs[aux].update_ts(ts)
File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\coordinates\DCD.py:198](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/coordinates/DCD.py#line=197), in DCDReader._read_frame(self, i)
196 """read frame i"""
197 self._frame = i - 1
--> 198 self._file.seek(i)
199 return self._read_next_timestep()
File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\lib\formats\libdcd.pyx:400](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/lib/formats/libdcd.pyx#line=399), in MDAnalysis.lib.formats.libdcd.DCDFile.seek()
OSError: DCD seek failed with DCD error=Normal EOF
- When using the workaround and only writing every 10th frame manually with i % 10 == 0 (no slice), I can iterate to the end of the dcd with no issue and write every 10th frame. However, I can only do this if 1 dcd is loaded. If I try to load 2 dcds, then I get the same error. So I can do this on all individual dcds in my overall simulation. Then I can load all of the shortened trajectories at once and avoid the above problem by simply using every frame (the stride/slice has already been applied now). However, the issue is that the ts.time values of the new trajectory are not correct and the original ts.time information seems to have been lost. The frames of the new trajectory are all separated by ~1 picosecond. The time also resets to 0 at the frame where two shortened dcds got concatenated. (e.g. times = 1,2,...,999,1000,0,1,2,...) So I cannot graph my metrics against the original time scale as I intended. A workaround would be to use my step size and frame save rate, but I am wondering if this behavior is expected. This time discrepancy in written dcds is also present on Linux.
In short, the second problem is that when writing a dcd from a loaded dcd, the time of each frame is incorrect/lost.
Code to reproduce the behavior
- The test files are not long enough for this issue. My error begins consistently at frame 939. When I put 10x of the test DCD to get 940+ frames, I don't get any of these errors. But in that case I am using the test PSF because the test PDB gives an error with the test DCD. When I run this with my exact same files on Linux, I also don't get this first problem.
Iterating over every frame with no skips produces no errors.
for ts in u.trajectory:
print(ts.frame)
Directly from the userguide:
u = mda.Universe('test.pdb', 'test.dcd')
fiter = u.trajectory[10::10]
frames = [ts.frame for ts in fiter]
print(frames, u.trajectory.frame)
Out: ... OSError: DCD seek failed with DCD error=Normal EOF
And similarly, this also fails if the 10 is replaced with anything other than 1:
u = mda.Universe('test.pdb', 'test.dcd')
for ts in u.trajectory[::10]:
print(ts.frame)
Out: ... OSError: DCD seek failed with DCD error=Normal EOF
Accessing any frame 938 or before is fine.
In: u.trajectory[938]
Out: < Timestep 938 with unit cell dimensions [132.12038 120.366974 119.66685 90. 90. 90. ] >
But for all frames 939+:
In: u.trajectory[939]
Out: ... OSError: DCD seek failed with DCD error=Normal EOF
- Workaround with 1 dcd loaded works, but the times of the frames of the written dcd do not match that of the original:
u = mda.Universe("test.pdb", "test.dcd")
ag = u.select_atoms('all')
with mda.Writer('out.dcd', ag.n_atoms) as w:
for ts in u.trajectory:
if ts.frame % 10 == 0: # Write every 10th frame
w.write(ag.atoms)
print(ts.frame)
print('Done')
With multiple dcds loaded, prints Done with no errors but the last frame recorded is 930, and frame times are still incorrect:
u = mda.Universe("test.pdb", "test.dcd", "test.dcd")
ag = u.select_atoms('all')
with mda.Writer('out.dcd', ag.n_atoms) as w:
for ts in u.trajectory:
if ts.frame % 10 == 0: # Write every 10th frame
w.write(ag.atoms)
print(ts.frame)
print('Done')
Current version of MDAnalysis
- Which version are you using? 2.7.0
- Which version of Python (
python -V)? Python 3.12.0 (using JupyterLab or Spyder) - Which operating system? Windows 11 (first issue does not occur on Linux with same pdb and dcd files, frame times still an issue)
- How did you produce your DCD files? Does your program properly record the number of frames in it?
- How big (in GiB) are your DCD files? How many atoms per frame?
- You can try the DCD from https://www.mdanalysis.org/MDAnalysisData/adk_equilibrium.html which contains >4000 frames (install MDAnalysisData with e.g.
mamba install mdanalysisdata). This works for me:
In [1]: from MDAnalysisData import datasets
In [2]: adk = datasets.fetch_adk_equilibrium()
In [3]: import MDAnalysis as mda
In [4]: u = mda.Universe(adk.topology, adk.trajectory)
~/anaconda3/envs/mda311/lib/python3.11/site-packages/MDAnalysis/coordinates/DCD.py:165: DeprecationWarning: DCDReader currently makes independent timesteps by copying self.ts while other readers update self.ts inplace. This behavior will be changed in 3.0 to be the same as other readers. Read more at https://github.com/MDAnalysis/mdanalysis/issues/3889 to learn if this change in behavior might affect you.
warnings.warn("DCDReader currently makes independent timesteps"
In [5]: u.trajectory
Out[5]: <DCDReader ~/MDAnalysis_data/adk_equilibrium/1ake_007-nowater-core-dt240ps.dcd with 4187 frames of 3341 atoms>
In [6]: frames = [ts.frame for ts in u.trajectory[10::10]]
In [7]: print(frames, u.trajectory.frame)
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1100, 1110, 1120, 1130, 1140, 1150, 1160, 1170, 1180, 1190, 1200, 1210, 1220, 1230, 1240, 1250, 1260, 1270, 1280, 1290, 1300, 1310, 1320, 1330, 1340, 1350, 1360, 1370, 1380, 1390, 1400, 1410, 1420, 1430, 1440, 1450, 1460, 1470, 1480, 1490, 1500, 1510, 1520, 1530, 1540, 1550, 1560, 1570, 1580, 1590, 1600, 1610, 1620, 1630, 1640, 1650, 1660, 1670, 1680, 1690, 1700, 1710, 1720, 1730, 1740, 1750, 1760, 1770, 1780, 1790, 1800, 1810, 1820, 1830, 1840, 1850, 1860, 1870, 1880, 1890, 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020, 2030, 2040, 2050, 2060, 2070, 2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190, 2200, 2210, 2220, 2230, 2240, 2250, 2260, 2270, 2280, 2290, 2300, 2310, 2320, 2330, 2340, 2350, 2360, 2370, 2380, 2390, 2400, 2410, 2420, 2430, 2440, 2450, 2460, 2470, 2480, 2490, 2500, 2510, 2520, 2530, 2540, 2550, 2560, 2570, 2580, 2590, 2600, 2610, 2620, 2630, 2640, 2650, 2660, 2670, 2680, 2690, 2700, 2710, 2720, 2730, 2740, 2750, 2760, 2770, 2780, 2790, 2800, 2810, 2820, 2830, 2840, 2850, 2860, 2870, 2880, 2890, 2900, 2910, 2920, 2930, 2940, 2950, 2960, 2970, 2980, 2990, 3000, 3010, 3020, 3030, 3040, 3050, 3060, 3070, 3080, 3090, 3100, 3110, 3120, 3130, 3140, 3150, 3160, 3170, 3180, 3190, 3200, 3210, 3220, 3230, 3240, 3250, 3260, 3270, 3280, 3290, 3300, 3310, 3320, 3330, 3340, 3350, 3360, 3370, 3380, 3390, 3400, 3410, 3420, 3430, 3440, 3450, 3460, 3470, 3480, 3490, 3500, 3510, 3520, 3530, 3540, 3550, 3560, 3570, 3580, 3590, 3600, 3610, 3620, 3630, 3640, 3650, 3660, 3670, 3680, 3690, 3700, 3710, 3720, 3730, 3740, 3750, 3760, 3770, 3780, 3790, 3800, 3810, 3820, 3830, 3840, 3850, 3860, 3870, 3880, 3890, 3900, 3910, 3920, 3930, 3940, 3950, 3960, 3970, 3980, 3990, 4000, 4010, 4020, 4030, 4040, 4050, 4060, 4070, 4080, 4090, 4100, 4110, 4120, 4130, 4140, 4150, 4160, 4170, 4180] 0