A more efficient way of reading MD trajectories
In the workflow, we do not need to read every frame of a trajectory, only the frames that were selected. So we should first build a dict that maps each trajectory to the indices of the frames to read from it:
frames_dict = {
    Trajectory0: [23, 56, 78],
    Trajectory1: [22],
    ...
}
Then read the selected frames from each trajectory:
for traj, f_idx in frames_dict.items():
    traj.read(f_idx)
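As a minimal sketch of how such a mapping could be built, the snippet below groups a flat list of (trajectory, frame index) selections by trajectory. The helper name `group_frames` and the file names are hypothetical, and trajectories are identified by file name here rather than by trajectory objects.

```python
from collections import defaultdict


def group_frames(selected):
    """Group (trajectory, frame index) pairs by trajectory.

    Returns a dict mapping each trajectory to a sorted list of
    unique frame indices.
    """
    grouped = defaultdict(set)
    for traj, frame in selected:
        grouped[traj].add(frame)
    return {traj: sorted(frames) for traj, frames in grouped.items()}


# Hypothetical selection; file names stand in for trajectory objects.
selected = [
    ("traj0.dump", 56),
    ("traj1.dump", 22),
    ("traj0.dump", 23),
    ("traj0.dump", 78),
]
frames_dict = group_frames(selected)
```

Sorting the indices is optional, but it makes it easy to stop reading a file once the largest requested frame has been reached.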
For a LAMMPS trajectory or another raw text format, the read method could look like this:
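def read(self, f_idx: list[int]):
    wanted = set(f_idx)  # set gives O(1) membership tests
    with open(self.fname) as f:
        # Passing the same file iterator nlines times makes zip_longest yield
        # consecutive blocks of nlines lines (requires `import itertools`).
        for ii, lines in enumerate(itertools.zip_longest(*[f] * self.nlines)):
            if ii not in wanted:
                continue
            self.process_block(lines)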
where nlines is the number of lines in each block, which should be determined once at the very beginning. Usually, every frame has the same number of lines.
The process_block method should convert a LAMMPS frame to dpdata format.
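The idea can be sketched as a standalone function, assuming a standard LAMMPS text dump in which every frame has 9 header lines followed by one line per atom and the atom count does not change between frames. The names `count_block_lines` and `read_frames` are hypothetical and are not part of dpdata or dpgen2. Converting the returned lines to dpdata is left out.

```python
import itertools


def count_block_lines(fname):
    """Return the number of lines per frame in a LAMMPS text dump.

    Assumes the standard layout: 9 header lines plus one line per atom,
    with the atom count on the fourth line of the file.
    """
    with open(fname) as f:
        header = list(itertools.islice(f, 4))
    if len(header) < 4 or not header[2].startswith("ITEM: NUMBER OF ATOMS"):
        raise ValueError(f"{fname} does not look like a LAMMPS text dump")
    return int(header[3]) + 9


def read_frames(fname, f_idx, nlines=None):
    """Return {frame index: list of raw lines} for the requested frames only."""
    if nlines is None:
        nlines = count_block_lines(fname)
    wanted = set(f_idx)
    frames = {}
    if not wanted:
        return frames
    last = max(wanted)
    with open(fname) as f:
        # Stop after the largest requested frame instead of scanning the file.
        for ii in range(last + 1):
            block = list(itertools.islice(f, nlines))
            if len(block) < nlines:
                break  # end of file, or a truncated final frame
            if ii in wanted:
                frames[ii] = block
    return frames
```

Skipped frames are still read from disk but never parsed, which is where most of the time should be saved. If the number of atoms varies between frames, the fixed block size no longer holds and the header of each frame would have to be inspected instead.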
This way of loading trajectories should replace the implementation in https://github.com/deepmodeling/dpgen2/blob/2011090d12ba26a1eb3849634883f9ae0b62cc9d/dpgen2/exploration/selector/conf_selector_frame.py#L132-L138
The question is where to implement it. Should we add this more efficient way of reading frames from trajectories to dpdata, or implement it directly in dpgen2?
We should add it to dpdata, so that other packages that use dpdata will also benefit.
Enhance this method:
https://github.com/deepmodeling/dpdata/blob/e8a2fbeaa60bae0e780d55f0fc8feae63a9ed863/dpdata/lammps/dump.py#L139-L158