dpdata: A more efficient way of reading MD trajectories

In the workflow, we do not need to read every frame of a trajectory, only the frames we want. So we should first build a dict that maps each trajectory to the indices of the frames to read from it:

frames_dict = {
  Trajectory0: [23, 56, 78],
  Trajectory1: [22],
  ...
}
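As a rough illustration, such a dict could be built from a flat list of selected (trajectory, frame index) pairs. The function name `build_frames_dict` and the use of file names as keys are assumptions made for this sketch, not part of dpdata or dpgen2:

```python
from collections import defaultdict


def build_frames_dict(selected):
    """Group (trajectory, frame_index) pairs by trajectory.

    `selected` is a hypothetical list of pairs; here a trajectory is
    identified by its file name, but any hashable object would work.
    """
    frames = defaultdict(set)  # a set drops duplicate frame indices
    for traj, idx in selected:
        frames[traj].add(idx)
    # Sort the indices so each trajectory can be read in one forward pass.
    return {traj: sorted(idxs) for traj, idxs in frames.items()}


selected = [
    ("traj0.dump", 56),
    ("traj1.dump", 22),
    ("traj0.dump", 23),
    ("traj0.dump", 78),
]
frames_dict = build_frames_dict(selected)
```

Keeping the indices sorted is a design choice for this sketch: it lets a reader stop as soon as the last requested frame has been processed.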

Then read each trajectory:

for traj, f_idx in frames_dict.items():
    traj.read(f_idx)

For a LAMMPS trajectory or another raw text file, the read method should be:

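import itertools

def read(self, f_idx: list[int]):
    wanted = set(f_idx)  # set membership is O(1); a list would be O(len(f_idx))
    with open(self.fname) as f:
        # Passing the same file iterator nlines times groups the file into
        # blocks of nlines consecutive lines, i.e. one block per frame.
        for ii, lines in enumerate(itertools.zip_longest(*[f] * self.nlines)):
            if ii not in wanted:
                continue
            self.process_block(lines)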

where nlines is the number of lines in each block (one block per frame), which should be determined once at the very beginning. The fixed-size grouping relies on every frame having the same number of lines, which is usually the case.
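One possible way to determine nlines for a LAMMPS dump file is to count the lines up to the second `ITEM: TIMESTEP` header. This is only a sketch: the helper name `lines_per_frame` is hypothetical, and it assumes the text dump format in which every frame starts with an `ITEM: TIMESTEP` line.

```python
def lines_per_frame(fname):
    """Count the lines of the first frame in a LAMMPS text dump file.

    Assumes each frame starts with an "ITEM: TIMESTEP" line and that
    all frames have the same length as the first one.
    """
    nlines = 0
    with open(fname) as f:
        for line in f:
            # The second TIMESTEP header marks the start of the next frame.
            if line.startswith("ITEM: TIMESTEP") and nlines > 0:
                break
            nlines += 1
    return nlines
```

Only the first frame is scanned, so the cost is independent of the trajectory length. If the file contains a single frame, the loop simply runs to the end of the file.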

The process_block method should convert the lines of one LAMMPS frame into dpdata's format.
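The idea can be tried out with a small self-contained sketch. The class `BlockTrajectory` and its attributes are hypothetical and not part of dpdata; `process_block` here only stores the raw lines as a stand-in for the real conversion, and it takes the frame index as an extra argument for bookkeeping. The sketch also stops reading once the last requested frame has been processed, which is an addition to the proposal above.

```python
import itertools


class BlockTrajectory:
    """Hypothetical reader for text trajectories with fixed-size frames."""

    def __init__(self, fname, nlines):
        self.fname = fname
        self.nlines = nlines  # number of lines per frame, known in advance
        self.frames = {}      # frame index -> raw lines of that frame

    def process_block(self, ii, lines):
        # Placeholder: a real implementation would convert the frame to
        # dpdata's format. zip_longest pads a short final block with None.
        self.frames[ii] = [ln for ln in lines if ln is not None]

    def read(self, f_idx):
        wanted = set(f_idx)
        if not wanted:
            return
        last = max(wanted)
        with open(self.fname) as f:
            # The same file iterator repeated nlines times yields one
            # tuple of nlines consecutive lines per frame.
            blocks = itertools.zip_longest(*[f] * self.nlines)
            for ii, lines in enumerate(blocks):
                if ii in wanted:
                    self.process_block(ii, lines)
                if ii >= last:
                    break  # nothing left to read after the last wanted frame
```

Skipped frames are still read line by line, but they are never parsed, which is where most of the time is saved for text formats.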

Mar 07 '22 03:03 njzjz

This way of loading trajectories should replace the implementation in https://github.com/deepmodeling/dpgen2/blob/2011090d12ba26a1eb3849634883f9ae0b62cc9d/dpgen2/exploration/selector/conf_selector_frame.py#L132-L138

The question is how. Should we add this efficient way of reading frames from trajectories to dpdata, or implement it directly in dpgen2?

Mar 07 '22 03:03 amcadmus

We should add it to dpdata, so that other packages that use dpdata will also benefit.

Oct 20 '22 22:10 njzjz

Enhance this method:

https://github.com/deepmodeling/dpdata/blob/e8a2fbeaa60bae0e780d55f0fc8feae63a9ed863/dpdata/lammps/dump.py#L139-L158

Oct 20 '22 22:10 njzjz