Implement efficient trajectory frame reading for LAMMPS dump files
This PR implements efficient trajectory frame reading for LAMMPS dump files, allowing users to read only specific frames instead of loading entire trajectories. This addresses the performance issue where workflows only need a subset of frames but must load and process complete trajectories.
Key Features
1. Selective Frame Reading
Added f_idx parameter to dpdata.System() for loading only specified frames:
import dpdata

# Load only frames 23, 56, and 78 from a trajectory
system = dpdata.System(
    'trajectory.dump',
    fmt='lammps/dump',
    type_map=['O', 'H'],
    f_idx=[23, 56, 78],
)
2. Multi-Trajectory Pattern
Implemented the exact frames_dict pattern requested in the issue:
import dpdata.lammps.dump as dump

frames_dict = {
    'trajectory1.dump': [23, 56, 78],
    'trajectory2.dump': [22],
}

# Efficiently load specified frames from multiple trajectories
data = dump.load_frames_from_trajectories(frames_dict, type_map=['O', 'H'])
3. Efficient Block-Based Reading
The implementation uses itertools.zip_longest(*[f] * nlines) to read the file in fixed-size blocks of nlines lines, one block per frame, and skips unwanted blocks without parsing them, as suggested in the issue. For large trajectories where only a few frames are needed, this avoids parsing and storing the frames that were not requested.
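For readers unfamiliar with the idiom, here is a minimal standalone sketch of block-based reading with zip_longest. It is not the code in this PR: read_selected_blocks is a hypothetical name, the input is a toy in-memory file rather than a real dump, and it assumes every frame has exactly nlines lines.

```python
import io
from itertools import zip_longest


def read_selected_blocks(f, nlines, wanted):
    """Yield (index, lines) for each wanted fixed-size block of an open text file."""
    wanted = set(wanted)
    if not wanted:
        return
    last = max(wanted)
    # The same file iterator is repeated nlines times, so each tuple that
    # zip_longest yields holds the next nlines consecutive lines of f.
    for idx, block in enumerate(zip_longest(*[f] * nlines)):
        if idx in wanted:
            # A trailing partial block is padded with None; drop the padding.
            yield idx, [line for line in block if line is not None]
        if idx >= last:
            break  # nothing further is wanted, so stop reading early


# Toy "trajectory": 4 frames of 3 lines each (not real LAMMPS content).
text = "".join(f"frame{i} line{j}\n" for i in range(4) for j in range(3))
selected = dict(read_selected_blocks(io.StringIO(text), nlines=3, wanted=[0, 2]))
```

Unwanted blocks are still consumed line by line, but their contents are discarded rather than parsed, and reading stops once the last requested block has been yielded.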
Technical Implementation
- Frame Structure Detection: get_frame_nlines() automatically determines the number of lines per frame
- Efficient Reading: read_frames() uses block-based reading to skip unwanted frames entirely
- Enhanced API: Extended load_file() to support both traditional begin/step and new f_idx parameters
- Seamless Integration: Works with existing system_data() pipeline and dpdata workflow
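As a rough illustration of the frame-structure detection step, the sketch below counts the lines of the first frame in a dump-style stream. count_frame_lines is a hypothetical helper, not the PR's get_frame_nlines(); it assumes each frame starts with an "ITEM: TIMESTEP" header and that all frames contain the same number of atoms.

```python
import io


def count_frame_lines(f):
    """Return the number of lines in the first frame of a dump-style stream.

    Hypothetical helper: counts lines up to the second "ITEM: TIMESTEP"
    header (or end of file) and assumes every later frame has the same size.
    """
    nlines = 0
    for line in f:
        if line.startswith("ITEM: TIMESTEP") and nlines > 0:
            break  # reached the start of the second frame
        nlines += 1
    return nlines


# A two-atom frame: 9 header lines plus one line per atom.
frame = (
    "ITEM: TIMESTEP\n0\n"
    "ITEM: NUMBER OF ATOMS\n2\n"
    "ITEM: BOX BOUNDS pp pp pp\n0 10\n0 10\n0 10\n"
    "ITEM: ATOMS id type x y z\n1 1 0 0 0\n2 2 1 1 1\n"
)
nlines = count_frame_lines(io.StringIO(frame * 3))
```

The helper leaves the stream positioned inside the second frame, so a caller would need to rewind (for example with f.seek(0)) before reading blocks of that size.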
Performance Benefits
- Memory Efficiency: Only loads requested frames into memory
- I/O Efficiency: Skips unwanted frames during file reading without processing
- Processing Efficiency: No need to load everything then filter
Backward Compatibility
The implementation maintains complete backward compatibility:
- Existing code using begin and step parameters continues to work unchanged
- All existing tests pass without modification
- The f_idx parameter is optional and defaults to None
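One way the two selection styles could coexist is sketched below. This is an assumption for illustration only: select_frame_indices is a hypothetical helper, and both the precedence rule (f_idx wins when it is not None) and the handling of duplicates and out-of-range indices are guesses, not behaviour confirmed by this PR.

```python
def select_frame_indices(nframes, begin=0, step=1, f_idx=None):
    """Hypothetical helper: decide which frame indices to load.

    Assumption (not confirmed by the PR): when f_idx is None the
    begin/step slice is used, otherwise f_idx alone decides.
    """
    if f_idx is None:
        # Legacy path: same frames as slicing with [begin::step].
        return list(range(begin, nframes, step))
    # Assumed policy: drop duplicates and out-of-range entries, sort ascending.
    return sorted({i for i in f_idx if 0 <= i < nframes})


legacy = select_frame_indices(10, begin=2, step=3)
explicit = select_frame_indices(100, f_idx=[78, 23, 56, 23, 500])
```

Because the default of None routes through the begin/step branch, callers that never pass f_idx see the same selection as before.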
Testing
Added comprehensive test suite with 22 test cases covering:
- Basic functionality and API usage
- Edge cases (empty lists, out-of-range indices, duplicates)
- Performance validation and correctness verification
- Backward compatibility with existing workflows
Fixes #367.