
Adding Appending Functionality to Writers

Open yuxuanzhuang opened this issue 3 years ago • 1 comments

Current Writer API

# option 1
AtomGroup.write(filename,..., **kwargs)

# option 2
with mda.Writer(filename, n_atoms) as w:
    for ts in Universe.trajectory:
        w.write(AtomGroup)

New API (append=False by default)

# option 1
AtomGroup.write(filename,..., append=False, **kwargs)

# option 2
with mda.Writer(filename, n_atoms, append=False) as w:
    for ts in Universe.trajectory:
        w.write(AtomGroup)

# adding a new base class for single-frame writers
# and making sure no writer overrides the `write` method
class WriterBase(IOBase):  # IOBase as in MDAnalysis.coordinates.base
    def __init__(self, filename, n_atoms, append=False):
        self.filename = filename
        self.n_atoms = n_atoms
        if append:
            self.check_appendibility()
        self.append = append
        self._n_frames_written = 0

    def write(self, obj, **kwargs):
        self._n_frames_written += 1
        return self._write_next_frame(obj, **kwargs)

class SingleFrameWriterBase(WriterBase):
    def __init__(self, filename, n_atoms, append=False):
        self.filename = filename
        self.n_atoms = n_atoms
        if append:
            raise ValueError("Single frame writers do not support appending")
        self.append = append
        self._n_frames_written = 0
    
    def write(self, obj, **kwargs):
        if self._n_frames_written >= 1:
            raise ValueError("Single frame writers can only write one frame")

        self._n_frames_written += 1
        return self._write_next_frame(obj, **kwargs)

I also find the current API quite messy, with mixed uses of obj and ag, let alone ts (https://github.com/MDAnalysis/mdanalysis/issues/2757). For now, I will just make sure that no writer overrides the write function...
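To make the intended behaviour concrete, here is a minimal, self-contained sketch of the two base classes. It is a toy stand-in and not MDAnalysis code: frames are kept in an in-memory list instead of a file, and `try_write_frames` is a hypothetical helper added only for the demonstration.

```python
class WriterBase:
    """Toy stand-in for the proposed multi-frame base class (not MDAnalysis code)."""

    def __init__(self, filename, n_atoms, append=False):
        self.filename = filename
        self.n_atoms = n_atoms
        self.append = append
        self._n_frames_written = 0
        self.frames = []  # assumption: in-memory stand-in for the output file

    def write(self, obj, **kwargs):
        # Single public entry point; subclasses only implement _write_next_frame.
        result = self._write_next_frame(obj, **kwargs)
        self._n_frames_written += 1
        return result

    def _write_next_frame(self, obj, **kwargs):
        self.frames.append(list(obj))


class SingleFrameWriterBase(WriterBase):
    def __init__(self, filename, n_atoms, append=False):
        if append:
            raise ValueError("Single frame writers do not support appending")
        super().__init__(filename, n_atoms, append=False)

    def write(self, obj, **kwargs):
        # Refuse a second frame instead of silently overwriting the first.
        if self._n_frames_written >= 1:
            raise ValueError("Single frame writers can only write one frame")
        return super().write(obj, **kwargs)


def try_write_frames(writer, frames):
    """Hypothetical helper: write frames until refused; return the error text or None."""
    try:
        for frame in frames:
            writer.write(frame)
    except ValueError as err:
        return str(err)
    return None
```

With two frames, the multi-frame writer accepts both, while the single-frame writer accepts the first and raises on the second.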

Note

  • This issue also addresses a potential bug: no warning or error is raised when one tries to write multiple frames to an (implicit) single-frame writer:
with mda.Writer("test.gro", u.atoms.n_atoms) as W:
    for ts in u.trajectory:
        W.write(u.atoms)
  • This issue only concerns writing trajectories with constant n_atoms. (Related: https://github.com/MDAnalysis/mdanalysis/issues/3836)
  • (minor) add supporting information to https://userguide.mdanalysis.org/stable/formats/index.html
  • In terms of things to check before appending: for now (in my mind), the writer should check that n_atoms matches. I will probably learn my lessons along the way :)
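The n_atoms check from the last note could look roughly like the sketch below. It assumes a toy plain-text format (one frame per line, three floats per atom) purely for illustration; `count_atoms_in_first_frame` and `check_can_append` are hypothetical names, and a real writer would have to read the atom count from its own format's header instead.

```python
import os


def count_atoms_in_first_frame(path):
    """Toy format (assumption): one frame per line, 3 floats (x, y, z) per atom."""
    with open(path) as fh:
        first = fh.readline().split()
    if len(first) % 3 != 0:
        raise ValueError(f"{path}: first frame is not a whole number of xyz triplets")
    return len(first) // 3


def check_can_append(path, n_atoms):
    """Raise ValueError if appending frames of n_atoms atoms to path is unsafe."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return  # nothing on disk yet, so appending equals a fresh write
    existing = count_atoms_in_first_frame(path)
    if existing != n_atoms:
        raise ValueError(
            f"cannot append: {path} has {existing} atoms per frame, "
            f"writer has {n_atoms}"
        )
```

Running the check in the constructor, before any frame is written, means a mismatch fails early rather than corrupting an existing trajectory.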

Use Cases

  • Perform some post-processing (e.g. transformations to fix PBC issues) on raw simulation trajectories and save the results as new trajectory files. With appending, when the raw simulation is extended, only the new frames need to be processed instead of all frames over and over again. This could be a potential selling point because I think gmx trjconv doesn't support appending :>)
  • Extract frames from multiple processes (this might need a proper file lock).
  • pending...
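The first use case can be sketched as follows, again with a toy one-frame-per-line text file standing in for a real trajectory. `append_new_frames` is a hypothetical helper and `transform` is any per-frame function; none of this is existing MDAnalysis API.

```python
def append_new_frames(raw_frames, out_path, transform):
    """Append transform(frame) for every raw frame not yet present in out_path.

    Assumes frame i of the output always corresponds to frame i of the input,
    so the number of lines already written tells us where to resume.
    """
    try:
        with open(out_path) as fh:
            n_done = sum(1 for _ in fh)
    except FileNotFoundError:
        n_done = 0

    new_frames = raw_frames[n_done:]
    with open(out_path, "a") as fh:  # "a" keeps what is already on disk
        for frame in new_frames:
            fh.write(" ".join(f"{x:.3f}" for x in transform(frame)) + "\n")
    return len(new_frames)
```

Calling the helper again after the raw trajectory has grown processes only the frames added since the previous call.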

Quoting @richardjgowers

The way forward to this is to flag every reader as non-append friendly, so they raise errors if mode=='a', then slowly unflag them as tests/functionality is rolled out in future PRs.
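One way to read this suggestion is a class-level flag that defaults to "no append" and is flipped per format once tests exist. The sketch below uses the `append` keyword from the proposed API rather than a `mode` argument; the `supports_append` attribute and the two toy writer classes are hypothetical and only illustrate the opt-in pattern.

```python
class WriterBase:
    # Hypothetical flag: every writer starts out as non-append friendly.
    supports_append = False

    def __init__(self, filename, n_atoms, append=False):
        if append and not self.supports_append:
            raise NotImplementedError(
                f"{type(self).__name__} does not support appending yet"
            )
        self.filename = filename
        self.n_atoms = n_atoms
        self.append = append


class ToyXTCWriter(WriterBase):
    supports_append = True  # unflagged once append tests exist for this format


class ToyDCDWriter(WriterBase):
    pass  # still flagged: requesting append raises
```

Because the default lives on the base class, a new writer cannot accidentally accept `append=True` without someone opting it in.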

yuxuanzhuang, Oct 13 '22 13:10

Just bumping this. This would be incredibly useful to have!

In particular, our use case is that we use writers in jobs that can be preempted, and not having an append mode makes it quite difficult to resume a job. The only workarounds I see at the moment are:

  1. Rewrite the entire file up to that point, which can be expensive.
  2. Create a new file with a -N suffix, which splits the trajectory over multiple files.
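For reference, the second workaround can be sketched with the standard library alone. `next_part_path` is a hypothetical helper, and the exact `-N` naming scheme is an assumption; it simply picks the first unused file name each time a preempted job restarts.

```python
from pathlib import Path


def next_part_path(path):
    """Return path itself if unused, else the first free 'stem-N.suffix' sibling."""
    path = Path(path)
    if not path.exists():
        return path
    n = 1
    while True:
        candidate = path.with_name(f"{path.stem}-{n}{path.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

The resulting parts then have to be read back together as one trajectory, which is the extra bookkeeping an append mode would avoid.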

andrrizzi, Jul 18 '25 11:07