mdanalysis Remove frames from MemoryReader

In AlignTraj we now allow to change frames in place with the MemoryReader. This has an interesting side effect when the start, stop or step arguments are used to only align some frames. The resulting is that only some frames in the MemoryReader are changed while other still contain the original data. This can lead to a unwelcome surprise when I later iterate over the whole MemoryReader.

I think therefore we should allow to remove frames from a MemoryReader. I would say we add two methods drop and keep.

drop(start=None,stop=None,step=None, frames=None): This method will remove all frames that are specified either by a slice or a array_like of frames indices. I wouldn't call it pop because I don't see why the removed frames should be returned (and pop in the standard library only works on a single element)

keep(start=None,stop=None,step=None, frames=None: Keep only specified frames. This is basically drop(not frames) but python doesn't allow me to write this in syntax. I also like this because it makes it easier to drop unchanged frames.

res = AlignTraj(u, ref, in_memory=True, start=10, stop=100, step=10)
u.trajectory.keep(start=10, stop=100, step=10)

Does this make sense to other? Is there something missing in the proposal?

BTW: I would say in this release we still call the MemoryReader experimental and say that it's API might change in future releases to ensure that it is use and that all of MDAnalysis functions to work seamlessly with it. We are already 95% there but I keep coming across some edge cases fairly regular still.

Nov 28 '16 14:11 kain88-de

Once you have keep()/drop() then it would also make sense to add an additional kwarg to AlignTraj to make keep the default behavior with in_memory.

Nov 28 '16 23:11 orbeckst

I'd like to work on this.

Feb 05 '17 05:02 shobhitagarwal1612

@shobhitagarwal1612 That is nice. But I would like to wait with this. I'm not yet sure how to best implement this with the least amount of surprises by a user. Just cutting away frames from under them isn't nice. It would be good if @wouterboomsma or @mtiberti could add their thoughts on the proposal since they have the most experience with the MemoryReader

Feb 05 '17 11:02 kain88-de

I hadn't thought of this. It is indeed an interesting corner case. Two thoughts:

the choice of keep as the method name is not entirely intuitive to me - at least, it doesn't immediately make be think that I'm throwing something else away. Perhaps select would be more accurate?
Since MemoryReader is basically a wrapper around a numpy array - are we sure that it is necessary to extend the API? I mean, as an alternative, one could just call the MemoryReader constructor with a sliced array, right? And standard numpy slicing and np.delete would provide the functionality you are after, as far as I can see. Of course, it would make it slightly easier for the user if it was directly available in the API - but a slight concern is that there are a million things one can do with a numpy array - and exposing them all explicitly in MemoryReader would probably be overkill. If this is a corner case primarily associated with the AlignTraj functionality, one could even consider including it as an option there - instead of as a method on the MemoryReader itself.

These are just some thoughts from the top of my head. I certainly don't fundamentally disagree with your suggestions, so if they make it easier to user the MemoryReader, I'm all for it.

Feb 05 '17 11:02 wouterboomsma

Well this affects possible all code/analysis that change the coordinates of the MemoryReader. At the top of my head only AlignTraj should do that. I'm not sure if some other analysis would do this as well.

Feb 05 '17 11:02 kain88-de

Yes, I agree. If it is common to do in-place operations on subsets of the frames then it certainly makes sense to add this.

Adding this prioritises the frame dimension over the atom and coordinate dimensions (in the sense that we don't add slicing functionality to the API for those dimensions) - which is fine - I guess we have moved in that direction anyway with the default choice of 'fac'. But I was just wondering - but do we have the same problem if we select only CA-atoms in AlignTraj? I mean, is it possible to align only based on CA-atoms, and would it in this case only affect the positions of those atoms?

I'm fine with what you propose - but perhaps we should make it explicit that we are dropping and keeping in the frame dimension - drop_frames and keep_frames (or select_frames)?

Feb 05 '17 13:02 wouterboomsma

I recently came across this open issue and do agree it would be useful to drop some frames. I commonly need to analyze just a specific behavior on a subset of frames using a third-party software and I don't know of a package with such a built-in feature.

Currently, I am looping over each frame and comparing a property with a given criteria and choosing which frames I want to write. The code below assumes there is a "criteria_list" with the frame ids of the frames we want to keep.

import MDAnalysis as mda

u = mda.Universe(psf, dcd)
sel = u.select_atoms("all")

# comparing each frame to decide whether to write it or not
with mda.Writer('pseudotraj.dcd', sel.n_atoms) as w:
        for ts in u.trajectory:
                frame = int(ts.frame)
                if frame in criteria_list:
                        w.write(sel)

It works as intended, but I still believe it would be useful to have the feature discussed in this topic.

Aug 22 '22 01:08 mdpoleto

mdanalysis mdanalysis copied to clipboard

Remove frames from MemoryReader

mdanalysis
mdanalysis copied to clipboard