mdtraj icon indicating copy to clipboard operation
mdtraj copied to clipboard

atom_slice: provides a warning when input indices are not sorted or not unique

Open yaoyic opened this issue 3 years ago • 3 comments

Recently I have been puzzled by the index order issue of Trajectory.atom_slice in my project, and it took me sometime to track down the cause:

I want to change the order of some atoms in a residue, such that I use the structure with another force field. Intuitively, I thought it could be done by atom_slice method, but in the end I found the coordinates have been correctly reordered, but not the topology, which ends up in the output as wrong labeling for the swapped atoms.

The problem is that the internal logic is inconsistent for trajectory and topology slicing. For the former, it views the provided atom_indices as an ordered list. https://github.com/mdtraj/mdtraj/blob/1a757274f6f9acd6c9c847c72f5857e9c1046d17/mdtraj/core/trajectory.py#L1804 But for the latter, it views atom_indices as a unordered set, https://github.com/mdtraj/mdtraj/blob/1a757274f6f9acd6c9c847c72f5857e9c1046d17/mdtraj/core/trajectory.py#L1807 https://github.com/mdtraj/mdtraj/blob/1a757274f6f9acd6c9c847c72f5857e9c1046d17/mdtraj/core/topology.py#L96 and thus the output follows the original topological order. Therefore, there will be an inconsistency between the coordinates and the topology, when the provided atom_indices are not an sorted list.

It seems to be a long-lasting issue (see e.g., #1343) and I can understand that there might not be a simple solution to handle input that is not sorted or has repeating indices. But I think it is worth noting in the docstring, giving a Warning or raising an Error in the python code when such an input is provided, rather than silently introducing this inconsistency.

Last by not least, thanks for writing and maintaining this package! I have always found it very helpful and handy.

yaoyic avatar Sep 12 '22 09:09 yaoyic

I'm neutral on a change that would fix/update this behavior. I'd certainly review a PR that takes a stab at it. It might take a bit of refactoring since the order at the new atoms (atom_indices) is determined by the atom ordering in the old topology (L90-95, the outer loops from what you posted). In general, re-ordering atoms in existing objects is playing with fire and I'm not surprised that the bookkeeping has a mismatch.

But I guess that's not what you're proposing - I'd be happy to see a PR that checks that atom_indices is sorted somewhere early in Topology._topology_from_subset, raising a warning if it is not. It should only happen once per call and be pretty quick so I can't imagine a performance hit. I'd prefer a warning raised in the code since I don't tend to read warnings in docstrings but might read a warning emitted by a script I wrote.

mattwthompson avatar Sep 12 '22 16:09 mattwthompson

I have a proposed fix according to your second paragraph above (warning raised if atom_indices are not monotonically increasing). PR on the way.

CharlieLaughton avatar Feb 21 '23 14:02 CharlieLaughton

Hey thanks for doing this, this drove me nuts initially!

gph82 avatar Feb 21 '23 14:02 gph82