mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

Mismatch in groupselections for RMSD between target and reference

Open lilyminium opened this issue 5 years ago • 8 comments
trafficstars

Expected behavior

If I pass an atomgroup to RMSD, groupselections are taken only from that AtomGroup

Actual behavior

This does not happen for one of the target or reference for groupselections, resulting in an error when the RMSD is calculated due to mismatching atoms.

Code to reproduce the behavior

Error when selections are not subsets of atomgroups

In [1]: import MDAnalysis as mda
   ...: from MDAnalysis.tests.datafiles import PSF, DCD, CRD
   ...: from MDAnalysis.analysis import rms

In [2]: u = mda.Universe(PSF, DCD)

In [3]: ca = u.select_atoms('name CA')

In [4]: CORE = 'backbone and (resid 1-29 or resid 60-121 or resid 160-214)'


In [5]: R = rms.RMSD(ca, ca, select=CORE,
   ...:              groupselections=[CORE],
   ...:              ref_frame=0)
   ...: R.run()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-76040dc15767> in <module>
      2              groupselections=[CORE],
      3              ref_frame=0)
----> 4 R.run()

~/anaconda3/envs/mdanalysis/lib/python3.7/site-packages/MDAnalysis/analysis/base.py in run(self, start, stop, step, verbose)
    195             self.times[i] = ts.time
    196             # logger.info("--> Doing frame {} of {}".format(i+1, self.n_frames))
--> 197             self._single_frame()
    198         logger.info("Finishing up")
    199         self._conclude()

~/anaconda3/envs/mdanalysis/lib/python3.7/site-packages/MDAnalysis/analysis/rms.py in _single_frame(self)
    663                     refpos, atoms['mobile'].positions,
    664                     weights=self.weights_groupselections[igroup-3],
--> 665                     center=False, superposition=False)
    666         else:
    667             # only calculate RMSD by setting the Rmatrix to None (no need

~/anaconda3/envs/mdanalysis/lib/python3.7/site-packages/MDAnalysis/analysis/rms.py in rmsd(a, b, weights, center, superposition)
    231     N = b.shape[0]
    232     if a.shape != b.shape:
--> 233         raise ValueError('a and b must have same shape')
    234
    235     # superposition only works if structures are centered

ValueError: a and b must have same shape

Working when selections are entirely within the atomgroups

In [7]: bb = u.select_atoms('backbone')

In [8]: R = rms.RMSD(bb, bb, select=CORE,
   ...:              groupselections=[CORE],
   ...:              ref_frame=0)
   ...: R.run()
Out[8]: <MDAnalysis.analysis.rms.RMSD at 0x1183f5f90>

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)") 1.0.0
  • Which version of Python (python -V)? 3.7.3
  • Which operating system?

lilyminium avatar Jun 25 '20 23:06 lilyminium

Ok, groupselections are selected from target Universe here:

https://github.com/MDAnalysis/mdanalysis/blob/31929685389638232b8f22a2ee9cc63bd659f8d5/package/MDAnalysis/analysis/rms.py#L517-L522

But the reference groupselection is selected from the given reference AtomGroup here:

https://github.com/MDAnalysis/mdanalysis/blob/31929685389638232b8f22a2ee9cc63bd659f8d5/package/MDAnalysis/analysis/rms.py#L592-L596

It seems to me that the below check should be moved to after the reference groupselection is found, and also the original target atoms should not be selected from universes.

https://github.com/MDAnalysis/mdanalysis/blob/31929685389638232b8f22a2ee9cc63bd659f8d5/package/MDAnalysis/analysis/rms.py#L524-L533

lilyminium avatar Jun 26 '20 05:06 lilyminium

Hi I am new to open source Can you help me how to get started ?

RAJULDEV avatar Nov 09 '22 16:11 RAJULDEV

Hello @RAJULDEV , welcome to the MDAnalysis community. Please see https://userguide.mdanalysis.org/stable/contributing.html for how to contribute to MDAnalysis.

We recommend you work through our introductory tutorial https://userguide.mdanalysis.org/stable/examples/quickstart.html to get a feel for MDAnalysis if you haven't used it before. This issue, for example, makes a lot more sense when you understand what an RMSD calculation is and why the selections in question are important.

We have an active mailing list and discord server where you can start discussions, see https://www.mdanalysis.org/#participating for how to participate.

orbeckst avatar Nov 14 '22 18:11 orbeckst

Hi @lilyminium , Thanks for this issue.

I think the error in this case is simply because we are trying to select something that does not exist, i.e. backbone atoms (N,C or CA) from a universe that has only CA atoms. To see, if this is indeed the case, I changed the CORE selection to 'name CA' from 'backbone', and the code works perfectly now.

System details:

  • Which version of MDAnalysis I am using? 2.7.0 Which version of Python (python -V)? Python 3.10.8 Which operating system? Ubuntu 20.0.4.5 LTS on Windows Subsytem by Linux, on Jupyter Notebook ver 7.0.6

code to reproduce solution

CORE= 'name CA and (resid 1-29 or resid 60-121 or resid 160-214)' R = rms.RMSD(ca, ca, select=CORE, groupselections=[CORE], ref_frame=0) R.run() ##output <MDAnalysis.analysis.rms.RMSD at 0x7f41b1bd5690> `CORE = 'backbone and (resid 1-29 or resid 60-121 or resid 160-214)'

Code snippet from Jupyter Notebook

Okay, so : by using the selection on CA atoms which are technically a subset of backbone atoms, the selection is failing

If I redefined CORE with only CA atoms:

In[15]:

CORE = 'name CA and (resid 1-29 or resid 60-121 or resid 160-214)'

In[16]:

R = rms.RMSD(ca, ca, select=CORE, groupselections=[CORE], ref_frame=0)`

image

SampurnaM avatar Feb 01 '24 17:02 SampurnaM

I hadn't looked at this issue in forever... The docs for groupselection say explicitly that these selections are for the whole system:

A list of selections as described for select, with the difference that these selections are always applied to the full universes, i.e., atomgroup.universe.select_atoms(sel1) and reference.universe.select_atoms(sel2). Each selection describes additional RMSDs to be computed after the structures have been superimposed according to select. No additional fitting is performed.The output contains one additional column for each selection.

The intended behavior was "Do a superposition on whatever is good substructure and then do the RMSD calculation on whatever else you're interested in." We use this a lot to, for instance, superimpose on a stable membrane domain and then calculate the RMSD of a moving domain. In this case the two groups have zero intersection.

If I understood the Expected Behavior

If I pass an atomgroup to RMSD, groupselections are taken only from that AtomGroup

correctly then you would effectively suggest to change the code to atomgroup.select_atoms(sel1) and reference.select_atoms(sel2) or am I misunderstanding, @lilyminium ?

If my interpretation is correct then I disagree with the Expected behavior. However, even if we keep the old behavior, we shouldn't be throwing a cryptic error.

orbeckst avatar Feb 01 '24 18:02 orbeckst

Thanks for the clarification! I found this issue from the GSoC starter tag, and wanted to solve this as my first issue. Of course, my solution is just for this specific use case and not for the main code base. Happy to have a re-think on this!

(I am working up the courage for a formal introduction of myself in Discussions)

SampurnaM avatar Feb 01 '24 18:02 SampurnaM

@SampurnaM welcome to MDAnalysis :-)

orbeckst avatar Feb 01 '24 19:02 orbeckst

Can I proceed with this issue for my GSoc Starter PR please? If it is already solved, I will look at something else :)

SampurnaM avatar Mar 21 '24 18:03 SampurnaM