mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

confusing error message from DSSP for too short peptide stretches

Open mrobinson-hub opened this issue 6 months ago • 3 comments

Expected behavior

I recently noticed that DSSP analysis has been added to MDAnalysis, but have noticed that despite the documentation stating that one can provide the atoms input as either a Universe of an AtomGroup, providing an AtomGroup is not currently working.

Actual behavior

To ensure that there was not an unexpected issue from my own systems, I took one of the example scripts from the DSSP documentation and attempted to use an AtomGroup in it. The initial output makes it clear that I have created a valid AtomGroup that contains atoms:

<class 'MDAnalysis.core.groups.AtomGroup'>
<AtomGroup [<Atom 1: N of type opls_287 of resname MET, resid 1 and segid seg_0_AKeco>, <Atom 2: H1 of type opls_290 of resname MET, resid 1 and segid seg_0_AKeco>, <Atom 3: H2 of type opls_290 of resname MET, resid 1 and segid seg_0_AKeco>, ..., <Atom 41: HH22 of type opls_301 of resname ARG, resid 2 and segid seg_0_AKeco>, <Atom 42: C of type opls_235 of resname ARG, resid 2 and segid seg_0_AKeco>, <Atom 43: O of type opls_236 of resname ARG, resid 2 and segid seg_0_AKeco>]>

However, providing this in place of the full universe creates a ValueError when assigning coords (see the stack trace I've put in a collapsible block at the bottom of this post to avoid this being too long).

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysis.analysis.dssp import DSSP, translate
from MDAnalysisTests.datafiles import TPR, XTC
u = mda.Universe(TPR, XTC)
threshold = 0.8

res1 = u.select_atoms("resid 1-2")
print(type(res1))
print(res1.atoms)
long_run = DSSP(res1).run()
persistent_residues = translate(
    long_run
    .results
    .dssp_ndarray
    .mean(axis=0) > threshold
)
print(''.join(persistent_residues)[:20])

....

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)") MDAnalysis 2.9.0
  • Which version of Python (python -V)? Python 3.10.17
  • Which operating system? Linux, on a LinuxMint distro

Stack Trace

ValueError                                Traceback (most recent call last)
Cell In[14], line 10
      8 print(type(res1))
      9 print(res1.atoms)
---> 10 long_run = DSSP(res1).run()
     11 persistent_residues = translate(
     12     long_run
     13     .results
     14     .dssp_ndarray
     15     .mean(axis=0) > threshold
     16 )
     17 print(''.join(persistent_residues)[:20])

File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/base.py:878](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/base.py#line=877), in AnalysisBase.run(self, start, stop, step, frames, verbose, n_workers, n_parts, backend, unsupported_backend, progressbar_kwargs)
    871 computation_groups = self._setup_computation_groups(
    872     start=start, stop=stop, step=step, frames=frames, n_parts=n_parts
    873 )
    875 # get all results from workers in other processes.
    876 # we need `AnalysisBase` classes
    877 # since they hold `frames`, `times` and `results` attributes
--> 878 remote_objects: list["AnalysisBase"] = executor.apply(
    879     worker_func, computation_groups
    880 )
    881 self.frames = np.hstack([obj.frames for obj in remote_objects])
    882 self.times = np.hstack([obj.times for obj in remote_objects])

File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/backends.py:208](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/backends.py#line=207), in BackendSerial.apply(self, func, computations)
    192 def apply(self, func: Callable, computations: list) -> list:
    193     """
    194     Serially applies `func` to each task object in ``computations``.
    195 
   (...)
    206         list of results of the function
    207     """
--> 208     return [func(task) for task in computations]

File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/backends.py:208](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/backends.py#line=207), in <listcomp>(.0)
    192 def apply(self, func: Callable, computations: list) -> list:
    193     """
    194     Serially applies `func` to each task object in ``computations``.
    195 
   (...)
    206         list of results of the function
    207     """
--> 208     return [func(task) for task in computations]

File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/base.py:543](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/base.py#line=542), in AnalysisBase._compute(self, indexed_frames, verbose, progressbar_kwargs)
    541     self.frames[idx] = ts.frame
    542     self.times[idx] = ts.time
--> 543     self._single_frame()
    544 logger.info("Finishing up")
    545 return self

File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/dssp/dssp.py:394](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/dssp/dssp.py#line=393), in DSSP._single_frame(self)
    392 def _single_frame(self):
    393     coords = self._get_coords()
--> 394     dssp = assign(coords)
    395     self.results.dssp_ndarray.append(dssp)

File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/dssp/pydssp_numpy.py:242](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/dssp/pydssp_numpy.py#line=241), in assign(coord)
    240 # helix4 first, as alpha helix
    241 helix4 = h4 + np.roll(h4, 1, 0) + np.roll(h4, 2, 0) + np.roll(h4, 3, 0)
--> 242 h3 = h3 * ~np.roll(helix4, -1, 0) * ~helix4  # helix4 is higher prioritized
    243 h5 = h5 * ~np.roll(helix4, -1, 0) * ~helix4  # helix4 is higher prioritized
    244 helix3 = h3 + np.roll(h3, 1, 0) + np.roll(h3, 2, 0)

ValueError: operands could not be broadcast together with shapes (4,) (5,)

mrobinson-hub avatar May 14 '25 10:05 mrobinson-hub

Thanks for reporting this!

I believe the error occurs because the selected residue segment is too short to compute secondary structure. For example, if you use res1 = u.select_atoms("resid 1-6"), the error does not occur.

With that said, I agree the current error message is confusing. It would be better to raise a clearer exception earlier when fewer than 6 residues are provided as input.

Feel free to chime in and take a look when you get a chance! @marinegor

yuxuanzhuang avatar May 19 '25 00:05 yuxuanzhuang

Ah right, that is a surprisingly obvious solution in hindsight! For some reason I was expecting that it would perform analysis of the selection incorporating the surrounding peptide residues that were not in the selection and only return data for the selected residues. Only including residues from the selection makes sense. A clearer exception sounds like a good resolution to this issue!

mrobinson-hub avatar May 19 '25 14:05 mrobinson-hub

I changed the title to make clear that we just need better error handling in DSSP.

orbeckst avatar Jun 11 '25 19:06 orbeckst