mdanalysis
mdanalysis copied to clipboard
confusing error message from DSSP for too short peptide stretches
Expected behavior
I recently noticed that DSSP analysis has been added to MDAnalysis, but have noticed that despite the documentation stating that one can provide the atoms input as either a Universe of an AtomGroup, providing an AtomGroup is not currently working.
Actual behavior
To ensure that there was not an unexpected issue from my own systems, I took one of the example scripts from the DSSP documentation and attempted to use an AtomGroup in it. The initial output makes it clear that I have created a valid AtomGroup that contains atoms:
<class 'MDAnalysis.core.groups.AtomGroup'>
<AtomGroup [<Atom 1: N of type opls_287 of resname MET, resid 1 and segid seg_0_AKeco>, <Atom 2: H1 of type opls_290 of resname MET, resid 1 and segid seg_0_AKeco>, <Atom 3: H2 of type opls_290 of resname MET, resid 1 and segid seg_0_AKeco>, ..., <Atom 41: HH22 of type opls_301 of resname ARG, resid 2 and segid seg_0_AKeco>, <Atom 42: C of type opls_235 of resname ARG, resid 2 and segid seg_0_AKeco>, <Atom 43: O of type opls_236 of resname ARG, resid 2 and segid seg_0_AKeco>]>
However, providing this in place of the full universe creates a ValueError when assigning coords (see the stack trace I've put in a collapsible block at the bottom of this post to avoid this being too long).
Code to reproduce the behavior
import MDAnalysis as mda
from MDAnalysis.analysis.dssp import DSSP, translate
from MDAnalysisTests.datafiles import TPR, XTC
u = mda.Universe(TPR, XTC)
threshold = 0.8
res1 = u.select_atoms("resid 1-2")
print(type(res1))
print(res1.atoms)
long_run = DSSP(res1).run()
persistent_residues = translate(
long_run
.results
.dssp_ndarray
.mean(axis=0) > threshold
)
print(''.join(persistent_residues)[:20])
....
Current version of MDAnalysis
- Which version are you using? (run
python -c "import MDAnalysis as mda; print(mda.__version__)") MDAnalysis 2.9.0 - Which version of Python (
python -V)? Python 3.10.17 - Which operating system? Linux, on a LinuxMint distro
Stack Trace
ValueError Traceback (most recent call last)
Cell In[14], line 10
8 print(type(res1))
9 print(res1.atoms)
---> 10 long_run = DSSP(res1).run()
11 persistent_residues = translate(
12 long_run
13 .results
14 .dssp_ndarray
15 .mean(axis=0) > threshold
16 )
17 print(''.join(persistent_residues)[:20])
File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/base.py:878](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/base.py#line=877), in AnalysisBase.run(self, start, stop, step, frames, verbose, n_workers, n_parts, backend, unsupported_backend, progressbar_kwargs)
871 computation_groups = self._setup_computation_groups(
872 start=start, stop=stop, step=step, frames=frames, n_parts=n_parts
873 )
875 # get all results from workers in other processes.
876 # we need `AnalysisBase` classes
877 # since they hold `frames`, `times` and `results` attributes
--> 878 remote_objects: list["AnalysisBase"] = executor.apply(
879 worker_func, computation_groups
880 )
881 self.frames = np.hstack([obj.frames for obj in remote_objects])
882 self.times = np.hstack([obj.times for obj in remote_objects])
File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/backends.py:208](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/backends.py#line=207), in BackendSerial.apply(self, func, computations)
192 def apply(self, func: Callable, computations: list) -> list:
193 """
194 Serially applies `func` to each task object in ``computations``.
195
(...)
206 list of results of the function
207 """
--> 208 return [func(task) for task in computations]
File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/backends.py:208](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/backends.py#line=207), in <listcomp>(.0)
192 def apply(self, func: Callable, computations: list) -> list:
193 """
194 Serially applies `func` to each task object in ``computations``.
195
(...)
206 list of results of the function
207 """
--> 208 return [func(task) for task in computations]
File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/base.py:543](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/base.py#line=542), in AnalysisBase._compute(self, indexed_frames, verbose, progressbar_kwargs)
541 self.frames[idx] = ts.frame
542 self.times[idx] = ts.time
--> 543 self._single_frame()
544 logger.info("Finishing up")
545 return self
File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/dssp/dssp.py:394](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/dssp/dssp.py#line=393), in DSSP._single_frame(self)
392 def _single_frame(self):
393 coords = self._get_coords()
--> 394 dssp = assign(coords)
395 self.results.dssp_ndarray.append(dssp)
File [~/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/dssp/pydssp_numpy.py:242](http://localhost:8888/home/michael/mambaforge/envs/jupyter/lib/python3.10/site-packages/MDAnalysis/analysis/dssp/pydssp_numpy.py#line=241), in assign(coord)
240 # helix4 first, as alpha helix
241 helix4 = h4 + np.roll(h4, 1, 0) + np.roll(h4, 2, 0) + np.roll(h4, 3, 0)
--> 242 h3 = h3 * ~np.roll(helix4, -1, 0) * ~helix4 # helix4 is higher prioritized
243 h5 = h5 * ~np.roll(helix4, -1, 0) * ~helix4 # helix4 is higher prioritized
244 helix3 = h3 + np.roll(h3, 1, 0) + np.roll(h3, 2, 0)
ValueError: operands could not be broadcast together with shapes (4,) (5,)
Thanks for reporting this!
I believe the error occurs because the selected residue segment is too short to compute secondary structure. For example, if you use res1 = u.select_atoms("resid 1-6"), the error does not occur.
With that said, I agree the current error message is confusing. It would be better to raise a clearer exception earlier when fewer than 6 residues are provided as input.
Feel free to chime in and take a look when you get a chance! @marinegor
Ah right, that is a surprisingly obvious solution in hindsight! For some reason I was expecting that it would perform analysis of the selection incorporating the surrounding peptide residues that were not in the selection and only return data for the selected residues. Only including residues from the selection makes sense. A clearer exception sounds like a good resolution to this issue!
I changed the title to make clear that we just need better error handling in DSSP.