bad change to mmCIF chain vs segment?

Open jamesmkrieger opened this issue 1 year ago • 0 comments

mmCIF files often split different kinds of entities into separate chains and segments, so the hierarchical view has more divisions than it does for PDB files.

For example, with 4ake we get 4 chains instead of 2. The unite_chains option merges them back and gives behaviour similar to ChimeraX, but the question is what happens when we don't use this option and want behaviour closer to PyMOL.

In the released version tag v2.4.1, we get something similar to PyMOL:

In [21]: ag = prody.parseMMCIF('4ake')

In [22]: list(ag.getHierView())
Out[22]: 
[<Chain: A from Segment A from 4ake (214 residues, 1656 atoms)>,
 <Chain: B from Segment B from 4ake (214 residues, 1656 atoms)>,
 <Chain: A from Segment C from 4ake (72 residues, 72 atoms)>,
 <Chain: B from Segment D from 4ake (75 residues, 75 atoms)>]

In the current ProDy master, the chain IDs and segment names of the last two entries are switched, which is probably a problem:

In [2]: ag = prody.parseMMCIF('4ake')

In [3]: list(ag.getHierView())
Out[3]: 
[<Chain: A from Segment A from 4ake (214 residues, 1656 atoms)>,
 <Chain: B from Segment B from 4ake (214 residues, 1656 atoms)>,
 <Chain: C from Segment A from 4ake (72 residues, 72 atoms)>,
 <Chain: D from Segment B from 4ake (75 residues, 75 atoms)>]
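To make the switch explicit, here is a minimal sketch that compares the two 4ake listings. The (chain ID, segment name) pairs are copied by hand from the outputs above; `swap` and `unique_labels` are hypothetical helpers for this illustration and are not part of ProDy.

```python
# (chain ID, segment name) pairs copied from the two 4ake outputs above.
v241 = [("A", "A"), ("B", "B"), ("A", "C"), ("B", "D")]
master = [("A", "A"), ("B", "B"), ("C", "A"), ("D", "B")]


def swap(pairs):
    """Return the pairs with chain ID and segment name exchanged."""
    return [(segname, chid) for chid, segname in pairs]


def unique_labels(pairs, index):
    """Return the sorted unique labels at position 0 (chain) or 1 (segment)."""
    return sorted({pair[index] for pair in pairs})


# True if master is exactly v2.4.1 with the two labels exchanged.
labels_swapped = swap(v241) == master

print(labels_swapped)
print("v2.4.1 chains:", unique_labels(v241, 0), "segments:", unique_labels(v241, 1))
print("master chains:", unique_labels(master, 0), "segments:", unique_labels(master, 1))
```

In v2.4.1 the chain IDs are the two author-style labels (A, B) and the segment names carry the four mmCIF divisions (A to D); in master it is the other way around.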

There also seems to be a difference for biomolecular assemblies (biomol=True): with unite_chains=True, v2.4.1 raises an error whereas master does not. I think master gives the right result, but I am not certain. Here is the example for 1ake:

v2.4.1

In [28]: ag = prody.parseMMCIF('1ake', biomol=True)

In [29]: ag
Out[29]: 
[<AtomGroup: 1ake biomolecule 1 (1954 atoms)>,
 <AtomGroup: 1ake biomolecule 2 (1850 atoms)>]

In [30]: [bm.numChains() for bm in ag]
Out[30]: [1, 1]

In [31]: [list(bm.getHierView()) for bm in ag]
Out[31]: 
[[<Chain: A from Segment 1 from 1ake biomolecule 1 (456 residues, 1954 atoms)>],
 [<Chain: B from Segment 1 from 1ake biomolecule 2 (352 residues, 1850 atoms)>]]

In [32]: ag = prody.parseMMCIF('1ake', biomol=True, unite_chains=True)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[32], line 1
----> 1 ag = prody.parseMMCIF('1ake', biomol=True, unite_chains=True)

File ~/software/scipion3/software/em/prody-2.4.1/ProDy/prody/proteins/ciffile.py:125, in parseMMCIF(pdb, **kwargs)
    123 cif.close()
    124 if unite_chains:
--> 125     result.setSegnames(result.getChids())
    126 return result

AttributeError: 'list' object has no attribute 'setSegnames'

In [33]: ag = prody.parseMMCIF('1ake', biomol=True)

In [34]: [list(bm.protein.getHierView()) for bm in ag]
Out[34]: 
[[<Chain: A from Segment 1 from 1ake biomolecule 1 (214 residues, 1656 atoms)>],
 [<Chain: B from Segment 1 from 1ake biomolecule 2 (214 residues, 1656 atoms)>]]
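The AttributeError in the v2.4.1 session above arises because, with biomol=True, the result is a list of atom groups, while line 125 of ciffile.py calls setSegnames on it as if it were a single one. A minimal sketch of one possible guard follows. It is not the change made in master (whose output shows different segment names), and `FakeAtomGroup` and `unite_chains_on` are hypothetical names used only so the example runs without ProDy or network access.

```python
class FakeAtomGroup:
    """Hypothetical stand-in for an atom group; only the methods used here."""

    def __init__(self, chids):
        self._chids = list(chids)
        self._segnames = [""] * len(self._chids)

    def getChids(self):
        return list(self._chids)

    def setSegnames(self, segnames):
        self._segnames = list(segnames)

    def getSegnames(self):
        return list(self._segnames)


def unite_chains_on(result):
    """Copy chain IDs onto segment names for one atom group or a list of them."""
    groups = result if isinstance(result, list) else [result]
    for group in groups:
        group.setSegnames(group.getChids())
    return result


# A list result, as returned for biomol=True in the session above.
biomols = unite_chains_on([FakeAtomGroup("AAA"), FakeAtomGroup("BB")])
print([bm.getSegnames() for bm in biomols])

# A single result, as returned without biomol.
single = unite_chains_on(FakeAtomGroup("AB"))
print(single.getSegnames())
```

The only design choice here is to normalise the result to a list before applying the same per-group operation, so both return shapes go through one code path.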

master

In [10]: ag = prody.parseMMCIF('1ake', biomol=True, unite_chains=True)

In [11]: ag
Out[11]: 
[<AtomGroup: 1ake biomolecule 1 (1954 atoms)>,
 <AtomGroup: 1ake biomolecule 2 (1850 atoms)>]

In [12]: [list(bm.getHierView()) for bm in ag]
Out[12]: 
[[<Chain: A1 from Segment A1 from 1ake biomolecule 1 (456 residues, 1954 atoms)>],
 [<Chain: B1 from Segment B1 from 1ake biomolecule 2 (352 residues, 1850 atoms)>]]

In [13]: ag = prody.parseMMCIF('1ake', biomol=True)

In [14]: ag
Out[14]: 
[<AtomGroup: 1ake biomolecule 1 (1954 atoms)>,
 <AtomGroup: 1ake biomolecule 2 (1850 atoms)>]

In [17]: [list(bm.getHierView()) for bm in ag]
Out[17]: 
[[<Chain: A from Segment A1 from 1ake biomolecule 1 (214 residues, 1656 atoms)>,
  <Chain: C from Segment A1 from 1ake biomolecule 1 (1 residues, 57 atoms)>,
  <Chain: E from Segment A1 from 1ake biomolecule 1 (241 residues, 241 atoms)>],
 [<Chain: B from Segment B1 from 1ake biomolecule 2 (214 residues, 1656 atoms)>,
  <Chain: D from Segment B1 from 1ake biomolecule 2 (1 residues, 57 atoms)>,
  <Chain: F from Segment B1 from 1ake biomolecule 2 (137 residues, 137 atoms)>]]

In [18]: [list(bm.protein.getHierView()) for bm in ag]
Out[18]: 
[[<Chain: A from Segment A1 from 1ake biomolecule 1 (214 residues, 1656 atoms)>],
 [<Chain: B from Segment B1 from 1ake biomolecule 2 (214 residues, 1656 atoms)>]]

jamesmkrieger, Jun 04 '24 19:06