Support for mmCIF Files

Open a-r-j opened this issue 3 years ago • 1 comments

Is your feature request related to a problem? Please describe. Currently, we only support PDB files as inputs for protein structure graphs. Large complexes are no longer available in PDB format:

Several types of PDB entries are not offered in the legacy PDB format anymore:

- Entries containing multiple-character chain IDs
- Entries containing > 62 chains
- Entries containing > 99999 ATOM coordinates
- Entries that have complex beta sheet topology, see more details

Describe the solution you'd like Once BioPandas has support for parsing mmCIF files (rasbt/biopandas#94), we can parse the resulting DataFrames into a format consistent with PDB files. This is the simplest route forward. However, mmCIF files are 'better', especially in how they handle insertions and altlocs, as well as author inconsistencies in the contents. Longer term, we may consider refactoring to treat mmCIF as the first-class input file format.
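To make the "parse into a format consistent with PDB files" idea concrete, here is a minimal standard-library sketch. It is not the BioPandas or graphein implementation. The function names (`parse_atom_site`, `to_pdb_style`) and the `MMCIF_TO_PDB` mapping are hypothetical, and the PDB-style column names are only modelled on what BioPandas' PDB DataFrames appear to use. The whitespace tokeniser is a simplification: it does not handle quoted values containing spaces or multi-line values, so real files need a proper mmCIF parser.

```python
# Hypothetical mapping from mmCIF `_atom_site` fields to PDB-style column
# names (the target names are an assumption, not a confirmed API).
MMCIF_TO_PDB = {
    "group_PDB": "record_name",
    "id": "atom_number",
    "auth_atom_id": "atom_name",
    "label_alt_id": "alt_loc",
    "auth_comp_id": "residue_name",
    "auth_asym_id": "chain_id",
    "auth_seq_id": "residue_number",
    "pdbx_PDB_ins_code": "insertion",
    "Cartn_x": "x_coord",
    "Cartn_y": "y_coord",
    "Cartn_z": "z_coord",
    "type_symbol": "element_symbol",
}


def parse_atom_site(text):
    """Return one dict per atom from the `_atom_site` loop of an mmCIF string."""
    fields, rows = [], []
    in_loop = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_atom_site."):
            # Header line: collect the field name after the category prefix.
            fields.append(line.split(".", 1)[1])
            in_loop = True
        elif in_loop:
            # A blank line, comment, or new category ends the loop.
            if not line or line.startswith(("#", "loop_", "_", "data_")):
                break
            values = line.split()  # simplification: no quoted-string support
            if len(values) != len(fields):
                raise ValueError(f"unexpected number of values: {line!r}")
            rows.append(dict(zip(fields, values)))
    return rows


def to_pdb_style(rows):
    """Rename mmCIF fields to PDB-style names and cast numeric columns."""
    records = []
    for row in rows:
        rec = {}
        for cif_key, pdb_key in MMCIF_TO_PDB.items():
            value = row.get(cif_key, "")
            # mmCIF uses "." and "?" as null markers; PDB leaves these blank.
            rec[pdb_key] = "" if value in (".", "?") else value
        rec["atom_number"] = int(rec["atom_number"])
        rec["residue_number"] = int(rec["residue_number"])
        for key in ("x_coord", "y_coord", "z_coord"):
            rec[key] = float(rec[key])
        records.append(rec)
    return records


# Made-up two-atom example with a two-character chain ID ("AA"), one of the
# cases that the legacy PDB format cannot represent.
EXAMPLE = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_alt_id
_atom_site.auth_atom_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N . N MET AA 1 ? 10.000 20.500 -3.250
ATOM 2 C . CA MET AA 1 ? 11.200 21.000 -2.750
#
"""

atoms = to_pdb_style(parse_atom_site(EXAMPLE))
```

The sketch uses the author-assigned (`auth_*`) identifiers because those are the ones that correspond to the residue numbers and chain IDs found in PDB files. Whether to prefer them over the `label_*` identifiers is a design choice that a real implementation would need to settle.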

a-r-j avatar Mar 11 '22 13:03 a-r-j

The mmCIF -> PDB conversion should be available in BioPandas quite soon; it is waiting to be merged: https://github.com/rasbt/biopandas/pull/107

Once this is done, supporting mmCIF files should take only a couple of lines of work :)

mrauha avatar Aug 29 '22 08:08 mrauha