graphein
Support for mmCIF Files
Is your feature request related to a problem? Please describe.
Currently, we only support PDB files as inputs for protein structure graphs. Large complexes are no longer available as PDB files:
Several types of PDB entries are not offered in the legacy PDB format anymore:
- Entries containing multiple character chain ids
- Entries containing > 62 chains
- Entries containing > 99999 ATOM coordinates
- Entries that have complex beta sheet topology
Describe the solution you'd like
Once BioPandas has support for parsing mmCIF files (rasbt/biopandas#94), we can parse the resulting DataFrames into a format consistent with PDB files. This is the simplest route forward. However, mmCIF files are 'better', especially in how they handle insertions and altlocs, as well as author inconsistencies in the contents. Longer term, we may consider refactoring to treat mmCIF as the first-class input file format.
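To make the "format consistent with PDB files" idea concrete, here is a rough sketch of renaming mmCIF `_atom_site` fields to BioPandas-style PDB column names. It deliberately uses plain dicts and the standard library rather than BioPandas, so it says nothing about the API that will come out of the linked issue. The function names (`parse_atom_site`, `to_pdb_style`), the `MMCIF_TO_PDB` mapping, the choice of `auth_*` over `label_*` fields, and the example records are all assumptions made for illustration. The parser only handles a simple whitespace-delimited loop with no quoted values.

```python
# Assumed mapping from mmCIF `_atom_site` fields to BioPandas-style PDB columns.
# Using the author-provided (`auth_*`) identifiers is an illustrative choice.
MMCIF_TO_PDB = {
    "group_PDB": "record_name",
    "id": "atom_number",
    "auth_atom_id": "atom_name",
    "label_alt_id": "alt_loc",
    "auth_comp_id": "residue_name",
    "auth_asym_id": "chain_id",
    "auth_seq_id": "residue_number",
    "pdbx_PDB_ins_code": "insertion",
    "Cartn_x": "x_coord",
    "Cartn_y": "y_coord",
    "Cartn_z": "z_coord",
    "occupancy": "occupancy",
    "B_iso_or_equiv": "b_factor",
    "type_symbol": "element_symbol",
}

INT_COLUMNS = ("atom_number", "residue_number")
FLOAT_COLUMNS = ("x_coord", "y_coord", "z_coord", "occupancy", "b_factor")


def parse_atom_site(text):
    """Read the `_atom_site` loop of an mmCIF string into a list of dicts.

    Simplification: values are split on whitespace, so quoted values
    containing spaces are not supported.
    """
    columns, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split(".", 1)[1])
            continue
        if not columns or not line:
            continue  # still before the loop header, or a blank line
        if line.startswith(("#", "loop_", "_")):
            break  # the atom_site loop has ended
        values = line.split()
        if len(values) == len(columns):
            rows.append(dict(zip(columns, values)))
    return rows


def to_pdb_style(rows):
    """Rename mmCIF fields to PDB-style names and cast numeric columns."""
    records = []
    for row in rows:
        rec = {}
        for cif_key, pdb_key in MMCIF_TO_PDB.items():
            value = row.get(cif_key, "")
            # mmCIF marks inapplicable/unknown values with "." and "?".
            rec[pdb_key] = "" if value in (".", "?") else value
        for key in INT_COLUMNS:
            if rec[key] != "":
                rec[key] = int(rec[key])
        for key in FLOAT_COLUMNS:
            if rec[key] != "":
                rec[key] = float(rec[key])
        records.append(rec)
    return records


# Made-up two-atom example with a two-character chain ID ("AA").
EXAMPLE = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_alt_id
_atom_site.auth_atom_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
ATOM 1 N . N MET AA 1 ? 27.340 24.430 2.614 1.00 9.67
ATOM 2 C . CA MET AA 1 ? 26.266 25.413 2.842 1.00 10.38
#
"""

atoms = to_pdb_style(parse_atom_site(EXAMPLE))
print(atoms[0]["chain_id"], atoms[0]["atom_name"], atoms[0]["x_coord"])
```

Note that the multi-character chain ID survives here only because nothing is written back to the fixed-width PDB text format; any downstream code that assumes single-character chain IDs would still need attention.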
The mmCIF -> PDB conversion should be available in BioPandas quite soon; it is waiting to be merged: https://github.com/rasbt/biopandas/pull/107
After this is done, using mmCIF files should take only a couple of lines of work :)