parser/mmcif failed to parse .cif template
Boltz2 currently seems unable to parse .cif files generated by BioPython, because those files do not contain an _entity.id section.
This can be reproduced with the following code.
First we generate a 1crn_saved.cif file using BioPython.
from Bio.PDB import PDBList, MMCIFIO, MMCIFParser
import gemmi
# Step 1: Fetch the structure for 1CRN from the PDB
pdbl = PDBList()
pdb_id = "1crn"
pdb_file = pdbl.retrieve_pdb_file(pdb_id, pdir='.', file_format='mmCif')
# Step 2: Load the structure using Biopython
parser = MMCIFParser(QUIET=True)
structure = parser.get_structure(pdb_id, pdb_file)
# Step 3: Save the structure to a new .cif file using Biopython
output_cif = "1crn_saved.cif"
io = MMCIFIO()
io.set_structure(structure)
io.save(output_cif)
We then verify that 1crn_saved.cif can be read by other packages, such as gemmi.
# Step 4: Load the saved .cif file using Gemmi
gemmi_structure = gemmi.read_structure(output_cif)
# Step 5: Print basic info
print(f"Structure name: {gemmi_structure.name}")
print(f"Number of models: {len(gemmi_structure)}")
print("Chains in first model:", [chain.name for chain in gemmi_structure[0]])
The output is:
Structure name: 1crn
Number of models: 1
Chains in first model: ['A']
The code below shows that 1crn_saved.cif cannot be processed by Boltz2:
import sys
sys.path.insert(0, 'src/boltz/data/parse')
from mmcif import parse_mmcif
parse_mmcif("1crn_saved.cif")
The error message is:
File "src/boltz/data/parse/mmcif.py", line 890, in parse_mmcif
    entity: gemmi.Entity = entities[subchain_id]
                           ~~~~~~~~^^^^^^^^^^^^^
KeyError: 'A'
I think this is because Boltz2 code relies on entities, which is an empty list in this example. Could the boltz code be improved to acquire data more robustly from .cif?
Thanks!
Yeah, I believe I can fix that. I'll take a look, thanks for flagging.
Just an FYI, in my hands I observe the same behavior. A cif file from the RCSB works though.
I also had trouble with files created using BioPython or PyMol.
I found that Maxit worked in my case (I downloaded source code for maxit-v11.300)
Chiming in with my experience in case it may be useful. Using OpenBabel to convert PDBs to mmCIFs didn't work for me, but using gemmi does, as long as I make sure the SEQRES is correct or otherwise manually add it using gemmi.
I tried to generate a .cif file from a .pdb using gemmi, PyMOL and ChimeraX, to no avail.
Does your PDB have a SEQRES in it?
@seankhl does your converted CIF have _entity_poly? I am using gemmi=0.6.5 and converting from a cropped PDB (after manually adding back the SEQRES, because PyMol deletes this information when I crop). If I input this as a template, however, I get "ValueError: No chains parsed!". I was able to resolve this by looking at the original cif from RCSB and adding in the _entity_poly loop, but I am wondering if there is a less janky approach.
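A quick way to see whether a converted file carries the entity information discussed in this thread is to scan it for the relevant category names. The sketch below uses only the standard library. The list of categories is my assumption based on the comments above (entity.id and the _entity_poly loop), not something taken from the Boltz source, and the function name missing_categories is made up for illustration. It is a plain text scan, so a line inside a multi-line text field that happens to look like a data name would be counted as well.

```python
import re

# Assumed from this thread, not from the Boltz code.
REQUIRED_CATEGORIES = ("_entity", "_entity_poly", "_entity_poly_seq")


def missing_categories(cif_text, required=REQUIRED_CATEGORIES):
    """Return the categories in `required` that never appear as a data name."""
    found = set()
    for line in cif_text.splitlines():
        # A data name looks like "_category.item" at the start of a line.
        match = re.match(r"\s*(_[A-Za-z0-9_\[\]]+)\.", line)
        if match:
            found.add(match.group(1).lower())
    return [category for category in required if category not in found]


# Hypothetical usage:
#   with open("1crn_saved.cif") as handle:
#       print(missing_categories(handle.read()))
```

An empty list means all three categories are present. It does not guarantee that their contents are consistent with the atom records.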
Maxit works for me as well (nothing else worked).
You can also use this service which runs Maxit under the hood: https://mmcif.pdbj.org/converter/index.php?l=en