template cif contains insertion code
Expected Behavior
I would like to reuse a template folder for multiple ColabFold runs. So I run ColabFold first on the following input sequence:
>seq
EIVLTQSPGTQSLSPGERATLSCRASQSVGNNKLAWYQQRPGQAPRLLIYGASSRPSGVADRFSGSGSGTDFTLTISRLEPEDFAVYYCQQYGQSLSTFGQGTKVEVKRTV:NWFDITNWLWYIK:VQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGDGDLGKPIGAFAHWGQGTLVTVSS
It finds many templates:
2023-09-25 06:30:30,521 Sequence 0 found templates: ['6xe1_L', '7lk9_B', '7s5r_C', '7b0b_L', '7x29_G', '5i1k_L', '6ghg_B', '7kql_L', '7tbf_L', '6ol5_L', '7d0c_F', '6wir_B', '5xmh_L', '6o25_J', '6o29_B', '5gmq_C', '4ypg_L', '4xcy_I', '7u0d_P', '5w1k_N']
2023-09-25 06:30:35,871 Sequence 1 found templates: ['5cil_P', '5x08_P', '7ekk_P', '4wy7_P', '4xbe_P', '5cin_P', '6o3j_G', '6o42_G', '6o42_I', '7ekb_P', '2fx7_P', '4xaw_P', '6o3g_G', '6o3g_I', '6o3g_Q', '6o3g_S', '6o3j_I', '6o3l_D', '6o3l_E', '6snc_P']
2023-09-25 06:30:46,110 Sequence 2 found templates: ['5cil_H', '4llv_C', '4xce_C', '4xcn_A', '4ngh_H', '4xce_A', '4xce_H', '4xbp_A', '4xc3_H', '4xcy_H', '7bpk_H', '4xbp_C', '4xbp_E', '7f7e_C', '7bep_D', '5e08_H', '5gzn_C', '5gzn_H', '7czt_I', '6ehw_B']
I then copy all the .cif files from seq_env/templates_/.cif into a new folder called "mytemplates".
Next, I start a second ColabFold run with --custom-template-path pointing at mytemplates and expect it to work. Instead, ColabFold fails.
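
For reference, the collection step above can be scripted roughly like this. It is only a sketch: `collect_cif_files` and its two arguments are names I made up for illustration, not part of ColabFold, and it assumes the template .cif files sit somewhere below a single results directory.

```python
import shutil
from pathlib import Path


def collect_cif_files(source_root, dest_dir):
    """Copy every .cif file found below source_root into dest_dir.

    Hypothetical helper, not part of ColabFold. Returns the names of the
    files that were copied, in sorted path order.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materialises the listing before any file is copied.
    for cif in sorted(Path(source_root).rglob("*.cif")):
        target = dest / cif.name
        if target.exists():
            # Same PDB entry already collected (e.g. shared by two chains).
            continue
        shutil.copy2(cif, target)
        copied.append(target.name)
    return copied
```

Calling `collect_cif_files("seq_env", "mytemplates")` would then fill the folder that is passed to --custom-template-path.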
Current Behavior
When using mytemplates as the --custom-template-path, ColabFold complains (I added the problematic template name to the error message):
mk_hhsearch_db
    raise ValueError(
ValueError: PDB **mytemplates/7u0d.cif** contains an insertion code at chain O and residue index 52. These are not supported.
Why is 7u0d.cif accepted on the first run but rejected when it is used as a custom template?
Since I need to predict multiple sequences that differ only by small mutations, I would like to reuse the templates without querying the MSA server each time.
Thanks!
I wonder whether this insertion code check is really necessary. If I comment it out, ColabFold seems to work. It would be great to get your expert opinion. Thanks.
In batch.py, it works if I simply comment out the five lines below:
    for chain in model:
        amino_acid_res = []
        for res in chain:
            # if res.id[2] != " ":
            #     raise ValueError(
            #         f"PDB contains an insertion code at chain {chain.id} and residue "
            #         f"index {res.id[1]}. These are not supported."
            #     )
            amino_acid_res.append(
                residue_constants.restype_3to1.get(res.resname, "X")
            )
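
As an alternative to editing batch.py, the template folder could be screened before the run to see which files would trip the check. The sketch below uses only the standard library. It assumes that the insertion code tested via `res.id[2]` comes from the `_atom_site.pdbx_PDB_ins_code` column of the mmCIF file (my understanding of the parser, not something I have confirmed in the ColabFold source), and that rows of the `_atom_site` loop can be split on whitespace. The function names are made up for illustration.

```python
from pathlib import Path

INS_CODE = "_atom_site.pdbx_PDB_ins_code"
CHAIN_ID = "_atom_site.auth_asym_id"
RES_NUM = "_atom_site.auth_seq_id"


def find_insertion_codes(cif_path):
    """Return sorted (chain, residue number, insertion code) tuples.

    Hypothetical helper: reads only the _atom_site loop and assumes no
    field in that loop contains whitespace.
    """
    columns = []          # column names of the _atom_site loop, in file order
    hits = set()
    in_atom_site = False
    with open(cif_path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith("_atom_site."):
                columns.append(line.split()[0])
                in_atom_site = True
                continue
            if not in_atom_site:
                continue
            if not line or line.startswith(("#", "loop_", "_", "data_")):
                # The _atom_site loop has ended.
                in_atom_site = False
                columns = []
                continue
            fields = line.split()
            if len(fields) != len(columns):
                continue  # row does not match the simple layout assumed here
            row = dict(zip(columns, fields))
            icode = row.get(INS_CODE, "?")
            if icode not in ("?", "."):
                hits.add((row.get(CHAIN_ID, "?"), row.get(RES_NUM, "?"), icode))
    return sorted(hits)


def scan_template_folder(folder):
    """Map each .cif file name in folder to its insertion-code hits, if any."""
    report = {}
    for cif in sorted(Path(folder).glob("*.cif")):
        hits = find_insertion_codes(cif)
        if hits:
            report[cif.name] = hits
    return report
```

Running `scan_template_folder("mytemplates")` would list the affected files and chains. In my case the error names chain O of 7u0d, while the first run's log lists 7u0d_P as the hit, so a report like this might also show whether the insertion codes sit only in chains that were never used as templates.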