openfold
openfold/np/protein.py:to_pdb(): chain_tag sometimes not set
I found what appears to be a rare case (once in millions of proteins) where the loop in to_pdb() fails to set chain_tag before closing the chain, causing an error:
Traceback (most recent call last):
File "/pscratch/sd/f/flowers/esm/scripts/esmfold_inference.py", line 186, in <module>
pdbs = model.output_to_pdb(output)
File "/pscratch/sd/f/flowers/miniconda3/lib/python3.9/site-packages/esm/esmfold/v1/esmfold.py", line 303, in output_to_pdb
return output_to_pdb(output)
File "/pscratch/sd/f/flowers/miniconda3/lib/python3.9/site-packages/esm/esmfold/v1/misc.py", line 115, in output_to_pdb
pdbs.append(to_pdb(pred))
File "/pscratch/sd/f/flowers/miniconda3/lib/python3.9/site-packages/openfold/np/protein.py", line 373, in to_pdb
f"{chain_tag:>1}{residue_index[i]:>4}"
UnboundLocalError: local variable 'chain_tag' referenced before assignment
It's possible esmfold was passing bad parameters, but adding a check to set chain_tag to "A" if not set allowed the code to run without errors.
The protein in question was:
MAPVKVFGPAKSRNVARVLVCLEEVGAEYEVVDMDLKALEHKSPEHLARNPFGQTPAFQDGDLLLFESRAISRYVLRKYKTNQVDLLREGNLKEAAMVDVWTEVDAHTYNPAISPVVYECLINPLVLGIPTNQKVVDESLEKLKKALEVYEAHLSKDKYLAGDFMSFADINHFPHTCSFMAAPHAVLFDSYPYVKAWWERLMARPSIKKLSASLAPPKA*
And the tail of the output pdb (when run with the modified code) was:
ATOM 1736 CB ALA A 219 -14.556 -18.156 -6.584 1.00 83.46 C
ATOM 1737 O ALA A 219 -16.753 -18.815 -4.504 1.00 84.66 O
TER 1738 UNK A 220
PARENT N/A
TER 1739 ALA A 1
END
Hm peculiar. Could you share the modification you made?
--- protein.py.orig 2023-01-28 22:31:40.566683304 -0800
+++ protein.py 2023-01-28 22:31:23.543314000 -0800
@@ -367,8 +367,10 @@
if(should_terminate):
# Close the chain.
chain_end = "TER"
+ if atom_index == 1:
+ chain_tag = "A"
chain_termination_line = (
f"{chain_end:<6}{atom_index:>5} "
f"{res_1to3(aatype[i]):>3} "
f"{chain_tag:>1}{residue_index[i]:>4}"
)
Just to prevent chain_tag from being undefined right there. I mean, you could check for it being undefined, but it'll only happen if atom_index is 1, so.
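For anyone trying to reproduce the failure mode in isolation, here is a minimal sketch. It is not OpenFold's code: `ter_line_unsafe` and `ter_line_safe` are hypothetical helpers, and the assumption (not confirmed in this thread) is that chain_tag is only assigned inside an inner per-atom loop that skips masked atoms. Under that assumption, a chain whose atoms are all masked reaches the TER line with chain_tag never bound.

```python
def ter_line_unsafe(atom_masks):
    """Toy stand-in for the loop shape; NOT OpenFold's actual to_pdb()."""
    atom_index = 1
    for masks in atom_masks:
        for mask in masks:
            if mask < 0.5:
                continue  # masked atom: the rest of the body is skipped
            chain_tag = "A"  # assumed to be the only assignment
            atom_index += 1
    # If every atom was masked (or there were no residues), chain_tag
    # was never bound and this line raises UnboundLocalError.
    return f"{'TER':<6}{atom_index:>5} {chain_tag:>1}{len(atom_masks):>4}"


def ter_line_safe(atom_masks, default_chain_tag="A"):
    """Same toy loop, with chain_tag bound before the loops start."""
    atom_index = 1
    chain_tag = default_chain_tag  # always defined, whatever the masks are
    for masks in atom_masks:
        for mask in masks:
            if mask < 0.5:
                continue
            chain_tag = default_chain_tag
            atom_index += 1
    return f"{'TER':<6}{atom_index:>5} {chain_tag:>1}{len(atom_masks):>4}"


if __name__ == "__main__":
    all_masked = [[0.0, 0.0, 0.0]]  # one residue, every atom masked out
    try:
        ter_line_unsafe(all_masked)
    except UnboundLocalError as err:
        print("unsafe version failed:", err)
    print(repr(ter_line_safe(all_masked)))
```

Binding the default before the loop avoids keying the fix on atom_index == 1. Whether a chain with no emitted atoms should get a TER line at all is a separate question.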
It still looks like it's outputting an extra TER and PARENT line. I'll look into this.