openfold/np/protein.py:to_pdb(): chain_tag sometimes not set

Open flowers9 opened this issue 2 years ago • 3 comments

I found what appears to be a rare case (once in millions of proteins) where the loop in to_pdb() fails to set chain_tag before closing the chain, causing an error:

Traceback (most recent call last):
  File "/pscratch/sd/f/flowers/esm/scripts/esmfold_inference.py", line 186, in <module>
    pdbs = model.output_to_pdb(output)
  File "/pscratch/sd/f/flowers/miniconda3/lib/python3.9/site-packages/esm/esmfold/v1/esmfold.py", line 303, in output_to_pdb
    return output_to_pdb(output)
  File "/pscratch/sd/f/flowers/miniconda3/lib/python3.9/site-packages/esm/esmfold/v1/misc.py", line 115, in output_to_pdb
    pdbs.append(to_pdb(pred))
  File "/pscratch/sd/f/flowers/miniconda3/lib/python3.9/site-packages/openfold/np/protein.py", line 373, in to_pdb
    f"{chain_tag:>1}{residue_index[i]:>4}"
UnboundLocalError: local variable 'chain_tag' referenced before assignment

It's possible esmfold was passing bad parameters, but adding a check to set chain_tag to "A" if not set allowed the code to run without errors.

The protein in question was

MAPVKVFGPAKSRNVARVLVCLEEVGAEYEVVDMDLKALEHKSPEHLARNPFGQTPAFQDGDLLLFESRAISRYVLRKYKTNQVDLLREGNLKEAAMVDVWTEVDAHTYNPAISPVVYECLINPLVLGIPTNQKVVDESLEKLKKALEVYEAHLSKDKYLAGDFMSFADINHFPHTCSFMAAPHAVLFDSYPYVKAWWERLMARPSIKKLSASLAPPKA*

And the tail of the output pdb (when run with the modified code) was:

ATOM 1736 CB ALA A 219 -14.556 -18.156 -6.584 1.00 83.46 C
ATOM 1737 O ALA A 219 -16.753 -18.815 -4.504 1.00 84.66 O
TER 1738 UNK A 220
PARENT N/A
TER 1739 ALA A 1
END

flowers9 avatar Dec 23 '22 05:12 flowers9

Hm peculiar. Could you share the modification you made?

gahdritz avatar Jan 29 '23 05:01 gahdritz

--- protein.py.orig	2023-01-28 22:31:40.566683304 -0800
+++ protein.py	2023-01-28 22:31:23.543314000 -0800
@@ -367,8 +367,10 @@
         if(should_terminate):
             # Close the chain.
             chain_end = "TER"
+            if atom_index == 1:
+                chain_tag = "A"
             chain_termination_line = (
                 f"{chain_end:<6}{atom_index:>5}      "
                 f"{res_1to3(aatype[i]):>3} "
                 f"{chain_tag:>1}{residue_index[i]:>4}"
            )

Just to prevent chain_tag from being undefined right there. I mean, you could check for it being undefined, but it'll only happen if atom_index is 1, so.
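
For anyone reading along, here is a minimal sketch of the failure mode. It is a toy stand-in, not the actual openfold code: the function names, the `atom_masks` argument, and the simplified TER format are all hypothetical. It assumes the variable is only assigned inside a loop body that can be skipped for every atom, which is one way an UnboundLocalError like the one in the traceback can arise, and it shows the alternative of binding a default before the loop.

```python
def ter_line_unsafe(atom_masks):
    """Toy stand-in: chain_tag is assigned only when an atom is emitted."""
    atom_index = 1
    for mask in atom_masks:
        if mask < 0.5:
            continue  # masked atom: nothing emitted, chain_tag left untouched
        chain_tag = "A"
        atom_index += 1
    # Raises UnboundLocalError if no atom was ever emitted.
    return f"{'TER':<6}{atom_index:>5} {chain_tag:>1}"


def ter_line_safe(atom_masks, default_chain_tag="A"):
    """Same toy loop, but chain_tag is bound before the loop starts."""
    chain_tag = default_chain_tag  # hypothetical default, mirrors the "A" workaround
    atom_index = 1
    for mask in atom_masks:
        if mask < 0.5:
            continue
        chain_tag = "A"
        atom_index += 1
    return f"{'TER':<6}{atom_index:>5} {chain_tag:>1}"


if __name__ == "__main__":
    try:
        ter_line_unsafe([0.0, 0.0])
    except UnboundLocalError as exc:
        print("unsafe version failed:", exc)
    print(repr(ter_line_safe([0.0, 0.0])))
```

Binding the default up front avoids special-casing atom_index, though it would only hide the error rather than explain why no atoms were emitted.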

flowers9 avatar Jan 29 '23 06:01 flowers9

It still looks like it's outputting an extra TER and PARENT line. I'll look into this.
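
A quick way to spot this kind of thing is to count record names in the output. The sketch below is only an illustration: `count_records` is a hypothetical helper, and the sample text is the PDB tail quoted earlier in this thread, assuming each record sits on its own line.

```python
from collections import Counter


def count_records(pdb_text):
    """Count non-empty lines by their leading record name (ATOM, TER, PARENT, END, ...)."""
    return Counter(
        line.split()[0] for line in pdb_text.splitlines() if line.strip()
    )


# Tail of the output quoted in this issue (whitespace-collapsed as posted).
tail = """\
ATOM 1736 CB ALA A 219 -14.556 -18.156 -6.584 1.00 83.46 C
ATOM 1737 O ALA A 219 -16.753 -18.815 -4.504 1.00 84.66 O
TER 1738 UNK A 220
PARENT N/A
TER 1739 ALA A 1
END
"""

counts = count_records(tail)
print(counts["TER"], counts["PARENT"])
```

Two TER records in the tail of a single predicted chain is the sort of signal that would be worth flagging.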

gahdritz avatar Jan 29 '23 07:01 gahdritz