Reopen #7: pLDDT not writing to PDB
I am having the same issue that was previously reported in #7: all pLDDT values are written to the PDB file as 1.00. Adding print(protein.plddt) shows a per-residue tensor with values below 1, so I know the model is generating them. My code is very similar to the snippet in #7.
Am I missing something obvious or has that previous bug popped back up?
I think the issue is in how the protein complex is handled on the way into .to_pdb. If I print the output of
protein_complex = protein.to_protein_complex()
for chain in protein_complex.chain_iter():
    print(chain.confidence)
I get an array of 1s (which is what ends up in the B-factor column of my PDBs). I've managed to hack my way around this by setting the confidence on the protein chain myself, calling to_pdb_string() on it, and then writing the PDB file manually:
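# Scale pLDDT by 100 before storing it as the chain confidence.
scaled_plddt = protein.plddt * 100
protein_chain = protein.to_protein_chain()
protein_chain.confidence = scaled_plddt.detach().cpu().numpy()
# Serialize the chain and write the PDB file by hand.
pdb_string = protein_chain.to_pdb_string()
with open(out_pdb, "w") as f:
    f.write(pdb_string)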
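To check whether a written file is affected, the B-factor column can be read back directly. The sketch below uses only the standard library. The helper names `read_b_factors` and `plddt_was_lost` are hypothetical (they are not part of esm), and it assumes the file follows the fixed-width PDB layout, where the temperature factor occupies columns 61 to 66 of each ATOM/HETATM record.

```python
def read_b_factors(pdb_text: str) -> list[float]:
    """Return the B-factor of every ATOM/HETATM record in a PDB string."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Columns 61-66 (1-indexed) hold the temperature factor,
            # assuming the standard fixed-width PDB layout.
            values.append(float(line[60:66]))
    return values


def plddt_was_lost(pdb_text: str) -> bool:
    """True if the text has atom records and every B-factor is 1.00."""
    values = read_b_factors(pdb_text)
    return bool(values) and all(abs(v - 1.0) < 1e-6 for v in values)
```

Calling `plddt_was_lost(open("protein.pdb").read())` on a file written by `protein.to_pdb(...)` should then return `True` only when every atom carries the placeholder value.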
Let me know if I'm just missing something silly, or if this is a bona fide bug.
Hello, thank you so much for developing and maintaining this tool! I am facing the same issue here, where the pLDDT scores are not being written to the PDB file. Currently, @LandonGetz's hack works for me (thank you very much!), but I am wondering if there is a fix in the works for this.
cc @imathur1, I think this is a real bug?
Similar to this issue https://github.com/evolutionaryscale/esm/issues/247, I believe this PR https://github.com/evolutionaryscale/esm/pull/276/files#diff-253d59232c58fceb8e86ac22dbcd860107afb9c3d5c4518d9b08ce997a9aab7e addresses the problem of all 1s being written to the B-factor column of the PDB file.
I ran this snippet
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

model: ESM3InferenceClient = ESM3.from_pretrained("esm3-sm-open-v1").to("cuda")
prompt = "HERPYACP_________________________________________________________________________________"
protein = ESMProtein(sequence=prompt)
# Generate the sequence first, then the structure (pLDDT is produced with the structure).
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.5))
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("protein.pdb")  # type: ignore
from https://github.com/evolutionaryscale/esm/issues/7 and I see that the pLDDT values in the PDB file are no longer all 1 (the last numeric column, the B-factor, now holds per-residue values):
ATOM 1 N HIS A 1 -25.586 9.408 10.481 1.00 0.89 N
ATOM 2 CA HIS A 1 -24.918 8.158 10.135 1.00 0.89 C
ATOM 3 C HIS A 1 -24.927 7.187 11.311 1.00 0.89 C
ATOM 4 N GLU A 2 -25.360 6.203 11.210 1.00 0.94 N
ATOM 5 CA GLU A 2 -25.428 5.181 12.249 1.00 0.94 C
ATOM 6 C GLU A 2 -24.157 4.338 12.274 1.00 0.94 C
ATOM 7 N ARG A 3 -23.591 3.877 13.252 1.00 0.96 N
ATOM 8 CA ARG A 3 -22.415 3.034 13.439 1.00 0.96 C
ATOM 9 C ARG A 3 -22.749 1.803 14.275 1.00 0.96 C
ATOM 10 N PRO A 4 -23.260 0.926 13.716 1.00 0.96 N
ATOM 11 CA PRO A 4 -23.783 -0.184 14.505 1.00 0.96 C
ATOM 12 C PRO A 4 -22.705 -1.235 14.755 1.00 0.96 C
ATOM 13 N TYR A 5 -21.536 -1.133 14.074 1.00 0.97 N
ATOM 14 CA TYR A 5 -20.496 -2.128 14.313 1.00 0.97 C
ATOM 15 C TYR A 5 -19.547 -1.677 15.417 1.00 0.97 C
ATOM 16 N ALA A 6 -19.575 -2.323 16.450 1.00 0.98 N
ATOM 17 CA ALA A 6 -18.739 -1.976 17.595 1.00 0.98 C
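Note that the values above are on a 0 to 1 scale, whereas the workaround earlier in this thread multiplied pLDDT by 100. If a downstream tool expects the 0 to 100 convention (as in AlphaFold-style PDB files), the column can be rescaled after the fact. This is only a sketch: `scale_b_factors` is a hypothetical helper, not an esm function, and it assumes fixed-width PDB records with the B-factor in columns 61 to 66.

```python
def scale_b_factors(pdb_text: str, factor: float = 100.0) -> str:
    """Multiply the B-factor of every ATOM/HETATM record by `factor`."""
    out = []
    for line in pdb_text.splitlines(keepends=True):
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and len(line.rstrip("\r\n")) >= 66:
            scaled = float(line[60:66]) * factor
            # Rewrite only columns 61-66 so the fixed-width layout is kept.
            line = f"{line[:60]}{scaled:6.2f}{line[66:]}"
        out.append(line)
    return "".join(out)
```

All other records and columns pass through unchanged, so the result can be written straight back to a file.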
Let me know if the issue still persists!