
Incorrect output pdb format after binder design on Colab notebook

Open Emalude opened this issue 6 months ago • 4 comments

Hi

I'm using the Colab implementation of the generative pipeline. I input a protein-protein complex PDB file (chains E and A) and ask it to design a binder to chain E at defined hotspots. The RFdiffusion PDB outputs seem to contain a single chain composed of the old chain E and the correctly designed new binder. In addition, the indexing seems to have an issue: if I use PyMol or UGENE to highlight the first few amino acids, it selects the first few in chain E and in the binder at the same time, as if they were overlapping in some way, while if I select the last few, nothing is selected (see the attached screenshot, where I selected the first 15 amino acids on the only existing chain).

Might this be caused by the way the 'contigs' parameter is passed? If I enter something like E333-526:50-70 in the notebook, the contig map generated is 'contigmap.contigs=[E333-526 51-51]', as printed before RFdiffusion is run. The same happens if I try to pass E333-526/0:50-70. From the example scripts, I would expect to need something like 'contigmap.contigs=[E333-526/0 51-51]', or am I missing something?

Thank you.

Image

Emalude avatar Jul 08 '25 16:07 Emalude

Hello,

The issue with the contigmap.contigs option is likely not causing what you mentioned in the first paragraph. This just means that the binders being generated will only be 51 amino acids in length, instead of varying lengths.

Are you using Sergey O.'s Colab notebook? If you are, make sure you are using the 'main' branch version (read the heading at the top of the notebook - if it doesn't mention the main branch, you are on the most up-to-date version).

As for your issues with PyMol, without having any output files this will be difficult to address.

rclune avatar Jul 14 '25 18:07 rclune

Hi

Apologies for the late reply; I paused this personal project and am only getting back to it now.

I'm attaching the script I used and one of the output PDB files.

Basically, I believe the issue is caused by the lack of the TER line in the pdb file indicating the end of the chain. I found this issue both in the Colab implementation and in the normal setup.
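To check whether a given output file really lacks TER records, something like the following could be used. It is only a minimal sketch, not part of RFdiffusion: the function name `chain_summary` is made up for illustration, and it assumes standard fixed-column PDB lines in which the chain ID sits in column 22.

```python
def chain_summary(pdb_text):
    """Return (chain IDs in order of appearance, number of TER records).

    Hypothetical helper for illustration; assumes fixed-column PDB lines.
    """
    chains = []
    ter_count = 0
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM") and len(line) > 21:
            chain_id = line[21]  # column 22 of a PDB line holds the chain ID
            # Record a chain each time the ID changes from the previous atom
            if not chains or chains[-1] != chain_id:
                chains.append(chain_id)
        elif record == "TER":
            ter_count += 1
    return chains, ter_count
```

Calling it on the contents of a file such as mini-prot_1.pdb, e.g. `chain_summary(open("mini-prot_1.pdb").read())`, would show whether more than one chain ID is present while the TER count is zero.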

Script: ../scripts/run_inference.py inference.output_prefix=../../sars-cov2-mini-protein-binder/outputs/RFdiffusion/sars-cov2-mini-prot inference.input_pdb=../../sars-cov2-mini-protein-binder/input/6m0j.pdb 'contigmap.contigs=[E333-526/0 50-70]' 'ppi.hotspot_res=[E487,E493,E498,E500,E502,E505]' inference.num_designs=10 denoiser.noise_scale_ca=0 denoiser.noise_scale_f>

File: mini-prot_1.pdb

Emalude avatar Oct 13 '25 12:10 Emalude

This is what I see when I load the file you sent into my PyMol setup and select the designed residues:

Image

It isn't selecting residues in the chain and binder at the same time - are you still having that issue?

rclune avatar Oct 13 '25 16:10 rclune

Ah, that's weird, but now I can select the chain with PyMol; maybe I was doing something wrong. However, UGENE is still not able to correctly identify the two chains, treating them as a single chain with indexing issues. Not a major issue at this point: I've just written a script that adds a TER line to the PDB file at the end of every chain so I can visualise them correctly. It's probably more of a UGENE issue, as I can't see why the TER line is absolutely necessary to identify a chain end when it should be obvious from the change in chain letter itself. Thanks!
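For anyone hitting the same problem, a workaround along these lines might look as follows. This is a sketch under assumptions rather than the exact script used here: the function name `add_ter_records` is hypothetical, only ATOM records are considered, and a bare `TER` line is written instead of the fully populated record (serial number, residue name, chain ID), on the assumption that the viewer accepts it.

```python
def add_ter_records(pdb_text):
    """Insert a bare TER line wherever a chain ends without one.

    Hypothetical helper for illustration; only ATOM records are tracked.
    """
    out = []
    prev_chain = None  # chain ID of the most recent ATOM line
    need_ter = False   # True while the current chain has no closing TER yet
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ATOM" and len(line) > 21:
            chain = line[21]  # column 22 of a PDB line holds the chain ID
            if need_ter and chain != prev_chain:
                out.append("TER")  # chain letter changed: close the old chain
            prev_chain = chain
            need_ter = True
        elif record == "TER":
            need_ter = False  # the file already closes this chain
        elif record in ("END", "ENDMDL") and need_ter:
            out.append("TER")  # close the last chain before the end marker
            need_ter = False
        out.append(line)
    if need_ter:
        out.append("TER")  # file ended without END or TER
    return "\n".join(out) + "\n"
```

Existing TER lines are left alone, so running it twice on the same file should not add duplicates.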

Emalude avatar Oct 14 '25 09:10 Emalude