
About secondary structure and block adjacency tensors in the newly published snake venom toxin binder design paper

Open · RodenLuo opened this issue on Jan 21, 2025 · 6 comments

Hi,

The Methods section of this paper describes the secondary structure and block adjacency tensors as one-hot tensors:

an [L,4] secondary structure one-hot tensor (0 = α-helix, 1 = β-strand, 2 = loop and 3 = masked secondary structure identity) to indicate the secondary structure classification of each residue in the binder–target complex

an [L,L,3] adjacency one-hot tensor (0 = non-adjacent, 1 = adjacent and 2 = masked adjacency) to indicate interacting partner residues for each residue in the binder–target complex

However, the tensors in the examples/target_folds example are label encoded, not one-hot encoded. I also tried generating these inputs with the provided script, and its outputs are label encoded as well. Of note, the script's secondary structure encoding comes out as floats, whereas the given example uses ints. Please see the output at the end.

I wonder whether the paper used a different version of RFdiffusion. Could the running commands and the inputs for the paper's case studies be added to the repo? A reproduction guide for this paper would greatly benefit the research community. Many thanks!

>>> import torch
>>> target_ss_path = "target_folds/insulin_target_ss.pt"
>>> target_adj_path = "target_folds/insulin_target_adj.pt"
>>> target_ss = torch.load(target_ss_path)
>>> target_adj = torch.load(target_adj_path)
>>> 
>>> target_ss
tensor([2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 2, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2,
        2, 2, 2, 2, 2, 2])
>>> target_ss.shape
torch.Size([150])
>>> target_adj.shape
torch.Size([150, 150])
>>> target_adj
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
>>> torch.unique(target_adj)
tensor([0., 1.])
>>> torch.unique(target_ss)
tensor([0, 1, 2])

### ------- script to generate fold conditioning inputs 
### ./helper_scripts/make_secstruc_adj.py --input_pdb ./examples/input_pdbs/2KL8.pdb --out_dir fold_conditioning_input_test
### -------

>>> target_ss_path = "/home/RFdiffusion/fold_conditioning_input_test/2KL8_ss.pt"
>>> target_adj_path = "/home/RFdiffusion/fold_conditioning_input_test/2KL8_adj.pt"
>>> target_ss = torch.load(target_ss_path)
>>> target_adj = torch.load(target_adj_path)
>>> target_ss
tensor([2., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 2., 2., 1., 1., 1., 1., 1.,
        1., 2., 2., 2., 2., 1., 1., 1., 1., 1., 1., 2., 2., 2., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 2., 2., 1., 1.,
        1., 1., 1., 1., 1., 2., 2.])
>>> target_ss.shape
torch.Size([79])
>>> target_adj.shape
torch.Size([79, 79])
>>> target_adj
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 1., 0., 0.],
        ...,
        [0., 1., 1.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
>>> torch.unique(target_adj)
tensor([0., 1.])
>>> torch.unique(target_ss)
tensor([0., 1., 2.])
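
For reference, converting these label-encoded tensors into the one-hot layouts quoted from the Methods would presumably look something like the following (my own sketch, not the paper's code; the class counts come from the quoted encodings):

import torch
import torch.nn.functional as F

# Load the label-encoded tensors; F.one_hot requires int64 input
ss = torch.load("target_folds/insulin_target_ss.pt").long()   # [L], values in {0, 1, 2} (3 = masked)
adj = torch.load("target_folds/insulin_target_adj.pt").long() # [L, L], values in {0, 1} (2 = masked)

ss_onehot = F.one_hot(ss, num_classes=4)    # [L, 4]
adj_onehot = F.one_hot(adj, num_classes=3)  # [L, L, 3]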

RodenLuo · Jan 21, 2025

If I understand the README and the Methods in the paper correctly, the difference is actually the following.

In the README's fold-conditioning section with a target structure, the secondary structure and block adjacency tensors for the target and the scaffold are set independently: scaffoldguided.target_ss and scaffoldguided.target_adj describe the target, while scaffoldguided.scaffold_dir holds the ss and adj tensors for the scaffold. I checked the given examples/ppi_scaffolds and found that some scaffolds are shorter than insulin_target.pdb, which means the scaffold tensors do not include the target itself.
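
For concreteness, the fold-conditioned PPI command from the README's insulin example looks roughly like this (paths per the repo's examples; exact flag values may differ):

### ------- fold-conditioned PPI command, per the README's insulin example
### ./scripts/run_inference.py scaffoldguided.scaffold_guided=True scaffoldguided.target_pdb=True \
###     scaffoldguided.target_path=input_pdbs/insulin_target.pdb \
###     scaffoldguided.target_ss=target_folds/insulin_target_ss.pt \
###     scaffoldguided.target_adj=target_folds/insulin_target_adj.pt \
###     scaffoldguided.scaffold_dir=./examples/ppi_scaffolds_subset
### -------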

In the new paper, these tensors are for "binder–target complex".

This raises a few questions:

  1. How do we set the adjacency tensor values between the binder's residues and the target's? (A speculative sketch follows these questions.)
  2. With such tensors generated, either one-hot encoding or label encoding, how do we feed them into the RFdiffusion model?
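
To make question 1 concrete, here is a purely speculative sketch of assembling a complex-level, label-encoded adjacency tensor. The block layout and the choice to leave the binder rows masked are my assumptions based on the quoted encodings, not the authors' method:

import torch

L_binder, L_target = 80, 150  # hypothetical lengths
L = L_binder + L_target

# Label-encoded complex adjacency: 0 = non-adjacent, 1 = adjacent, 2 = masked
adj = torch.full((L, L), 2, dtype=torch.long)           # start fully masked
target_adj = torch.load("target_folds/insulin_target_adj.pt").long()
adj[L_binder:, L_binder:] = target_adj                  # known target-target contacts
# The binder-binder and binder-target blocks stay masked (2) here; presumably one
# could set selected binder-target pairs to 1 to encourage contacts at those positions.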

Many thanks for any help.

RodenLuo · Jan 22, 2025

Also interested in this. Would be nice to see a clear example of what was done in the paper.

I'm having trouble understanding how you can have a tensor of [L,4] specifying the secondary structure of each residue in the binder–target complex. Unless this would be of size [L_binder + L_target, 4], and the adjacency matrix would be [L_binder + L_target, L_binder + L_target, 3]?
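
If that reading is right, the complex-level secondary structure input might be assembled like this (a sketch under that assumption, with the binder region set to the mask class 3 from the quoted encoding; lengths are hypothetical):

import torch
import torch.nn.functional as F

L_binder, L_target = 80, 150                                  # hypothetical lengths
ss = torch.full((L_binder + L_target,), 3, dtype=torch.long)  # binder region masked (class 3)
ss[L_binder:] = torch.load("target_folds/insulin_target_ss.pt").long()  # known target SS
ss_onehot = F.one_hot(ss, num_classes=4)                      # [L_binder + L_target, 4]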

The example in the repository uses the ppi.hotspot_res selection string and separate sets of tensors for the target and the binder scaffolds, but no tensors specifying the binder–target interface adjacency.

Dan-Burns · Feb 21, 2025

I would also like the authors or contributors to provide a script for this task, many thanks!

PKUfjh · Feb 24, 2025

[Diagram from the linked paper illustrating the fold-conditioning inputs]

I think this diagram describes how the fold-conditioning information is provided; it is related to this [paper]. My reading is that the input is an overall binder + target adjacency matrix showing the interactions between the binder and the target for de novo binder design. But I still don't understand how these tensors are generated, or how the interaction positions are decided for de novo binders.

satyabikash · Feb 24, 2025

Hello, I'm also interested in this. Are there any updates on how these tensors were generated, please?

Ines-Elmufti · Sep 18, 2025

Unfortunately the authors of this paper have moved on to other projects and thus do not regularly check the Issues on this GitHub repo. Has anyone tried reaching out to them? They are likely the only people who can satisfactorily answer these questions.

rclune · Sep 18, 2025