Are there any options / tricks to better enforce hotspot binding points?
Hi,
I am trying to generate binders to a receptor, but the generated binders are often not in the pocket specified by the target hotspots.
I've tried different variations, e.g. specifying more or fewer hotspots, but the generated binding location still seems pretty random.
Is there any setting or part of the code I can change that might help better enforce the binding location?
Thanks
I have been wishing the hotspots could be tweaked too, and I have not managed it without going down the macaroni-art -> partial diffusion route. But I thought I'd best share my notes on avoiding a sticky part. I have not got anywhere, but it might help someone!
- RFdiffusion is not too fussed by what the AA type is — changing the sticky part to something alien like CYS does nothing to prevent the stickiness
- When cornered by an alien, randomly indexed wall of tryptophans, RFdiffusion will either jump through it or raise an error (np.linalg.svd did not converge)
- Shearing the no-go zone helps sometimes, but it makes everything a pain and has the same issue as above
- Giving a hotspot twice or more should do nothing, as it gets converted into a one-hot encoding. I heard someone repeat this myth, so I wanted to debunk it, although I have not bothered testing it
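The "hotspot twice" point can be checked without running RFdiffusion. Below is a minimal numpy sketch, assuming the hotspot channel is filled by plain indexed assignment of 1.0 (an assumption, not verified against the source); the helper name is hypothetical. Writing 1.0 to the same slot twice leaves it at 1.0, so duplicates change nothing.

```python
import numpy as np

def hotspot_channel(length, hotspot_idx0):
    """Hypothetical helper: build a per-residue 0/1 hotspot flag vector.

    Assumes the channel is filled by plain indexed assignment, so a residue
    listed more than once just gets 1.0 written to the same slot again.
    """
    channel = np.zeros(length, dtype=np.float32)
    channel[list(hotspot_idx0)] = 1.0
    return channel

once = hotspot_channel(10, [2, 5, 7])
twice = hotspot_channel(10, [2, 5, 7, 5, 5, 2])
print(np.array_equal(once, twice))  # True
print(once.sum())                   # 3.0
```

Even an accumulating write such as `channel[idx] += 1.0` would count a repeated index only once in numpy, because in-place fancy indexing is buffered; only `np.add.at` accumulates repeats.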
Looking at the source code, the hotspots only work in the receptor-binder design modes (whatever they are called); otherwise the receptor_con_ref_pdb_idx branch is taken. However, the hotspots need to come from the non-designed receptor, so no hacks are possible.
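Why hotspots must come from the fixed receptor can be illustrated with a simplified lookup: a hotspot string like "A30" is matched against the (chain, residue) pairs read from the input PDB, and only matches get mapped to a position in the full design. This is a sketch under assumptions: the function and argument names are hypothetical and only loosely modeled on the attributes mentioned above, and the matching logic is not copied from RFdiffusion. In this sketch, a residue that is not part of the fixed receptor simply matches nothing.

```python
def hotspots_to_idx0(hotspot_res, con_ref_pdb_idx, hal_idx0):
    """Hypothetical, simplified hotspot lookup.

    hotspot_res:     strings like "A30" (chain letter + PDB residue number)
    con_ref_pdb_idx: (chain, resnum) pairs of the fixed receptor residues
                     taken from the input PDB
    hal_idx0:        0-based position of each of those residues in the full
                     designed complex (same order as con_ref_pdb_idx)
    """
    wanted = {(res[0], int(res[1:])) for res in hotspot_res}
    return [pos for ref, pos in zip(con_ref_pdb_idx, hal_idx0) if ref in wanted]

# Receptor chain A residues 28-32 sit after a 4-residue designed binder.
# The binder has no entry in con_ref, because it is not read from the PDB.
con_ref = [("A", n) for n in range(28, 33)]
hal_idx0 = [4, 5, 6, 7, 8]
print(hotspots_to_idx0(["A30", "A32", "B2"], con_ref, hal_idx0))  # [6, 8]
```

Because the loop runs over the receptor residues rather than over the hotspot list, a repeated hotspot also yields its position only once.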
In terms of possibly tweaking the code, the hotspot_idx0s end up in the t1d embedding (I am guessing it stands for tensor 1D). I think it's $\mathbf{T} \in \{0,1\}^{1 \times l \times 28}$ rather than $\mathbf{T} \in \mathbb{R}^{1 \times l \times 28}$, but I've not checked. The 23rd channel of this third-order tensor is the hotspot one-hot (cf https://github.com/RosettaCommons/RFdiffusion/blob/main/rfdiffusion/inference/model_runners.py#L422).
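To make that layout concrete, here is a toy numpy version of a $1 \times l \times 28$ feature tensor whose 23rd channel (0-based index 22) holds the hotspot flags. The shape and channel index are taken from the description above and are unverified; the real tensor may carry an extra leading dimension or a different channel count, and the helper name is hypothetical.

```python
import numpy as np

L_RES = 12        # total residues (binder + receptor), hypothetical
N_CHANNELS = 28   # channel count quoted above; not verified
HOTSPOT_CH = 22   # 0-based index of the "23rd channel"

def add_hotspots(t1d, hotspot_idx0):
    """Return a copy of t1d with the hotspot channel set to 0/1 flags."""
    out = t1d.copy()
    out[0, :, HOTSPOT_CH] = 0.0
    out[0, hotspot_idx0, HOTSPOT_CH] = 1.0
    return out

t1d = np.zeros((1, L_RES, N_CHANNELS), dtype=np.float32)
t1d = add_hotspots(t1d, [6, 8])
print(t1d.shape)                              # (1, 12, 28)
print(np.flatnonzero(t1d[0, :, HOTSPOT_CH]))  # [6 8]
```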
This channel gets aggregated here by argmax:
https://github.com/RosettaCommons/RFdiffusion/blob/main/rfdiffusion/inference/model_runners.py#L505C19-L505C60
If it's real-valued ($\mathbf{T} \in \mathbb{R}^{1 \times l \times 28}$), then one could add a weight to that channel, model training be damned: argmax only returns the index, after all 🙃
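A small numpy illustration of that weighting idea, assuming the argmax at the linked line is taken over the values of the hotspot channel: scaling the channel by a positive weight changes what the network would see but not which index an argmax over that channel picks, since argmax depends only on the ordering of values. If the argmax is instead taken across channels for each residue, a weight above 1 would make the hotspot channel win on hotspot rows, so it is worth checking which axis that line reduces over. The helper name and the weight value are hypothetical.

```python
import numpy as np

HOTSPOT_CH = 22  # 0-based "23rd channel", as assumed above

def weight_hotspots(t1d, weight):
    """Hypothetical hack: scale the hotspot channel by `weight` (> 0)."""
    out = t1d.copy()
    out[..., HOTSPOT_CH] *= weight
    return out

t1d = np.zeros((1, 12, 28), dtype=np.float32)
t1d[0, [6, 8], HOTSPOT_CH] = 1.0

weighted = weight_hotspots(t1d, 5.0)
# What the network would see has changed...
print(weighted[0, 6, HOTSPOT_CH])  # 5.0
# ...but an argmax over the hotspot channel still picks the same residue.
print(np.argmax(t1d[0, :, HOTSPOT_CH]), np.argmax(weighted[0, :, HOTSPOT_CH]))  # 6 6
```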
There is also a d_t1d scalar, which I have not figured out, though it isn't a weight for t1d. I set the preprocess.d_t1d=40 argument of run_inference.py and it yelled at me, so I didn't look further.
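One plausible reading, offered as an assumption only: the `d_` prefix suggests d_t1d is the width (number of channels) of t1d. If so, a pretrained layer that consumes t1d has a weight matrix sized to that width, and asking for 40 channels cannot line up with it. The toy numpy sketch below shows that failure mode; the names, sizes and the single projection matrix are hypothetical stand-ins, not the RFdiffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

D_T1D_TRAINED = 28  # hypothetical width the checkpoint was trained with
D_HIDDEN = 64       # hypothetical embedding size

# Stand-in for a learned projection of the per-residue t1d features.
w_embed = rng.normal(size=(D_T1D_TRAINED, D_HIDDEN))

def embed_t1d(t1d):
    """Project t1d features; the last axis must equal D_T1D_TRAINED."""
    return t1d @ w_embed

ok = embed_t1d(np.zeros((1, 12, 28)))
print(ok.shape)  # (1, 12, 64)

try:
    embed_t1d(np.zeros((1, 12, 40)))  # as if preprocess.d_t1d=40
except ValueError as err:
    print("shape mismatch:", err)
```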
The guiding potentials don't touch this channel, but they are set up nicely, so in theory one could write a loss-function-like thingy that acts like an AmbiguousConstraint (to use Rosetta terminology), returning the lowest distance in the t2d adjacency plane between any hotspot and the design. There'd be a world of hurt, though, as it might be discontinuous, and it would require a lot of effort. At that point, in for a penny, in for a pound: one could go all in and implement a Cartesian no-go zone penalty (the code to rototranslate and to jump between xyz and t2d is already there) 🤷
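Both ideas can be sketched as plain numpy functions of Cartesian coordinates, under these assumptions: coordinates are in Å and given as (N, 3) arrays; the log-sum-exp soft minimum is a swapped-in smoothing trick to sidestep the discontinuity of a hard min, not something RFdiffusion provides; the function names are hypothetical and do not follow RFdiffusion's potential interface. Working in t2d space or plugging these into guidance would need their gradients (e.g. via torch autograd) and a check of the guidance code's sign convention.

```python
import numpy as np

def soft_min(d, tau=1.0):
    """Smooth lower bound on min(d): -tau * log(sum(exp(-d / tau)))."""
    d = np.asarray(d, dtype=float)
    m = d.min()  # shift for numerical stability
    return m - tau * np.log(np.sum(np.exp(-(d - m) / tau)))

def hotspot_contact_loss(binder_xyz, hotspot_xyz, tau=1.0):
    """AmbiguousConstraint-like term (hypothetical). Lower is better.

    For each hotspot, take a soft minimum over its distances to every binder
    atom, so any binder atom may satisfy it. A hard min() would also work,
    but its gradient jumps whenever the closest atom changes.
    """
    diff = hotspot_xyz[:, None, :] - binder_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (n_hotspots, n_binder)
    return sum(soft_min(row, tau) for row in dist)

def no_go_penalty(binder_xyz, zone_center, zone_radius, k=1.0):
    """Flat-bottom harmonic penalty for binder atoms inside a spherical zone.

    Zero for atoms at or beyond zone_radius from zone_center; grows
    quadratically with how deep an atom sits inside the zone.
    """
    dist = np.linalg.norm(binder_xyz - zone_center, axis=-1)
    depth = np.clip(zone_radius - dist, 0.0, None)
    return k * np.sum(depth ** 2)

binder = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
hotspots = np.array([[5.0, 0.0, 0.0]])
print(hotspot_contact_loss(binder, hotspots, tau=0.1))  # ~1.0: nearest binder atom is 1 Å away
print(no_go_penalty(binder, np.zeros(3), 2.0))          # 4.0: first atom is 2 Å inside the zone
```

A smaller `tau` tracks the hard minimum more closely but makes the gradient sharper; the soft minimum is never more than `tau * log(n_binder)` below the true minimum.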
Other questions about hotspots: #321, #224