Are there any options / tricks to better enforce hotspot binding points?

Open ackbar03 opened this issue 1 year ago • 1 comments

Hi,

I am trying to generate binders to a receptor, but the generated binders are often not in the pocket specified by the target hotspots.

I've tried different variations, e.g. specifying more / fewer hotspots, but the binding location generated still seems pretty random.

Is there any setting or part in the code I can change which may help better enforce the binding location?

Thanks

Jan 15 '24 07:01 ackbar03

I have been wishing the hotspots could be tweaked. I have not managed it without going down the macaroni-art -> partial diffusion route, but I thought I had best share my notes on avoiding a sticky part. I have not got anywhere, but they might help someone!

RFdiffusion is not too fussed by what the AA type is: changing the sticky part to something alien like CYS does nothing to prevent the stickiness
When cornered by an alien, randomly indexed wall of tryptophans, RFdiffusion will either jump through it or raise an error (np.linalg.svd did not converge)
Shearing the no-go zone helps sometimes but makes everything a pain and has the same issue as the above
Giving a hotspot twice or more should do nothing, as it gets converted into a one-hot encoding. I heard someone repeat this myth so wanted to debunk it, although I have not bothered to test it
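
To illustrate the last point, here is a minimal sketch of why repeating a hotspot should be a no-op. It assumes the mask is filled by plain index assignment (setting positions to 1.0) rather than by accumulation; `hotspot_mask` is a hypothetical helper for illustration, not a function from RFdiffusion.

```python
def hotspot_mask(length, hotspot_idx0s):
    """Return a 0/1 mask of `length` with 1.0 at each hotspot index.

    Hypothetical helper: assumes assignment semantics, not accumulation.
    """
    mask = [0.0] * length
    for i in hotspot_idx0s:
        mask[i] = 1.0  # assignment: setting the same index again changes nothing
    return mask


once = hotspot_mask(8, [2, 5])
repeated = hotspot_mask(8, [2, 2, 2, 5, 5])
print(once == repeated)
```

If the mask were instead built by summing contributions, repeats would matter, so this only holds under the assignment assumption.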

Looking at the source code, the hotspots work only with receptor-binder design modes (whatever they are called), otherwise receptor_con_ref_pdb_idx happens. However, the hotspots need to come from the non-designed receptor, so no hacks possible.

In terms of possibly tweaking the code, the hotspot_idx0s end up in the t1d embedding (I am guessing it stands for tensor 1D). I think it's $\mathbf{T} \in \{0,1\}^{1 \times l \times 28}$, not $\mathbf{T} \in \mathbb{R}^{1 \times l \times 28}$, but I've not checked. The 23rd channel of this third-order tensor is the hotspot one-hot (cf. https://github.com/RosettaCommons/RFdiffusion/blob/main/rfdiffusion/inference/model_runners.py#L422). This channel gets aggregated here by argmax: https://github.com/RosettaCommons/RFdiffusion/blob/main/rfdiffusion/inference/model_runners.py#L505C19-L505C60 If it's $\in \mathbb{R}$ then one could add a weight, and the model training be damned: argmax returns the index after all 🙃
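
The argmax remark can be checked in isolation. This is a toy sketch, not RFdiffusion code: it only shows that scaling a one-hot vector by a positive weight leaves the argmax index unchanged. Whether the network responds sensibly to a weighted hotspot channel it was never trained on is a separate, untested question.

```python
def argmax(values):
    """Index of the largest element (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


one_hot = [0.0, 0.0, 1.0, 0.0]
weight = 5.0  # arbitrary positive weight, chosen for illustration only
weighted = [weight * v for v in one_hot]

print(argmax(one_hot), argmax(weighted))
```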

There is also a d_t1d scalar; I have not figured out what it does, but it isn't a weight for t1d. I changed the preprocess.d_t1d=40 argument of run_inference.py and it yelled at me, so I didn't look further.

The gradient-based guiding potentials don't touch this channel but are set up nicely, so theoretically one could write a loss-function-like thingy that acts like an AmbiguousConstraint (to use Rosetta terminology), returning the lowest distance in the adjacency plane in t2d between hotspot and design. There'd be a world of hurt though, as it might be discontinuous... and it would require a lot of effort. At that point, in for a penny, in for a pound: one could go all in and implement a Cartesian no-go zone penalty (as the code to rototranslate and jump between xyz and t2d is there already) 🤷
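
As a rough sketch of what those two penalties could look like, assuming plain Cartesian coordinates rather than t2d: the hard minimum is swapped for a log-sum-exp soft minimum, which is one common way around the discontinuity worry. Both function names, the `temperature` parameter and the spherical no-go zone are assumptions for illustration. None of this is wired into RFdiffusion's potentials.

```python
import math


def soft_min_distance(hotspots, binder, temperature=1.0):
    """Smooth stand-in for the smallest hotspot-binder distance.

    Computes -T * log(sum(exp(-d / T))) over all pairs, which is always
    <= the true minimum and approaches it as T goes to 0.
    """
    dists = [math.dist(h, b) for h in hotspots for b in binder]
    smallest = min(dists)
    # Factor out the smallest distance for numerical stability.
    total = sum(math.exp(-(d - smallest) / temperature) for d in dists)
    return smallest - temperature * math.log(total)


def no_go_penalty(binder, centre, radius):
    """Quadratic penalty on binder points inside a spherical no-go zone."""
    penalty = 0.0
    for point in binder:
        depth = radius - math.dist(point, centre)
        if depth > 0:  # only points inside the sphere are penalised
            penalty += depth ** 2
    return penalty


# Toy coordinates in angstroms, made up for the example.
hotspots = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
binder = [(0.0, 0.0, 3.0), (20.0, 0.0, 0.0)]

print(soft_min_distance(hotspots, binder, temperature=0.5))
print(no_go_penalty(binder, centre=(20.0, 0.0, 0.0), radius=5.0))
```

Only one hotspot-binder pair needs to be close for the soft minimum to be small, which is the AmbiguousConstraint-like behaviour; a lower temperature tracks the true minimum more tightly at the cost of sharper gradients.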

Other Qs that ask re hotspots: #321 #224

Mar 12 '25 11:03 matteoferla