Virtual Screening
Is there an optimization that could be done to speed up folding a series of ligands against the same protein sequence? Can the protein features be cached to increase the throughput? I am happy to submit a PR with a little guidance on how to proceed. Thanks.
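One simple way to sketch the idea: cache the expensive protein-side work (MSA/feature computation) keyed by a hash of the sequence, so repeated runs against the same target only pay that cost once. This is a generic memoization sketch, not Boltz's actual internals; the cache directory name and the `compute` callback are placeholders for whatever the real pipeline does.

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("feature_cache")  # hypothetical on-disk cache location


def sequence_key(seq: str) -> str:
    # Key cached features by a hash of the protein sequence
    return hashlib.sha256(seq.encode()).hexdigest()


def get_protein_features(seq: str, compute):
    """Return cached features for `seq`, computing and storing them on a miss.

    `compute` stands in for the expensive MSA / feature-extraction step that
    would otherwise be repeated for every ligand in the screen.
    """
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / f"{sequence_key(seq)}.json"
    if path.exists():
        return json.loads(path.read_text())
    features = compute(seq)
    path.write_text(json.dumps(features))
    return features
```

With this pattern, screening 1500 ligands against one target computes the protein features once and reads them from disk 1499 times.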
I've run both Boltz and AlphaFold 3 predictions of 1500 FDA-approved compounds against single virus protein targets. It runs pretty fast: 4 hours total on an Nvidia 4090 or 20 hours on a Mac M2 Ultra. I used a script to generate the 1500 input files and reused the same MSA for the protein target. Here are some web pages describing the process for AlphaFold 3, plus a comparison between the Boltz and AF3 results showing that their iPTM scores for the protein-ligand interfaces are not well correlated.
https://www.rbvi.ucsf.edu/chimerax/data/af3-drugs-dec2024/af3_drugs.html
https://www.rbvi.ucsf.edu/chimerax/data/macromethods-jan2025/af3_drugs.html
https://www.rbvi.ucsf.edu/chimerax/data/boltz_af3_compare_feb2025/drug_compare.html
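A script like the one described above can be sketched as follows: write one Boltz YAML input per ligand, with every file pointing at the same precomputed MSA. The sequence, MSA path, and ligand list here are placeholders, and the YAML field names follow my reading of the Boltz input schema, so check them against the current Boltz documentation before use.

```python
from pathlib import Path

# Hypothetical inputs: the real script would read ~1500 SMILES strings from a file
PROTEIN_SEQ = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # placeholder target sequence
MSA_PATH = "target.a3m"                             # precomputed MSA, reused for every job
LIGANDS = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
}


def write_boltz_inputs(out_dir: str) -> list[Path]:
    """Write one Boltz-style YAML input per ligand, all sharing the same MSA."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, smiles in LIGANDS.items():
        text = (
            "sequences:\n"
            "  - protein:\n"
            "      id: A\n"
            f"      sequence: {PROTEIN_SEQ}\n"
            f"      msa: {MSA_PATH}\n"
            "  - ligand:\n"
            "      id: B\n"
            f"      smiles: '{smiles}'\n"
        )
        path = out / f"{name}.yaml"
        path.write_text(text)
        paths.append(path)
    return paths
```

Reusing the MSA this way is what makes the 1500-compound screen tractable: the MSA search happens once and each per-ligand job just references it.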
I must still have the script that generated the Boltz input files, but I don't think I've posted it on any web page, probably because after seeing the lack of correlation between the AF3 and Boltz confidence scores I suspect neither Boltz nor AF3 is very useful for screening ligand binding.
@tomgoddard Hello, I agree with your test results. In my recent designs, I've also found that the evaluation metrics provided by Boltz-2 have a poor correlation with those of AF2/AF3. However, given Boltz-2's completely open-source nature and easier deployment compared to AF3, I sometimes have to consider using Boltz-2 instead of Google-DeepMind's official AF. Therefore, I'd like to ask your opinion on the reliability of Boltz-2's evaluation metrics, especially for virtual screening of small molecule-protein complexes or protein-protein complexes. How might these Boltz-2 evaluation metrics be utilized to achieve better screening results? I look forward to your insights and would be very grateful!
I think this preprint gives a good perspective. It says that ligand-binding predictions are good if the training data contained similar ligands in similar pockets, and bad otherwise; in other words, prediction quality tracks how similar the case is to something in the training data. That means predicting novel ligands, or binding in novel pockets, is not likely to give good results.
Have protein-ligand co-folding methods moved beyond memorisation?