boltz Affinity output explanation request

Hi,

Thanks to the team for developing Boltz-2, and especially for making it open source. I know the community is excited to use it and, more importantly, to validate it against specific cases.

I'm curious what exactly the different pairs of values in affinity.json represent, as this is not very clear in predictions.md.

"affinity_pred_value": 0.8367,             # Predicted binding affinity from the ensemble model
"affinity_probability_binary": 0.8425,     # Predicted binding likelihood from the ensemble model
"affinity_pred_value1": 0.8225,            # Predicted binding affinity from the first model of the ensemble
"affinity_probability_binary1": 0.0,       # Predicted binding likelihood from the first model in the ensemble
"affinity_pred_value2": 0.8225,            # Predicted binding affinity from the second model of the ensemble
"affinity_probability_binary2": 0.8402,    # Predicted binding likelihood from the second model in the ensemble

Currently I am validating Boltz-2 predictions against my experimental data.

The following is an example YAML input file. The idea is to supply MSAs generated via AlphaFold 3 and a structure template to save time, and to use Boltz-2 purely as a bind/no-bind predictor.

version: 1
sequences:
- protein:
    id: [A]
    sequence: MDGDNETMVAEFLLLGLSGKSEQEEVVFGMFLGMYLVTISGNLLIILAISCDPHLHTPMYFFLANLSSVDICFSSVTVPKALVNHVLGSKSISYTECMIQIYFFITFINMDGFLLSVMAYDRYVAICHPLHYTMMMRSRLCVLLVAISWVITNLHALLHTLLMVRLTFCSHNAVHHFFCDPYPILKLSCSDTFINDLMVFTVGGVIFLTPFSCIVVSYVYIFSKVLKIPSARGIRKALSTCGSHLTVVSLFYGAILGVYMRPSSSYSLQDTVATVIFTVVTPLVNPFIYSLRNQDMKGALRKIMLRS
    msa: ~/or1ad1_ga/or1ad1.a3m
- protein:
    id: [B]
    sequence: GGSLEVLFQGPSGNSKTEDQRNEEKAQREANKKIEKQLQKDKQVYRATHRLLLLGADNSGKSTIVKQMRILHGGSGGSGGTSGIFETKFQVDKVNFHMFDVGGQRDERRKWIQCFNDVTAIIFVVDSSDYNRLQEALNLFKSIWNNRWLRTISVILFLNKQDLLAEKVLAGKSKIEDYFPEFARYTTPEDATPEPGEDPRVTRAKYFIRDEFLRISTASGDGRHYCYPHFTCAVDTENARRIFNDCRDIIQRMHLRQYELL
    msa: ~/Ga.a3m
- ligand:
    id: [C]
    smiles: 'CC(=O)C1=CC=CC=C1'
templates:
- cif: ~/or1ad1_ga/or1ad1_ga_model.cif
  chain_id: [A, B]
  template_id: [A, B]

Affinity output:

{
    "affinity_pred_value": 2.400198459625244,
    "affinity_probability_binary": 0.16011813282966614,
    "affinity_pred_value1": 2.5400466918945312,
    "affinity_probability_binary1": 0.17134127020835876,
    "affinity_pred_value2": 2.260350465774536,
    "affinity_probability_binary2": 0.1488949954509735
}
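
For reference, here is a minimal sketch of how I read these values. It assumes, based on my reading of predictions.md, that `affinity_pred_value` is log10(IC50) with IC50 in μM (lower means stronger binding) and that `affinity_probability_binary` is the predicted probability of being a binder. The helper name `summarize_affinity` is my own and is not part of Boltz.

```python
import json

# Values copied from the affinity output above.
raw = """
{
    "affinity_pred_value": 2.400198459625244,
    "affinity_probability_binary": 0.16011813282966614,
    "affinity_pred_value1": 2.5400466918945312,
    "affinity_probability_binary1": 0.17134127020835876,
    "affinity_pred_value2": 2.260350465774536,
    "affinity_probability_binary2": 0.1488949954509735
}
"""


def summarize_affinity(result: dict) -> dict:
    """Summarize the ensemble-level values (hypothetical helper).

    Assumption: affinity_pred_value is log10(IC50) with IC50 in micromolar.
    """
    log_ic50 = result["affinity_pred_value"]
    return {
        "log10_ic50_uM": log_ic50,
        "ic50_uM": 10 ** log_ic50,  # only valid under the assumption above
        "binder_probability": result["affinity_probability_binary"],
    }


summary = summarize_affinity(json.loads(raw))
print(summary)
```

Under that assumption, the example above would correspond to an IC50 in the hundreds of μM, i.e. a weak predicted binder, which is consistent with the low binary probability.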

In short, I am interested in how the ligand (chain C) binds to my protein of interest (chain A). Chain B is only there to stabilize the structure of chain A. In this specific test case the ligand is known to activate the protein quite well, and yet affinity_probability_binary is low (0.16). When I ran a set of protein-ligand pairs to validate against experimental data, there did not seem to be a consistent trend either.

More specifically, I'm wondering which Boltz-2 outputs are used to generate the "screen score" in the study (Fig. 8). Quoting the publication: "In these screens, we use a combination of the Boltz-2 predicted binding likelihood and affinity as a screen score for small molecules" ("Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction").

Jun 29 '25 20:06 Justice-Lu

Same question :( Now I want to review the paper to find the answer. Could you please let me know when you find out what it means? Thank you!

Jul 01 '25 07:07 yangjinhao1234

Now I know what it means! Check the affinity module part (section B.5) of the technical report. It says they train two models to predict affinity; the two models use different Pairformer layers, λ, and training samples. They then use a function to aggregate the two results. In my case, it is the average. So I think they want us to use the first group of values.
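
The averaging can be checked against the example output posted at the top of this thread. This is only a quick sketch: the helper name `is_mean_of_members` is made up, and the tolerance is my own choice to allow for float32 rounding in the saved values.

```python
import math

# Example values copied from the affinity output in the original post.
affinity = {
    "affinity_pred_value": 2.400198459625244,
    "affinity_probability_binary": 0.16011813282966614,
    "affinity_pred_value1": 2.5400466918945312,
    "affinity_probability_binary1": 0.17134127020835876,
    "affinity_pred_value2": 2.260350465774536,
    "affinity_probability_binary2": 0.1488949954509735,
}


def is_mean_of_members(result: dict, key: str, tol: float = 1e-6) -> bool:
    """Return True if result[key] equals the mean of result[key + '1'] and result[key + '2']."""
    members = [result[f"{key}1"], result[f"{key}2"]]
    mean = sum(members) / len(members)
    return math.isclose(result[key], mean, abs_tol=tol)


checks = {
    key: is_mean_of_members(affinity, key)
    for key in ("affinity_pred_value", "affinity_probability_binary")
}
print(checks)
# {'affinity_pred_value': True, 'affinity_probability_binary': True}
```

So for that example, both ensemble values are the plain average of the two per-model values.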

Jul 02 '25 08:07 yangjinhao1234

I've read about the two different models used to predict affinity too. I think that's why the comments say "... from the first / second model of the ensemble".

But it's still rather unclear how the "screen score" they used to benchmark against ABFE is derived from the Boltz output.

Jul 02 '25 13:07 Justice-Lu

I think it's clear: the first group of predictions is the aggregated value, and they use it as the final score. You can then rank all the molecules and calculate the enrichment factor (EF).
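
A rough sketch of that ranking and EF calculation, using the standard enrichment factor definition (hit rate in the top fraction divided by the hit rate in the whole set). The function name, the toy scores, and the labels are all made up for illustration; this is not the paper's exact screen score. Set `higher_is_better=False` if you rank by a score where lower means stronger binding.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01, higher_is_better=True):
    """Enrichment factor of the top-ranked fraction of a screened set.

    scores: one score per molecule (for example an aggregated affinity output).
    is_active: 1/0 (or True/False) experimental label per molecule.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")

    # Number of molecules in the selected top fraction (at least one).
    n_top = max(1, round(n_total * top_fraction))

    # Rank molecule indices from best to worst score.
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=higher_is_better)
    top_actives = sum(1 for i in order[:n_top] if is_active[i])

    return (top_actives / n_top) / (n_actives / n_total)


# Toy data: 10 molecules, 3 known actives, scored by a binder probability.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.05, 0.3, 0.4, 0.15, 0.6]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
ef_top20 = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
print(ef_top20)
```

In the toy data the top 20% (two molecules) are both actives, so the EF is the top hit rate (1.0) divided by the overall hit rate (0.3).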

Jul 03 '25 01:07 yangjinhao1234

So we need to use the first "affinity_pred_value" then?

Aug 08 '25 03:08 MiaJY-Yang

I used it on an internal case. The results were not bad, so I think it works.

Aug 10 '25 03:08 yangjinhao1234