foundry Unexpected sampling results

Description

After installation, I ran batch sampling using the following command:

rfd3 design out_dir=RF3_PD \
 inputs=./PD_L1.json \
 skip_existing=False \
 dump_trajectories=False \
 align_trajectory_structures=True \
 ckpt_path=./checkpoints/rfd3_latest.ckpt \
 cleanup_virtual_atoms=True \
 diffusion_batch_size=4 \
 n_batches=25 \
 output_full_json=False

The PD_L1.json configuration file uses the example content from the doc:

{
    "pdl1": {
        "dialect": 2,
        "infer_ori_strategy": "hotspots",
        "input": "./RFD3/models/rfd3/docs/input_pdbs/5o45_cropped.pdb",
        "contig": "50-120,/0,A17-131",
        "select_hotspots": {
            "A56": "CG,OH",
            "A115": "CG,SD",
            "A123": "CD2,OH"
        }
    }
}

Issue

When using AF3 to screen the 100 generated binders, almost none of them pass the interaction quality filter (min PAE interaction < 1.5). Is there anything incorrect in my sampling configuration?

Dec 06 '25 16:12 Ma-Yiming

Did you use an inverse folding tool such as ProteinMPNN? At present, the sequences designed by RFD3 are unlikely to pass refolding filters, so we recommend using an auxiliary sequence design method on the RFD3 outputs.

Dec 07 '25 00:12 RafiBrent

Thank you for your reply!

I used ProteinMPNN as the inverse folding tool. Specifically, I used vanilla_model_weights as the checkpoint and sampling_temp as 0.0001.

Referring to the process in ProteinMPNN's ./examples/submit_example_2.sh:

../helper_scripts/parse_multiple_chains.py ...
../helper_scripts/assign_fixed_chains.py ... --chain_list "A" ...

With --chain_list "A", chain A is the chain being designed and all other chains are kept fixed.

Is this usage reasonable for RFD3?

Dec 07 '25 06:12 Ma-Yiming

Thanks for the detailed information! Those MPNN settings look reasonable. When running AF3, what settings did you use for MSAs/templating? I would recommend not using MSAs and using the structure of the target as a template (rather than the default homology search performed by AF3). Does that match the settings you used?

In terms of the failure mode you're observing, do the RFD3 output structures look reasonable, and do the AF3 refolded structures also look reasonable but different? Or does there seem to be a point in the pipeline that is obtaining clearly implausible results?

Dec 07 '25 19:12 RafiBrent

Thanks for your reply! As you mentioned, when running AF3, I didn't use MSA and used the true structure CIF files of all chains except chain A as templates (for PD_L1, this means the structure of chain B was used as the AF3 input template).
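For readers who want to reproduce this refolding setup, here is a minimal sketch of an AF3 input with no MSAs and the target structure supplied as a template. The field names (unpairedMsa, pairedMsa, templates, queryIndices, templateIndices) follow my understanding of the AF3 input JSON format and should be checked against the AF3 documentation for your version. The function name build_af3_input, the chain IDs, and the identity residue mapping are assumptions made for illustration, not the settings actually used in this thread.

```python
import json


def build_af3_input(name, binder_seq, target_seq, target_mmcif, seed=1):
    """Build an AF3-style input dict: no MSAs, target structure as template.

    Assumption: the template mmCIF contains exactly the residues of
    target_seq, in order, so query and template indices map one to one.
    """
    identity = list(range(len(target_seq)))

    # Designed binder: empty MSAs and no templates (single-sequence mode).
    binder = {
        "protein": {
            "id": "A",
            "sequence": binder_seq,
            "unpairedMsa": "",
            "pairedMsa": "",
            "templates": [],
        }
    }

    # Target: empty MSAs, with its own structure provided as the template.
    target = {
        "protein": {
            "id": "B",
            "sequence": target_seq,
            "unpairedMsa": "",
            "pairedMsa": "",
            "templates": [
                {
                    "mmcif": target_mmcif,
                    "queryIndices": identity,
                    "templateIndices": identity,
                }
            ],
        }
    }

    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [binder, target],
        "dialect": "alphafold3",
        "version": 1,
    }


if __name__ == "__main__":
    # Placeholder sequences and mmCIF text, for illustration only.
    job = build_af3_input("demo", "MKTAYIAK", "GSHMAF", "data_template\n#")
    print(json.dumps(job, indent=2)[:200])
```

If the template file covers only part of the target, the two index lists would need to reflect the real alignment rather than the identity mapping assumed here.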

I didn't seem to see any obviously unreasonable results throughout the evaluation process. The files below are some randomly selected results from my evaluation, including the RFD3 predicted structures, the MPNN FASTA files, the AF3 folded structures, and the confidence files. (sample_0 is the sequence generated by RFD3 itself, and sample_1 is the file corresponding to the sequence generated by MPNN.)

I found that although the structures generated by RFD3 are good in terms of the SC metric, they are not good in terms of the confidence metric predicted by AF3. For example, in model_1, sample_1 has a very good SC metric, but chain_pair_pae_min is 14.92.
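As a rough illustration of the filter discussed here, the sketch below takes a chain_pair_pae_min matrix (as I understand it, a chains-by-chains nested list in the AF3 summary confidence file) and applies the 1.5 cutoff from the original post. The function names and the convention that the binder is chain index 0 are assumptions. The example matrix is made up, apart from reusing the 14.92 value quoted above.

```python
PAE_THRESHOLD = 1.5  # cutoff quoted in the original post


def min_interaction_pae(chain_pair_pae_min, binder_idx=0):
    """Smallest inter-chain PAE between the binder and any other chain.

    Both directions (binder -> other, other -> binder) are considered,
    and the binder's own diagonal entry is ignored.
    """
    values = []
    for other in range(len(chain_pair_pae_min)):
        if other == binder_idx:
            continue
        values.append(chain_pair_pae_min[binder_idx][other])
        values.append(chain_pair_pae_min[other][binder_idx])
    if not values:
        raise ValueError("need at least two chains to compute an interaction PAE")
    return min(values)


def passes_interaction_filter(chain_pair_pae_min, binder_idx=0,
                              threshold=PAE_THRESHOLD):
    """True if the minimum interaction PAE is strictly below the threshold."""
    return min_interaction_pae(chain_pair_pae_min, binder_idx) < threshold


if __name__ == "__main__":
    # Illustrative matrix: rows and columns are chains, binder first.
    example = [[0.8, 14.92],
               [15.3, 0.9]]
    print(min_interaction_pae(example))        # 14.92
    print(passes_interaction_filter(example))  # False
```

In practice the matrix would be read from each prediction's confidence JSON, and the pass rate is the fraction of designs for which passes_interaction_filter returns True.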

PD_L1.zip

Dec 08 '25 07:12 Ma-Yiming

Thanks for flagging this. We looked into it further: in our own default settings we had unintentionally been setting is_non_loopy to True, and the example PPI JSON does not set it. We think this matters especially for PPI design because it biases the model towards more structured binding interfaces, which AF3 predicts more confidently. I will update the example PPI JSON to include that setting. Please let me know if this resolves your issue. We are also investigating whether this affects other tasks such as enzyme design and will follow up soon!
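Until the example is updated, one way to try this is to add the flag to an existing input file. The sketch below is a guess at what that could look like: it assumes is_non_loopy belongs at the top level of each design entry (next to contig and select_hotspots), which should be confirmed against the RFD3 input documentation. The helper name enable_non_loopy is hypothetical.

```python
import copy
import json


def enable_non_loopy(config):
    """Return a copy of an RFD3 input config with is_non_loopy set to True.

    Assumption: each top-level key (e.g. "pdl1") maps to one design entry,
    and is_non_loopy is a top-level field of that entry.
    """
    patched = copy.deepcopy(config)
    for spec in patched.values():
        spec["is_non_loopy"] = True
    return patched


if __name__ == "__main__":
    # The example config from the original post.
    config = {
        "pdl1": {
            "dialect": 2,
            "infer_ori_strategy": "hotspots",
            "input": "./RFD3/models/rfd3/docs/input_pdbs/5o45_cropped.pdb",
            "contig": "50-120,/0,A17-131",
            "select_hotspots": {
                "A56": "CG,OH",
                "A115": "CG,SD",
                "A123": "CD2,OH",
            },
        }
    }
    print(json.dumps(enable_non_loopy(config), indent=4))
```

The original dictionary is left untouched, so the patched and unpatched configs can be sampled side by side to compare AF3 pass rates.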

Dec 10 '25 08:12 RafiBrent