openfold
Feature request: retain model ensembling as an option, for molecular replacement usage purposes
Hello,
Although it is true that in most cases the 5/25 models generated by AF2 monomer/multimer are very similar to each other, we consistently find that - when using these models (or pieces thereof) for molecular replacement - a model ensemble is clearly superior to any individual model (including the top ranked one).
An example of this is this work that appeared online just yesterday, where we performed molecular replacement using ensembles of models whose relative RMSD ranged from 0.7 to 1.9 Å.
Keeping this observation in mind, and regardless of whether DeepMind retires this functionality in the future or not, I think it would be very useful for crystallographers if this capability was reinstated into the OF code, as a non-default option (number_of_models = # or something like that).
Thank you!
We do support multiple-model inference. Or are you referring to "ensembling" as discussed in the AlphaFold supplement?
No, sorry, I meant multiple-model inference (I also got confused as we often use the term ensembles for MR model sets). But how does one then tell OF how many models to make, if they want more than 1?
Just provide a comma-separated list of model parameters. E.g.
--jax_param_path <path_1>,<path_2> --openfold_checkpoint_path <path_3>,<path_4>
Right now the script only supports one config_preset at a time, but I'm trying to figure out a sensible way to allow users to specify per-checkpoint config presets.
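For readers unfamiliar with the comma-separated convention, here is a rough sketch of how such a value can be turned into a list of paths. This is not OpenFold's actual parsing code; `split_paths` is a hypothetical helper written only for illustration.

```python
def split_paths(arg):
    """Split a comma-separated CLI value into a list of non-empty paths.

    Hypothetical helper for illustration; not part of OpenFold.
    """
    if arg is None:
        return []
    # Tolerate stray whitespace and trailing commas.
    return [p.strip() for p in arg.split(",") if p.strip()]


# One inference pass would then be run per entry in the resulting list.
paths = split_paths("<path_1>,<path_2>")
```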
Sorry, I think I did not explain myself well. By models I meant output PDB models, i.e. the equivalent of "ranked_0.pdb... ranked_4.pdb" of AF2 (monomer).
AF2 generates the different PDB files using different sets of model parameters (of which it runs 5 by default). This functionality can be replicated (barring the config_preset issue) by just specifying 5 different sets of model parameters as I did above. Maybe I'm still misunderstanding you though.
Thanks for the explanation. It's nice that one can also generate PDBs with AF2 and OF models at the same time (based on "Exactly one of --openfold_checkpoint_path or --jax_param_path must be specified to run the inference script." I thought it was either one or the other). Is there any way to also rank the resulting PDBs by chain pLDDT (as AF2 does)?
That "Exactly" message is outdated. I'll change it now. We don't have pLDDT ranking built in ATM, but I can add it reasonably soon.
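Until ranking is built in, something like the following could serve as a stopgap. It is a minimal sketch, not OpenFold code, and it assumes the per-residue pLDDT is written into the B-factor column of the output PDB files (as AF2 does), which is worth verifying on your own outputs. The function names `mean_plddt` and `rank_by_plddt` are made up for this example.

```python
def mean_plddt(pdb_text):
    """Mean of the B-factor field (PDB columns 61-66) over C-alpha atoms.

    Assumes the B-factor column holds per-residue pLDDT.
    """
    values = []
    for line in pdb_text.splitlines():
        # Atom name lives in columns 13-16 of an ATOM record.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no CA atoms found")
    return sum(values) / len(values)


def rank_by_plddt(models):
    """Rank models from highest to lowest mean pLDDT.

    `models` maps a model name to the text of its PDB file.
    Returns a list of (name, mean_plddt) tuples.
    """
    scored = [(name, mean_plddt(text)) for name, text in models.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

To use it, read each output PDB into a string, pass a dict of name to file contents, and rename or copy the files according to the returned order. For multi-chain outputs, the same idea can be applied per chain by also filtering on the chain ID in column 22.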
Great, thank you!