Feature request: retain model ensembling as an option, for molecular replacement usage purposes

Open lucajovine opened this issue 2 years ago • 8 comments

Hello,

Although it is true that in most cases the 5/25 models generated by AF2 monomer/multimer are very similar to each other, we consistently find that - when using these models (or pieces thereof) for molecular replacement - a model ensemble is clearly superior to any individual model (including the top ranked one).

An example of this is this work that appeared online just yesterday, where we performed molecular replacement using ensembles of models whose relative RMSD ranged from 0.7 to 1.9 Å.

Keeping this observation in mind, and regardless of whether DeepMind retires this functionality in the future, I think it would be very useful for crystallographers if this capability were reinstated in the OF code as a non-default option (number_of_models = # or something like that).

Thank you!

lucajovine avatar Jul 07 '22 08:07 lucajovine

We do support multiple-model inference. Or are you referring to "ensembling" as discussed in the AlphaFold supplement?

gahdritz avatar Jul 08 '22 21:07 gahdritz

No, sorry, I meant multiple-model inference (I also got confused as we often use the term ensembles for MR model sets). But how does one then tell OF how many models to make, if they want more than 1?

lucajovine avatar Jul 11 '22 12:07 lucajovine

Just provide a comma-separated list of model parameters. E.g.

--jax_param_path <path_1>,<path_2> --openfold_checkpoint_path <path_3>,<path_4>

Right now the script only supports one config_preset at a time, but I'm trying to figure out a sensible way to allow users to specify per-checkpoint config presets.

gahdritz avatar Jul 11 '22 12:07 gahdritz
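
For anyone scripting this, here is a minimal sketch of assembling the comma-separated values for the two flags mentioned above. The helper name `build_param_flags` and the example paths are hypothetical; only `--jax_param_path` and `--openfold_checkpoint_path` come from this thread, and the rest of the inference command is deliberately left out.

```python
import shlex


def build_param_flags(jax_param_paths=(), openfold_checkpoint_paths=()):
    """Return the flag fragment that passes several parameter sets at once.

    Each flag takes a single comma-separated value, one entry per
    parameter set, so each entry should yield its own output model.
    """
    args = []
    if jax_param_paths:
        args += ["--jax_param_path", ",".join(jax_param_paths)]
    if openfold_checkpoint_paths:
        args += ["--openfold_checkpoint_path", ",".join(openfold_checkpoint_paths)]
    return args


# Hypothetical paths, for illustration only.
flags = build_param_flags(
    jax_param_paths=["params/a.npz", "params/b.npz"],
    openfold_checkpoint_paths=["ckpt/c.pt"],
)
print(shlex.join(flags))
```

The returned list can be appended to whatever inference command you already use; paths containing commas would not work with this scheme.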

Sorry, I think I did not explain myself well. By models I meant output PDB models, i.e. the equivalent of "ranked_0.pdb... ranked_4.pdb" of AF2 (monomer).

lucajovine avatar Jul 11 '22 12:07 lucajovine

AF2 generates the different PDB files using different sets of model parameters (of which it runs 5 by default). This functionality can be replicated (barring the config_preset issue) by just specifying 5 different sets of model parameters as I did above. Maybe I'm still misunderstanding you though.

gahdritz avatar Jul 11 '22 13:07 gahdritz

Thanks for the explanation. It's nice that one can also generate PDBs with AF2 and OF models at the same time (based on "Exactly one of --openfold_checkpoint_path or --jax_param_path must be specified to run the inference script." I thought it was either one or the other). Is there any way to also rank the resulting PDBs by chain pLDDT (as AF2 does)?

lucajovine avatar Jul 15 '22 13:07 lucajovine

That "Exactly" message is outdated. I'll change it now. We don't have pLDDT ranking built in ATM, but I can add it reasonably soon.

gahdritz avatar Jul 15 '22 14:07 gahdritz
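
Until ranking is built in, a rough stand-in can be scripted outside OpenFold. The sketch below assumes the per-residue pLDDT is stored in the B-factor column of the output PDB files, as in AlphaFold's output; that should be checked against your own files. The function names `mean_plddt` and `rank_by_plddt` are hypothetical and not part of OpenFold, and averaging over CA atoms is a simplification rather than AF2's exact ranking procedure.

```python
from pathlib import Path


def mean_plddt(pdb_text, chain_id=None):
    """Mean of the B-factor column over CA atoms, optionally for one chain.

    Assumes the B-factor column (PDB columns 61-66) holds pLDDT.
    """
    values = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        if chain_id is not None and line[21] != chain_id:  # chain, column 22
            continue
        values.append(float(line[60:66]))
    if not values:
        raise ValueError("no matching CA atoms found")
    return sum(values) / len(values)


def rank_by_plddt(pdb_paths, chain_id=None):
    """Return (mean pLDDT, path) pairs sorted from highest to lowest score."""
    scored = [
        (mean_plddt(Path(p).read_text(), chain_id), str(p)) for p in pdb_paths
    ]
    return sorted(scored, reverse=True)
```

Calling `rank_by_plddt(sorted(Path("output_dir").glob("*.pdb")), chain_id="A")` would then list the models best first, where `output_dir` is a placeholder for wherever the predictions were written.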

Great, thank you!

lucajovine avatar Jul 15 '22 14:07 lucajovine