
[user story] Large-scale benchmark set curation and continuous evaluation from BindingDB

Open jchodera opened this issue 3 years ago • 1 comment

In broad terms, what are you trying to do? BindingDB has a very large number (>1M) of binding affinity measurements that we can use as a source for automating the curation of large-scale benchmark sets and the evaluation of different methodologies and force fields across multiple members of each target class. In particular, BindingDB contains a very large protein-ligand validation set in which sets of related ligands share a common scaffold and have at least one PDB structure that could be used for modeling compounds for relative and absolute free energy calculations. New data is also being curated and added to BindingDB all the time, enabling continuous evaluation of force fields (similar to CELPP).
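A minimal sketch of what this curation might look like, assuming a BindingDB TSV dump with columns along the lines of `Ligand SMILES`, `Ki (nM)`, `PDB ID(s) for Ligand-Target Complex`, and `Target Name` (exact column names vary between BindingDB releases and are assumptions here), grouping ligands into congeneric series by Bemis-Murcko scaffold:

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

# Load a BindingDB TSV dump; the column names used below are assumptions
# and may differ between BindingDB releases.
df = pd.read_csv("BindingDB_All.tsv", sep="\t", low_memory=False)

# Keep entries that have a ligand SMILES, a measured affinity, and at least
# one PDB structure of the ligand-target complex.
df = df.dropna(
    subset=["Ligand SMILES", "Ki (nM)", "PDB ID(s) for Ligand-Target Complex"]
)

def scaffold_smiles(smiles: str) -> str:
    """Return the Bemis-Murcko scaffold SMILES, or '' if the SMILES fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return ""
    return Chem.MolToSmiles(MurckoScaffold.GetScaffoldForMol(mol))

# Group ligands by target and scaffold to find congeneric series large enough
# to be useful for relative free energy calculations.
df["scaffold"] = df["Ligand SMILES"].map(scaffold_smiles)
series = (
    df[df["scaffold"] != ""]
    .groupby(["Target Name", "scaffold"])
    .filter(lambda g: len(g) >= 5)  # require a minimum series size
)
```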

We have the opportunity to perform very large-scale benchmarks of methods and force fields for multiple members of each target class to continually evaluate these combinations and publicly post the results in a dashboard that can help practitioners use the best methods and models for their target classes of interest.

How do you believe using this project would help you to do this?

The automation infrastructure we develop for other projects could be repurposed for this activity with minimal additional effort, using spare capacity.

What problems do you anticipate with using this project to achieve the above?

There are likely many unexpected issues we will encounter in trying to apply the workflows we develop to new targets. Even if the pipeline doesn't fail during processing, performance/accuracy may be poor. That's OK---we just need the workflows to fail gracefully and capture sufficient information about each failure so that method and force field developers can use it to systematically improve methodologies and force fields.
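One way "fail gracefully and capture sufficient information" could look in practice is sketched below; `run_calculation` is a hypothetical placeholder for whatever executes a single free energy calculation, and the record format is an assumption, not an existing alchemiscale API:

```python
import json
import traceback
from pathlib import Path
from typing import Callable

def run_with_failure_capture(
    ligand_id: str,
    run_calculation: Callable[[str], None],
    log_dir: Path,
) -> None:
    """Run one calculation; on failure, persist enough context to diagnose it later."""
    try:
        run_calculation(ligand_id)
    except Exception as exc:
        # Capture a structured record of the failure rather than halting the campaign.
        record = {
            "ligand_id": ligand_id,
            "error_type": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
        }
        log_dir.mkdir(parents=True, exist_ok=True)
        (log_dir / f"{ligand_id}.failure.json").write_text(json.dumps(record, indent=2))
```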

jchodera commented on Feb 22 '22 17:02

Raw notes from story review, shared here for visibility:

  • a benchmarking campaign of its own, perhaps; doesn't necessarily need to be committed to by e.g. OpenFF or the Chodera Lab, but could be taken on by a third party if desirable
  • even if it isn't a major software task, would need a project owner to advance and maintain the effort, in particular dealing with inconsistencies and problem cases
    • with failure-first, those involved could at least be set up for success
  • could be a candidate for a step beyond protein-ligand-benchmarks if it turns out scale isn't an issue
  • goes beyond being a benchmarking approach and is closer to an application approach (as defined in http://arxiv.org/abs/2105.06222)

dotsdl commented on Mar 05 '22 00:03