
Improvements for automated benchmarking

Open jchodera opened this issue 3 years ago • 4 comments

There appear to be some issues with how the run_benchmarks.py script is written. Each invocation of the script is supposed to handle a single transformation, but the following code may cause problems if it runs concurrently for multiple transformations, since every run fetches and writes the same files:

# Excerpt from run_benchmarks.py; retrieve_file_url, fetch_url_contents, and
# concatenate_files are helpers used by the script.
import yaml

target_dir = targets_dict[target]['dir']
pdb_url = f"{base_repo_url}/raw/master/data/{target_dir}/01_protein/crd/protein.pdb"
pdb_file = retrieve_file_url(pdb_url)

# Fetch cofactors crystalwater pdb file
# TODO: This part should be done using plbenchmarks API - once there is a conda pkg
cofactors_url = f"{base_repo_url}/raw/master/data/{target_dir}/01_protein/crd/cofactors_crystalwater.pdb"
cofactors_file = retrieve_file_url(cofactors_url)

# Concatenate protein with cofactors pdbs
concatenate_files((pdb_file, cofactors_file), 'target.pdb')

# Fetch ligands sdf files and concatenate them in one
# TODO: This part should be done using plbenchmarks API - once there is a conda pkg
ligands_url = f"{base_repo_url}/raw/master/data/{target_dir}/00_data/ligands.yml"
with fetch_url_contents(ligands_url) as response:
    ligands_dict = yaml.safe_load(response.read())
ligand_files = []
for ligand in ligands_dict.keys():
    ligand_url = f"{base_repo_url}/raw/master/data/{target_dir}/02_ligands/{ligand}/crd/{ligand}.sdf"
    ligand_file = retrieve_file_url(ligand_url)
    ligand_files.append(ligand_file)
# concatenate sdfs
concatenate_files(ligand_files, 'ligands.sdf')
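One low-effort mitigation would be to key every output file on the transformation being processed. Below is a minimal sketch (not the actual perses code; the helper names, directory layout, and transformation identifier are assumptions for illustration) that writes the concatenated inputs into a per-transformation working directory so concurrent runs cannot clobber each other's target.pdb or ligands.sdf:

# Sketch: make the shared output filenames collision-safe by writing everything
# into a directory keyed by the transformation. `retrieve_file_url` and
# `concatenate_files` are assumed to behave like the helpers above.
import os

def fetch_target_inputs(target_dir, transformation_id, base_repo_url):
    """Fetch protein/cofactor PDBs into a per-transformation directory."""
    workdir = os.path.join("transformations", transformation_id)
    os.makedirs(workdir, exist_ok=True)

    pdb_file = retrieve_file_url(
        f"{base_repo_url}/raw/master/data/{target_dir}/01_protein/crd/protein.pdb")
    cofactors_file = retrieve_file_url(
        f"{base_repo_url}/raw/master/data/{target_dir}/01_protein/crd/cofactors_crystalwater.pdb")

    # Concatenate into a file that is unique to this transformation
    target_pdb = os.path.join(workdir, "target.pdb")
    concatenate_files((pdb_file, cofactors_file), target_pdb)
    return target_pdb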

Presumably, we want to break this into multiple stages, or find some way to appropriately construct and execute the dependency graph on the cluster (a sketch follows the list):

  • retrieve files needed for one or more benchmark system(s)
  • set up all transformations in parallel
  • run or resume all transformations in parallel
  • analyze data to generate plots
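One way to express those stages, assuming each stage can be submitted as a separate cluster job, is a small staged driver whose subcommands map onto the dependency graph. This is a rough sketch, not the current run_benchmarks.py; the subcommand names and stage functions are hypothetical placeholders:

# Hypothetical staged driver: each subcommand corresponds to one node of the
# dependency graph and can be submitted as its own cluster job, with ordering
# expressed through the scheduler (e.g. SLURM's --dependency=afterok:<jobid>).
import argparse

def fetch_inputs(target):
    print(f"[fetch] would download PDB/SDF inputs for {target}")  # placeholder

def setup_transformation(target, edge):
    print(f"[setup] would build the hybrid system for {target} {edge}")  # placeholder

def run_transformation(target, edge):
    print(f"[run] would run/resume the free energy calculation for {target} {edge}")  # placeholder

def analyze_all():
    print("[analyze] would gather all edges and generate plots")  # placeholder

def main():
    parser = argparse.ArgumentParser(description="Staged benchmark driver (sketch)")
    sub = parser.add_subparsers(dest="stage", required=True)

    fetch = sub.add_parser("fetch", help="retrieve files for one or more benchmark systems")
    fetch.add_argument("--target", required=True)

    for name in ("setup", "run"):
        stage = sub.add_parser(name, help=f"{name} a single transformation")
        stage.add_argument("--target", required=True)
        stage.add_argument("--edge", required=True, help="ligand pair, e.g. lig_1~lig_2")

    sub.add_parser("analyze", help="gather all results and generate plots")

    args = parser.parse_args()
    if args.stage == "fetch":
        fetch_inputs(args.target)
    elif args.stage == "setup":
        setup_transformation(args.target, args.edge)
    elif args.stage == "run":
        run_transformation(args.target, args.edge)
    else:
        analyze_all()

if __name__ == "__main__":
    main()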

Alternatively, we can make sure that every transformation runs fully independently until the final analysis stage, where the per-edge results are combined using DiffNet.
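In that design, the analysis stage is the only step that looks across transformations. A short sketch of what the collection step could look like, under the assumption that each independent transformation job writes its estimate to its own result.json (the directory layout and keys are hypothetical):

# Sketch: gather per-edge estimates written by independent transformation jobs,
# to be fed into a DiffNet-style maximum-likelihood analysis and plotting step.
import glob
import json

def collect_edge_results(root="transformations"):
    """Gather (ligand_A, ligand_B, ddG, dddG) tuples from independent runs."""
    edges = []
    for path in glob.glob(f"{root}/*/result.json"):
        with open(path) as f:
            result = json.load(f)
        edges.append((result["ligand_A"], result["ligand_B"],
                      result["ddG"], result["dddG"]))
    return edges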

We'll want to find a cleverer way to refactor this in the next release so that we can also support benchmarking multiple targets at the same time (a sketch of one option follows).
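For multiple targets, the same staged driver could simply be fanned out per target. A minimal sketch using a local process pool; in practice each call might instead submit an independent cluster job, and `setup_target` is a placeholder for whatever the per-target entry point ends up being:

# Sketch: run the per-target setup for several targets concurrently.
from concurrent.futures import ProcessPoolExecutor

def setup_target(name):
    print(f"setting up benchmark systems for {name}")  # placeholder

def setup_all_targets(targets):
    with ProcessPoolExecutor() as pool:
        futures = {pool.submit(setup_target, name): name for name in targets}
        for future, name in futures.items():
            future.result()  # re-raise any per-target failure

if __name__ == "__main__":
    setup_all_targets(["tyk2", "p38", "mcl1"])  # example plbenchmarks target names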

jchodera avatar Jan 30 '22 03:01 jchodera

Yes, considering that benchmarking is becoming a bigger effort, we do want better (more efficient and friendlier) and smarter ways to deal with these situations. I wonder if we want a whole new benchmarking module for perses, considering how this is evolving.

ijpulidos avatar Jan 31 '22 16:01 ijpulidos

It seems like it is becoming even more beneficial to have a whole module for the benchmarks part of perses: https://github.com/choderalab/perses/pull/1050#discussion_r903025368

ijpulidos avatar Jun 21 '22 20:06 ijpulidos

Is this for performance benchmarking (ns/day, time for a single calculation) or accuracy benchmarking?

jchodera avatar Jun 21 '22 22:06 jchodera

Is this for performance benchmarking (ns/day, time for a single calculation) or accuracy benchmarking?

This is for accuracy benchmarking. That is, running the systems in the protein-ligand-benchmark dataset and checking the plots and errors.

ijpulidos avatar Jun 21 '22 22:06 ijpulidos