Parallelization of ADV for docking

Open LarsAC opened this issue 2 years ago • 3 comments

Hello,

I am trying to run my first docking experiments together with REINVENT. I see many ADV jobs being started with -cpu 1 (hardcoded), but a few (1 or 2) take quite long and leave all other CPUs idle until the batch has finished and a new batch has started.

This leaves quite some capacity of, e.g., a 16-core machine unused - at least that is my impression when observing the run via top or ps. In the dockstream.config, parallelization.number_cores is set to 16.

Are there practical settings that better exploit larger machines with 16-64 CPUs?

Lars

LarsAC avatar May 03 '22 09:05 LarsAC

Hi @LarsAC,

With regard to exploiting larger machines, the parallelization.number_cores setting should partition the docking jobs so that they run in parallel. You could try setting parallelization.max_compounds_per_subjob to limit how many compounds are allocated to each core at a given time. This may help with the idle time.

With regard to the CPUs being idle until the batch has finished: do you mean the batch of SMILES generated at a given epoch, or the next batch of SMILES (at the next epoch) generated by REINVENT? In the latter case, docking must finish for the current batch of SMILES before the next batch is generated, so that the REINVENT agent can be updated for the next epoch.
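To illustrate why a smaller max_compounds_per_subjob might help, here is a minimal sketch of how a batch could be cut into sublists. This is not DockStream's code: the function `split_into_subjobs` and the even-split fallback are assumptions made for illustration only.

```python
from typing import List, Sequence


def split_into_subjobs(
    compounds: Sequence[str],
    number_cores: int,
    max_compounds_per_subjob: int = 0,
) -> List[List[str]]:
    """Cut a batch of compounds into sublists (hypothetical helper).

    Assumption: when no limit is given (0), the batch is split evenly
    across the cores; otherwise each sublist holds at most
    `max_compounds_per_subjob` compounds.
    """
    if number_cores < 1:
        raise ValueError("number_cores must be at least 1")
    if max_compounds_per_subjob > 0:
        size = max_compounds_per_subjob
    else:
        # Ceiling division so that every compound lands in some sublist.
        size = max(1, -(-len(compounds) // number_cores))
    return [list(compounds[i:i + size]) for i in range(0, len(compounds), size)]


# Hypothetical batch of 128 ligands on a 16-core machine.
batch = [f"ligand_{i}" for i in range(128)]
even = split_into_subjobs(batch, number_cores=16)
small = split_into_subjobs(batch, number_cores=16, max_compounds_per_subjob=2)

print(len(even), len(even[0]))    # 16 sublists of 8 compounds
print(len(small), len(small[0]))  # 64 sublists of 2 compounds
```

With large sublists, one slow ligand delays everything queued behind it in the same sublist. Smaller sublists give the scheduler more, finer-grained units to hand out, at the cost of more process start-ups.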

GuoJeff avatar May 06 '22 00:05 GuoJeff

Thanks for the explanation @GuoJeff! In the meantime I have set parallelization.number_cores to 4 and patched the code to call ADV with -cpu 4. Together with a batch size of 64 rather than 128, a 16-core VM on Azure runs 5-6 batches/hr rather than 1. However, the monitoring on AzureML still reports a CPU usage of about 60% at most - my gut feeling is that this could still be improved. I'll try to work with max_compounds_per_subjob in addition to see if I can speed things up further.
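The setup described above (4 concurrent jobs with 4 threads each on 16 cores) can be sanity-checked with a small sketch like the one below. It only builds the command line and does not run anything. The helper names and file names are hypothetical; the flags are the long-form options from the AutoDock Vina command-line help, and it is an assumption that the patched DockStream call looks similar.

```python
from typing import List


def build_vina_command(
    receptor: str, ligand: str, config: str, out: str, cpu: int
) -> List[str]:
    """Assemble an AutoDock Vina call with an explicit thread count."""
    return [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--config", config,
        "--out", out,
        "--cpu", str(cpu),
    ]


def total_threads(concurrent_jobs: int, cpu_per_job: int) -> int:
    """Threads requested if all concurrent jobs are busy at once."""
    return concurrent_jobs * cpu_per_job


# Hypothetical file names; 4 concurrent jobs with 4 threads each.
cmd = build_vina_command(
    "receptor.pdbqt", "ligand_0.pdbqt", "box.txt", "ligand_0_out.pdbqt", cpu=4
)
requested = total_threads(concurrent_jobs=4, cpu_per_job=4)

print(" ".join(cmd))
print(requested)  # 16, matching a 16-core machine
```

Keeping the product of concurrent jobs and threads per job at or below the core count avoids oversubscription. Whether more threads per job or more concurrent jobs is faster depends on the ligands and is best measured.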

The idle capacity I observed is indeed within a batch: a number of ADV docking jobs (parallelization.number_cores?) are started, but many of them finish quite quickly, whereas one or two keep running for a longer time. Only once these have finished does a new set of parallelization.number_cores ADV jobs seem to get started, leaving all other cores idle until the last ADV job has finished.
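The effect described above can be reproduced with a toy simulation. It compares starting jobs in fixed waves (wait for the slowest job before starting the next set) with a queue where each worker picks up the next job as soon as it is free. This models only the behaviour observed from outside, not DockStream's actual scheduler, and the job durations are made up.

```python
import heapq
from typing import List


def makespan_in_waves(durations: List[float], workers: int) -> float:
    """Start `workers` jobs at a time; each wave lasts as long as its slowest job."""
    total = 0.0
    for i in range(0, len(durations), workers):
        total += max(durations[i:i + workers])
    return total


def makespan_with_queue(durations: List[float], workers: int) -> float:
    """Each worker takes the next job in order as soon as it becomes free."""
    free_at = [0.0] * workers  # min-heap of times at which workers become free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


def utilization(durations: List[float], workers: int, makespan: float) -> float:
    """Fraction of available worker time actually spent docking."""
    return sum(durations) / (workers * makespan)


# Made-up batch: 32 ligands, 1 minute each, except two that take 10 minutes.
durations = [1.0] * 32
durations[3] = 10.0
durations[20] = 10.0
workers = 4

waves = makespan_in_waves(durations, workers)
queue = makespan_with_queue(durations, workers)

print(waves, round(utilization(durations, workers, waves), 2))  # 26.0 0.48
print(queue, round(utilization(durations, workers, queue), 2))  # 16.0 0.78
```

In this toy case two slow ligands are enough to drop utilization below 50% when jobs run in waves, which is in line with the idle cores seen in top.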

Lars

LarsAC avatar May 06 '22 19:05 LarsAC

Thanks both for mentioning this feature! I've just tested it with GOLD and it worked immediately. Maybe AutoDockVina is more complicated.

j3mdamas avatar May 17 '22 14:05 j3mdamas