openff-evaluator
GPU usage with schema and merge orders
This is less of an issue and more of a question. In running an estimation with two schemas, `SolvationFreeEnergy` and `HostGuestBindingAffinity`, I'm aware that `HostGuestBindingAffinity` is able to make full use of all available dask workers, while `SolvationFreeEnergy` doesn't make full use of even a single worker. What surprised me was that when I ran my code with the `host_guest_data_set` merged into the `freesolv_data_set`, and with the `solvation_schema` added to `estimation_options` before the `host_guest_schema`, the binding simulation was restricted to one GPU for the entire calculation.
I tried to fix this problem by swapping two things: the order of the schema additions and the direction of the data set merge. With both of my original orders swapped, the problem has completely gone away: the solvation and binding calculations run simultaneously until the solvation is complete, at which point the binding calculation is able to utilize all four of my available workers.

So what I'm wondering is: is this part of normal operation? Do I need to make sure I add certain schemas first, or merge certain datasets into their counterparts and not vice versa? I plan to rerun the job with just one of my two fixes applied to see which one was actually responsible for correcting the GPU usage. Below is my code, with the relevant lines marked with asterisks.
```python
from openff.evaluator.backends import ComputeResources
from openff.evaluator.backends.dask import DaskLocalCluster
from openff.evaluator.client import RequestOptions
from openff.evaluator.datasets import PhysicalPropertyDataSet
from openff.evaluator.datasets.taproom import TaproomDataSet
from openff.evaluator.properties import HostGuestBindingAffinity, SolvationFreeEnergy
# (import for APRSimulationSteps omitted; its path depends on the version in use)

# `molecule` is a pandas DataFrame prepared earlier (not shown).
freesolv_data_set = PhysicalPropertyDataSet.from_pandas(molecule)
host_guest_data_set = TaproomDataSet(
    #####
)

*** freesolv_data_set.merge(host_guest_data_set)
# FIXED VERSION:
# host_guest_data_set.merge(freesolv_data_set)

solvation_schema = SolvationFreeEnergy.default_simulation_schema(
    use_implicit_solvent=True
)

APR_settings = APRSimulationSteps(
    #####
)
host_guest_schema = HostGuestBindingAffinity.default_paprika_schema(
    simulation_settings=APR_settings,
    use_implicit_solvent=True,
    enable_hmr=False,
)

estimation_options = RequestOptions()
estimation_options.calculation_layers = ["SimulationLayer"]

*** estimation_options.add_schema(
    "SimulationLayer", "SolvationFreeEnergy", solvation_schema
)
*** estimation_options.add_schema(
    "SimulationLayer", "HostGuestBindingAffinity", host_guest_schema
)
# FIXED VERSION:
# Swapped the order of the two starred add_schema() calls so that
# host_guest_schema is added first.

print("All schemas were added to estimation_options")

# Create a pool of dask workers, one GPU per worker.
calculation_backend = DaskLocalCluster(
    number_of_workers=4,
    resources_per_worker=ComputeResources(
        number_of_threads=1,
        number_of_gpus=1,
        preferred_gpu_toolkit=ComputeResources.GPUToolkit.CUDA,
    ),
)
calculation_backend.start()
```
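For what it's worth, here is a library-free sketch of the ordering effect I suspect is at play. It assumes (and this is only my assumption, not confirmed openff-evaluator behavior) that `merge()` mutates the receiving dataset in place with the receiver's properties kept first, so the direction of the merge changes which property type leads the combined set. The `ToyDataSet` class is purely hypothetical, standing in for `PhysicalPropertyDataSet`:

```python
# Toy illustration (NOT the openff-evaluator API): an in-place merge that
# keeps the receiver's entries first, so merge direction changes ordering.

class ToyDataSet:
    def __init__(self, properties):
        self.properties = list(properties)

    def merge(self, other):
        # In-place merge: keep my properties first, then append the other's.
        self.properties.extend(other.properties)

solvation = ["solvation_1", "solvation_2"]
binding = ["binding_1"]

# Original order: solvation dataset receives the merge.
a = ToyDataSet(solvation)
a.merge(ToyDataSet(binding))

# Fixed order: binding dataset receives the merge.
b = ToyDataSet(binding)
b.merge(ToyDataSet(solvation))

print(a.properties)  # ['solvation_1', 'solvation_2', 'binding_1']
print(b.properties)  # ['binding_1', 'solvation_1', 'solvation_2']
```

If scheduling walks the combined property list in order, this would explain why the merge direction (and, analogously, the `add_schema` order) changes which calculation gets the workers first.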