alchemiscale User story: Send a single simulation to F@H

In broad terms, what are you trying to do?

I would like to send a fully prepared OpenMM simulation (as XML?) to F@H to be run, and have the results of a reporter (eventually) returned. By "fully prepared" I mean that, if possible, nothing happens server side except execution. More specifically, a single free energy simulation we've set up using our prototype tooling, and the timeseries of dH values (from reporter)

How do you believe using this project would help you to do this?

I think this was mentioned as possible, but I wouldn't know where to start submitting things.

What problems do you anticipate with using this project to achieve the above?

If we accidentally send a bad simulation it might cause grief. Getting the data back could also be a problem.

Feb 16 '22 14:02 richardjgowers

I was thinking about this. Would there also be a python script to run it, specifying the timestep/thermostat/output reporters/etc? On one hand, this could expose us to arbitrary code execution, and that's bad. But on the other it'd be extremely hard to prescribe a configuration schema ahead of time that could handle all the different sorts of configurations that we think of for a "single simulation". So I'd be in favor of the "single simulation" mode requiring a python script as input.

Feb 16 '22 15:02 j-wags

I was just imagining a RESTful endpoint where I can fire XML (which includes thermostats, reporters, etc.?)
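To make the "fire XML at an endpoint" idea concrete, here is a minimal client-side sketch using only the Python standard library. The URL and the function name `build_submission` are hypothetical; no such endpoint is defined anywhere in this thread, and the request is only built, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint: an assumption for illustration only.
SUBMIT_URL = "https://example.org/api/v1/simulations"


def build_submission(xml_text: str, url: str = SUBMIT_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one serialized simulation."""
    # Fail early on malformed XML so a broken payload never leaves the client.
    ET.fromstring(xml_text)
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


request = build_submission("<System><Particles/></System>")
# Sending would then be: urllib.request.urlopen(request)
```

Checking well-formedness on the client only catches broken XML; it says nothing about whether the simulation itself is sane, which is the "bad simulation" worry from the story.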

Feb 16 '22 15:02 richardjgowers

More specifically, a single free energy simulation we've set up using our prototype tooling, and the timeseries of dH values (from reporter)

@richardjgowers: Can you elaborate a bit more on this? I'm guessing at a few things, but from what I can tell, you want to:

Provide, as input, serialized System, State, and Integrator files for a single OpenMM simulation, along with some information about how long to run
Somehow also provide additional information about what information you want returned (such as how often to compute energies, which global parameters to change to compute the energies, and at which values of those parameters you want energies to be returned)
You want to do this for totally arbitrary systems, one at a time, which may be of any size? That is, you wouldn't want to submit a bunch of calculations for related systems---you might want to run one calculation for a 200 residue protein and then another one for a 450 residue protein and then something else for a 100 residue protein?
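The first two items above could be captured in a small data structure. This is only a sketch under stated assumptions: the class and field names (`SingleSimulation`, `EnergyRequest`, `interval_steps`, and so on) are hypothetical, and the XML strings are stand-ins for what OpenMM's `XmlSerializer.serialize` would produce for a System, State, and Integrator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyRequest:
    """Which energies to return (all names here are hypothetical)."""
    parameter: str       # global parameter to vary, e.g. "lambda"
    values: tuple        # parameter values at which to evaluate energies
    interval_steps: int  # evaluate every this many MD steps


@dataclass(frozen=True)
class SingleSimulation:
    system_xml: str      # serialized OpenMM System
    state_xml: str       # serialized OpenMM State
    integrator_xml: str  # serialized OpenMM Integrator
    n_steps: int         # how long to run
    energies: EnergyRequest

    def n_energy_evaluations(self) -> int:
        """Number of energy values that would come back for this job."""
        n_frames = self.n_steps // self.energies.interval_steps
        return n_frames * len(self.energies.values)


job = SingleSimulation(
    system_xml="<System/>",
    state_xml="<State/>",
    integrator_xml="<Integrator/>",
    n_steps=1_000_000,
    energies=EnergyRequest("lambda", (0.0, 0.5, 1.0), interval_steps=1000),
)
print(job.n_energy_evaluations())  # 1000 frames x 3 values = 3000
```

Writing the request down this way also makes the returned data volume explicit, which matters for the "getting the data back" concern.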

Can you give us some idea of the actual use case here? Why would you be doing this with one simulation at a time and want to run on the largest distributed computing platform in the world? Presumably you have some concept of "scale" in mind here that you didn't articulate in your story?

Feb 19 '22 18:02 jchodera

Would there also be a python script to run it, specifying the timestep/thermostat/output reporters/etc?

Just to copy down an important point that @jchodera shared during the meeting on Tuesday: arbitrary Python code execution on volunteer hosts is probably out of the question. So this will need to go through pre-made workflows with specific data fields that the user can fill in.
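One way to picture "pre-made workflows with specific data fields" is an allowlist that rejects anything it does not recognise. The sketch below is an assumption for illustration, not the project's design: the workflow name `single_md`, the field names, and the bounds are all made up.

```python
# Hypothetical catalogue of pre-made workflows and the fields each accepts,
# as (type, min, max). None of these names or limits come from the project.
WORKFLOWS = {
    "single_md": {
        "n_steps": (int, 1, 50_000_000),
        "timestep_fs": (float, 0.5, 4.0),
        "temperature_k": (float, 200.0, 400.0),
        "report_interval": (int, 100, 1_000_000),
    },
}


def validate(workflow: str, fields: dict) -> dict:
    """Return the fields unchanged if they fit the workflow, else raise ValueError."""
    if workflow not in WORKFLOWS:
        raise ValueError(f"unknown workflow: {workflow!r}")
    schema = WORKFLOWS[workflow]
    unknown = set(fields) - set(schema)
    missing = set(schema) - set(fields)
    if unknown or missing:
        raise ValueError(
            f"unknown fields {sorted(unknown)}, missing fields {sorted(missing)}"
        )
    for name, (kind, low, high) in schema.items():
        value = fields[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, kind):
            raise ValueError(f"{name} must be of type {kind.__name__}")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return fields


good = {"n_steps": 500_000, "timestep_fs": 2.0,
        "temperature_k": 300.0, "report_interval": 1000}
validate("single_md", good)
```

Because only known fields with bounded values are accepted, there is nowhere for a script or other executable content to ride along. The cost is the one raised earlier in the thread: every new kind of simulation needs a new workflow entry.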

Feb 25 '22 02:02 j-wags

Raw notes from story review, shared here for visibility:

wants to be able to submit single OpenMM simulations that are fully prepared (ready to execute MD), get results back eventually, but hopefully quickly (hours)
that turnaround time is key here; OpenFE may be able to amortize cost of time by firing off many systems at once for probing different problems, but they need reasonably-fast turnaround to be able to iterate quickly
perhaps this still fits well into the idea of submitting whole graphs, but this is a graph with two nodes, one edge?

Mar 01 '22 15:03 dotsdl