
Updates to use pyaestro to back a local pool-style adapter.

FrankD412 opened this pull request 4 years ago • 4 comments

This PR is a preliminary step towards a more "ensemble"-ish adapter related to the functionality discussed in #330.

This new LocalPoolAdapter uses pyaestro's Executor class to emulate a scheduler. Some potential future additions are timeouts and resource support for the above #330.
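For context, the shape of such an adapter can be illustrated with a minimal stand-in built on the standard library. The class and method names below are hypothetical, and the real implementation delegates to pyaestro's Executor rather than `ThreadPoolExecutor`; this is just a sketch of the "fake scheduler" idea:

```python
# Minimal sketch of a pool-backed "fake scheduler" adapter.
# All names here (LocalPoolAdapter, submit, check_status, cancel) are
# illustrative; the actual adapter builds on pyaestro's Executor.
import subprocess
import uuid
from concurrent.futures import ThreadPoolExecutor


class LocalPoolAdapter:
    """Emulate scheduler submit/status/cancel semantics over a process pool."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, script_path):
        # Hand back a scheduler-like job identifier instead of a process handle.
        job_id = str(uuid.uuid4())
        self._futures[job_id] = self._pool.submit(
            subprocess.run, ["/bin/bash", script_path], check=False
        )
        return job_id

    def check_status(self, job_id):
        future = self._futures[job_id]
        if future.running():
            return "RUNNING"
        if future.done():
            result = future.result()
            return "FINISHED" if result.returncode == 0 else "FAILED"
        return "PENDING"

    def cancel(self, job_id):
        # cancel() only stops jobs still queued in the pool -- the same
        # limitation as the cancellation bug discussed below.
        return self._futures[job_id].cancel()
```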

FrankD412 avatar Feb 23 '21 03:02 FrankD412

@jwhite242 -- Some considerations based on what we were discussing earlier:

* I was hoping to eliminate the checks for local execution in the `ExecutionGraph._StepRecord`. This would bring all execution in line with scheduler-based adapters. You had mentioned some benefits to keeping the distinction around. What are your thoughts?

* We need to conceptualize how we would handle job packing in an Executor for pyaestro and then do something similar here. What have you found in terms of packing and the like?

FrankD412 avatar Feb 23 '21 03:02 FrankD412

Current todo: Cancellation works at the Maestro level, but leaves the pool executing the last processes that were running before the cancellation was posted. This traces back to a bug in pyaestro.

FrankD412 avatar Feb 23 '21 04:02 FrankD412

> @jwhite242 -- Some considerations based on what we were discussing earlier:
>
> * I was hoping to eliminate the checks for local execution in the `ExecutionGraph._StepRecord`. This would bring all execution in line with scheduler-based adapters. You had mentioned some benefits to keeping the distinction around. What are your thoughts?
>
> * We need to conceptualize how we would handle job packing in an Executor for pyaestro and then do something similar here. What have you found in terms of packing and the like?

So I was thinking this could be a useful distinction based on the two possible run modes: standalone batch jobs (the current scheduler adapters) and running Maestro inside an allocation manually for job packing (or spawned by another tool built on top of Maestro). It doesn't necessarily need to use local as the distinction, but do we need some hook to enable the latter job-packing mode?

As for job packing, this is something where I think plugins would be really useful, or some way for users to write these things as they do with pgen. The space of scheduling behaviors and optimization algorithms for implementing them is large enough that it doesn't necessarily need to be hard-wired into Maestro. A few simple/interesting built-ins would be good, of course.
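To make the plugin idea concrete, here's a rough sketch of what a user-supplied packing policy could look like, in the spirit of pgen's user-written functions. Every name here (`PackingPolicy`, `pack`, `GreedyFIFO`) is invented for illustration and is not existing Maestro API:

```python
# Hypothetical plugin interface for job-packing policies, modeled on the
# way pgen lets users supply a Python function. A sketch of the kind of
# hook being discussed, not an existing Maestro interface.
from abc import ABC, abstractmethod


class PackingPolicy(ABC):
    """A user-supplied strategy for packing steps into an allocation."""

    @abstractmethod
    def pack(self, steps, free_slots):
        """Return the subset of ready steps to launch given free slots."""


class GreedyFIFO(PackingPolicy):
    """Simplest built-in: launch ready steps in order until slots run out."""

    def pack(self, steps, free_slots):
        selected = []
        for step in steps:
            # Assume each step is a dict carrying a "procs" cost.
            cost = step.get("procs", 1)
            if cost <= free_slots:
                selected.append(step)
                free_slots -= cost
        return selected
```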

jwhite242 avatar Feb 23 '21 16:02 jwhite242

> > @jwhite242 -- Some considerations based on what we were discussing earlier:
> >
> > * I was hoping to eliminate the checks for local execution in the `ExecutionGraph._StepRecord`. This would bring all execution in line with scheduler-based adapters. You had mentioned some benefits to keeping the distinction around. What are your thoughts?
> >
> > * We need to conceptualize how we would handle job packing in an Executor for pyaestro and then do something similar here. What have you found in terms of packing and the like?
>
> So I was thinking this could be a useful distinction based on the two possible run modes: standalone batch jobs (the current scheduler adapters) and running Maestro inside an allocation manually for job packing (or spawned by another tool built on top of Maestro). It doesn't necessarily need to use local as the distinction, but do we need some hook to enable the latter job-packing mode?
>
> As for job packing, this is something where I think plugins would be really useful, or some way for users to write these things as they do with pgen. The space of scheduling behaviors and optimization algorithms for implementing them is large enough that it doesn't necessarily need to be hard-wired into Maestro. A few simple/interesting built-ins would be good, of course.

I was thinking about this, and the case of running Maestro in an allocation is where an "allocation adapter" class would come in handy. That class would take in the global set of resources requested and schedule the conductor call. That's actually where I wanted to split the MPI-related functionality out of the `SchedulerScriptAdapter`, since you could either monkey-patch the method in the LocalPoolAdapter to generate the right MPI launch command or specify a particular MPI in the batch settings (with the usual factory pattern to get the class). We could mock that up in a discussion thread.
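A rough sketch of that factory split, with invented names (`MpiLauncher`, `get_launcher`), might look like the following. None of this is existing maestrowf API; it just shows how launch-command generation could live outside the script adapter:

```python
# Sketch of splitting MPI launch-command generation out of the script
# adapter behind a factory, so any adapter (scheduler-backed or the
# LocalPoolAdapter) can request the right launcher. All names are
# hypothetical.
class MpiLauncher:
    """Base class for generating an MPI launch line for a step."""

    def launch_command(self, executable, procs):
        raise NotImplementedError


class MpirunLauncher(MpiLauncher):
    def launch_command(self, executable, procs):
        return f"mpirun -n {procs} {executable}"


class SrunLauncher(MpiLauncher):
    def launch_command(self, executable, procs):
        return f"srun -n {procs} {executable}"


_LAUNCHERS = {"mpirun": MpirunLauncher, "srun": SrunLauncher}


def get_launcher(name):
    """Factory: look up a launcher by the name given in batch settings."""
    return _LAUNCHERS[name]()


# An adapter could then do something like:
#   launcher = get_launcher(batch_settings.get("mpi", "mpirun"))
#   cmd = launcher.launch_command("./sim.exe", procs=16)
```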

For this PR, I was thinking I'd make the local pool the main local adapter (it would still report that it's local, but in the ExecutionGraph it'd be treated just like everything else, since it's a process pool). The records would still return local otherwise.

I'm starting to wonder if this is a good time to introduce an `orchestrate` sub-command to handle the allocation so it's more transparent to the user? -- we should open a discussion on this, haha.

Hopefully this is making sense.

FrankD412 avatar Feb 25 '21 17:02 FrankD412