maestrowf
Add execution order controls
Add machinery for controlling execution order/priority of study steps
- Adds weights to the graph to enable selecting between depth first and breadth first (current production mode) execution order.
- Adds a new execution block to the spec that exposes ordering controls and provides future hooks for using step metadata to control step weights
- Changes internal machinery to use a PriorityQueue to store the ready steps
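As a rough illustration of the weighting idea in the list above (not the code in this PR), here is a minimal sketch in which ready steps sit in a PriorityQueue and the sign of a depth-based weight switches between breadth-first and depth-first order. The toy DAG, the `execution_order` helper, and the throttle of one step at a time are all assumptions made for the example.

```python
import itertools
from queue import PriorityQueue

# Hypothetical toy DAG: step name -> child steps (not Maestro's graph type).
DAG = {
    "a": ["b1", "b2"],
    "b1": ["c1"],
    "b2": ["c2"],
    "c1": [],
    "c2": [],
}


def execution_order(dag, root, mode="breadth-first"):
    """Return the order steps would start in, running one step at a time."""
    # Lower weight pops first: +depth favors shallow steps (breadth-first),
    # -depth favors the deepest ready step (depth-first).
    sign = -1 if mode == "depth-first" else 1

    parents = {step: set() for step in dag}
    for step, children in dag.items():
        for child in children:
            parents[child].add(step)

    counter = itertools.count()  # tie-breaker keeps insertion order stable
    ready = PriorityQueue()
    ready.put((0, next(counter), root, 0))

    done, order = set(), []
    while not ready.empty():
        _weight, _tie, step, depth = ready.get()
        order.append(step)
        done.add(step)
        # A child becomes ready once all of its parents have completed.
        for child in dag[step]:
            if parents[child] <= done:
                ready.put((sign * (depth + 1), next(counter), child, depth + 1))
    return order


print(execution_order(DAG, "a", "breadth-first"))
print(execution_order(DAG, "a", "depth-first"))
```

With a larger throttle the two modes would overlap more, but the queue-plus-weight mechanism stays the same.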
@jwhite242 -- Is this a dead end at this point? We're assessing Maestro and it's looking like DFS would be good on our end too. Wondering just in case we need to revisit.
No, I just got derailed by other things for a bit and am ramping back up on this now. Having the more general expression-based priorities is going to be pretty helpful. The major use case is getting big/long-running variants of steps running sooner, so smaller ones can be churned through within the throttle limit alongside them for improved throughput.
Thinking more on the protocol question: I'm on the fence about whether we should just use abstract base classes and tie info to those, i.e. per-step overrides of expressions. I'll play with both and see how they feel.
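To make the "big steps first" use case concrete, here is a hedged sketch of an expression-based priority. The `Step` dataclass, its `procs` and `walltime` fields, and the tiny arithmetic evaluator are assumptions for illustration only; the eventual expression syntax and compilation approach may look quite different.

```python
import ast
import operator
from dataclasses import dataclass, fields


@dataclass
class Step:
    # Hypothetical step metadata; field names are assumptions.
    name: str
    procs: int
    walltime: float  # hours


_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_FIELDS = {f.name for f in fields(Step)}


def evaluate(expression, step):
    """Evaluate a small arithmetic expression over `step.<field>` values."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id == "step"
                and node.attr in _FIELDS):
            return getattr(step, node.attr)
        # Anything else (calls, names, subscripts) is rejected.
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))


def order_by_priority(steps, expression):
    """Largest expression value first, so the big variants start sooner."""
    return sorted(steps, key=lambda s: evaluate(expression, s), reverse=True)


steps = [Step("small", 1, 0.5), Step("big", 64, 12.0), Step("medium", 8, 2.0)]
print([s.name for s in order_by_priority(steps, "step.procs * step.walltime")])
```

Walking the AST instead of calling `eval` keeps user-supplied expressions limited to arithmetic on known step fields, which seems like a reasonable default for something that lives in a shared spec.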
@FrankD412, @bgunnar5, @jsemler
Think this is finally ready for another pass/real review. An interesting question left (beyond any implementation issues/comments) is what to do about the spec. I refactored it to be a list so it's more clearly ordered for users, but maybe it'd make sense to contain this list in a subkey (priority_expressions or something) instead of at the root of the execution block? I don't have any other things in mind for this block yet, but a key would be more future-proof in case we do think of something. (I know the docs are slightly out of sync, pending this subkey/not question.)
Actually, just had another thought that might fit nicer: expanding it and making the value more of a 'oneOf' type, so either value or expression. That makes it clearer that there are two types and avoids having to do greedy parsing on our end to figure it out.
    execution:
      priority:
        - name:
          description:  # optional, but encouraged... can make built-ins dump the code's internal description in the reserialized spec
          value:  # use this for built-ins with string keys to select (e.g. current 'step-order')
        - name:
          description:
          expression:  # use this for the eventual string based expression compilation
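A minimal sketch of how the 'oneOf' check on each priority entry could work, assuming entries arrive as plain dicts after YAML loading. The `validate_priority_entry` helper is hypothetical and not part of Maestro; a real implementation would more likely live in the spec's schema.

```python
def validate_priority_entry(entry):
    """Return which kind of entry this is: 'value' or 'expression'.

    Exactly one of the two keys must be present, mirroring a 'oneOf' schema.
    """
    if not entry.get("name"):
        raise ValueError("priority entry needs a non-empty 'name'")

    has_value = "value" in entry
    has_expression = "expression" in entry
    if has_value == has_expression:
        # Either both keys or neither key were supplied.
        raise ValueError(
            f"priority entry '{entry['name']}' must set exactly one of "
            "'value' or 'expression'"
        )
    return "value" if has_value else "expression"


builtin = {"name": "order", "value": "step-order"}
custom = {"name": "cost", "expression": "step.procs*step.walltime"}
print(validate_priority_entry(builtin), validate_priority_entry(custom))
```

Since the key itself says which kind of entry it is, nothing has to guess whether a string is a built-in name or an expression.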
Continued tweaking/iteration with a mind toward this being amenable to a mix of built-in/user things (think plugins for reusable expressions checkable via the dependencies machinery)
    execution:
      priority:
        - prioritizer_id: step_order  # built-in dag traversal order method
          args:
            - step_order: 'depth-first'
            - ...
        - prioritizer_id: expression  # the built in expression prioritizer
          expression: step.procs*step.walltime ....
Think of the prioritizer_id (or similar name) as akin to the key used to id script adapters: a standardized place to register these functions, so we can tag plugin-installed things in a way that makes error messaging helpful. When sharing a spec, Maestro can then tell the recipient that they're missing a plugin somebody was using. I still like the idea of descriptions too, though I'm not sure about making them mandatory, given the built-in ones can have that set internally and just serialized to the spec in the workspace.
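A rough sketch of the registry idea, assuming a plain module-level dict keyed by prioritizer_id. The names `register_prioritizer`, `build_prioritizers`, and the dict-based step with a `depth` key are hypothetical and only meant to show how a missing plugin could produce a helpful message.

```python
# Hypothetical registry: prioritizer_id -> (factory, description).
PRIORITIZERS = {}


def register_prioritizer(prioritizer_id, description=""):
    """Decorator that files a prioritizer factory under a standard id."""
    def decorator(factory):
        PRIORITIZERS[prioritizer_id] = (factory, description)
        return factory
    return decorator


@register_prioritizer("step_order", description="Built-in DAG traversal order")
def make_step_order(step_order="breadth-first"):
    # Lower weight runs first; negate depth to favor deeper steps.
    sign = -1 if step_order == "depth-first" else 1
    return lambda step: sign * step["depth"]


def build_prioritizers(priority_block):
    """Turn the spec's priority list into callables, in spec order."""
    built = []
    for entry in priority_block:
        pid = entry["prioritizer_id"]
        if pid not in PRIORITIZERS:
            raise LookupError(
                f"Spec uses prioritizer '{pid}', which is not registered; "
                "the plugin that provides it may not be installed."
            )
        factory, _description = PRIORITIZERS[pid]
        # 'args' is a list of single-key mappings in the spec sketch above.
        kwargs = {}
        for arg in entry.get("args", []):
            kwargs.update(arg)
        built.append(factory(**kwargs))
    return built


spec = [{"prioritizer_id": "step_order", "args": [{"step_order": "depth-first"}]}]
weigh = build_prioritizers(spec)[0]
print(weigh({"name": "c1", "depth": 2}))
```

Keeping the description next to the factory in the registry would also let built-ins supply their own description when the spec is reserialized into the workspace, without users having to write one.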