signac-flow
`auto` partition.
Feature description
I would find it more convenient to use flow if the partition were automatically selected based on the job resource request. Many clusters have separate CPU and GPU partitions, or separate shared and whole node partitions. In a workflow with mixed CPU/GPU jobs (and/or jobs of different sizes), the user must manually run (e.g.):
project.py submit -o '.*gpu' --partition=gpu
project.py submit -o '.*small' --partition=shared
project.py submit -o '.*large' --partition=wholenode
Some operations may auto-scale depending on the number of jobs left to execute. Until the user runs the submission command, they don't know whether shared or wholenode is the appropriate partition.
Proposed solution
The user should be able to make one submission:
project.py submit --partition=auto
Additional context
`auto` would select one of the "standard" partitions (e.g. not the debug or high memory partitions) based on the job request:
- If GPUs are requested, choose the gpu partition.
- If more than one node is requested, choose the wholenode partition.
- If less than one node is requested, choose the shared partition.
Any partition will remain settable explicitly on request.
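The selection rules above could be sketched roughly as follows. This is only an illustration, not flow's implementation: the function name `choose_partition` and its parameters are hypothetical, the partition names are the ones used in this issue, and treating a request for exactly one node as `wholenode` is an assumption, since the rules above do not cover that case.

```python
def choose_partition(num_gpus, num_nodes, requested="auto"):
    """Pick a partition name from a job's resource request.

    Hypothetical helper: num_nodes may be fractional (e.g. 0.25 for a
    quarter of a node). Partition names follow the examples in this issue.
    """
    # An explicitly requested partition always wins.
    if requested != "auto":
        return requested
    # GPU jobs go to the GPU partition regardless of size.
    if num_gpus > 0:
        return "gpu"
    # Assumption: exactly one node is treated like a multi-node request.
    if num_nodes >= 1:
        return "wholenode"
    return "shared"
```

With this sketch, `choose_partition(0, 0.25)` gives `"shared"`, while `choose_partition(0, 0.25, requested="debug")` keeps the explicit `"debug"` choice.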
This should be an easy feature. I would support its addition. We need the appropriate underscored attributes in the environment classes, set to None by default. Perhaps something like:
_default_partitions = {
    "gpu-shared": "gpu",
    "cpu-shared": "shared",
    "cpu": "standard",
    "gpu": "gpu",
}
Yes, with that it may be possible to implement the auto selection in the base class.
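To make the idea concrete, here is a rough sketch of what a base-class lookup might look like. The classes `Environment` and `ExampleCluster` and the method `auto_partition` are hypothetical stand-ins, not flow's actual classes; only the `_default_partitions` attribute and its keys come from the suggestion above.

```python
class Environment:
    """Hypothetical stand-in for an environment base class."""

    # Environments that do not support auto selection leave this as None.
    _default_partitions = None

    @classmethod
    def auto_partition(cls, num_gpus, whole_nodes):
        """Look up the default partition for a resource request.

        num_gpus: number of GPUs requested.
        whole_nodes: True if the job requests entire nodes.
        """
        if cls._default_partitions is None:
            raise ValueError(
                f"{cls.__name__} defines no default partitions; "
                "specify a partition explicitly."
            )
        kind = "gpu" if num_gpus > 0 else "cpu"
        key = kind if whole_nodes else f"{kind}-shared"
        return cls._default_partitions[key]


class ExampleCluster(Environment):
    """Hypothetical cluster using the mapping suggested above."""

    _default_partitions = {
        "gpu-shared": "gpu",
        "cpu-shared": "shared",
        "cpu": "standard",
        "gpu": "gpu",
    }
```

Keeping the lookup in the base class means each environment would only need to supply the mapping, and environments that leave it as None fail with a clear message instead of guessing.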
Some systems use separate accounts for CPU and GPU: #703. These would not be able to use the auto partition.
Could we make that a config option, where users can set a default account and a GPU account?
@tcmoore3 Theoretically yes, but I wonder whether we would be getting too niche with that. I would rather have something more future-proof, or with less logic on our side, such as an account argument to an operation decorator, or perhaps a separate decorator (I like the second less, as it is not really a resource). We could likewise allow specifying a partition, which would make two more keyword arguments.
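As a rough illustration of the per-operation idea, the sketch below attaches an account and a partition to an operation function. The decorator `submission_options`, the attribute `_submit_options`, and the account name are all hypothetical and are not part of flow's API; how a submission command would consume them is left open.

```python
def submission_options(account=None, partition=None):
    """Hypothetical decorator recording per-operation submission settings."""

    def decorator(func):
        # Store the settings on the function so a submit step could read
        # them later; None means "fall back to the configured default".
        func._submit_options = {"account": account, "partition": partition}
        return func

    return decorator


@submission_options(account="my-gpu-account", partition="gpu")
def simulate(job):
    """Placeholder GPU operation."""


@submission_options()
def analyze(job):
    """Placeholder CPU operation that uses the defaults."""
```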