variorum
Add a user-level continuous monitoring mode to Variorum (potentially with `var_monitor`)
We want to support use cases where `var_monitor` just runs in the background on each node (with sleep), and users can issue a `var_monitor start` and a `var_monitor stop` with their jobs (most likely through a mechanism such as `pdsh`?). This may eventually deprecate the current `var_monitor`, unless we're using the static and dynamic power capping features, but that remains to be seen.
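To make the start/stop idea concrete, here is a minimal sketch of one way a user-level wrapper could work, assuming a PID file is an acceptable way to remember the background process between the two calls. Everything in it is hypothetical: the `start` and `stop` helpers, the `PID_FILE` location, and the choice of SIGTERM are illustrative and not part of Variorum. `cmd` stands for whatever monitor command line is run today.

```python
import os
import signal
import subprocess
import tempfile
from pathlib import Path

# Hypothetical per-user PID file location; not part of Variorum.
PID_FILE = Path(tempfile.gettempdir()) / f"var_monitor_{os.getuid()}.pid"


def start(cmd, pid_file=PID_FILE):
    """Launch cmd in the background, record its PID, and return the PID."""
    if pid_file.exists():
        raise RuntimeError(f"monitor already running (see {pid_file})")
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # own session, so it outlives the launcher
    )
    pid_file.write_text(str(proc.pid))
    return proc.pid


def stop(pid_file=PID_FILE):
    """Send SIGTERM to the recorded process group and remove the PID file."""
    pid = int(pid_file.read_text())
    try:
        # start_new_session makes the child a group leader, so pgid == pid.
        os.killpg(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the monitor already exited
    pid_file.unlink()
    return pid
```

With something like this installed on each node, a job script could call the start helper on every node before the workload and the stop helper afterwards (for example by fanning the two calls out with `pdsh`), with no `systemd` unit or prolog/epilog change. Whether the monitor flushes its log cleanly on SIGTERM is an open question this sketch does not answer.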
We want this to be in userspace and at the user job level, since anything that relies on `systemd` will require admin support and epilog/prolog updates, which are harder to deploy.
Still brainstorming this with @kshoga1, but capturing the use case. Ideas welcome.
This has come up in @kulnaman and @hariharan-devarajan's research, where we're running `var_monitor` with sleep to capture node power when co-scheduling tasks from workflows.