
Add a user-level continuous monitoring mode to Variorum (potentially with `var_monitor`)

Open · tpatki opened this issue 7 months ago · 0 comments

We want to support use cases where `var_monitor` runs in the background on each node, sleeping between samples, and users issue a `var_monitor start` and a `var_monitor stop` around their jobs, most likely through a mechanism such as pdsh. This mode may eventually replace the current `var_monitor`, except where the static and dynamic power capping features are needed, but that remains to be seen.

We want this to live in user space and at the user job level. Anything built on systemd requires admin support and prolog/epilog updates, which are harder to deploy.
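A minimal sketch of the proposed start/stop control flow, assuming a purely user-level process with no systemd involvement. `BackgroundMonitor` and `sample_fn` are hypothetical names for illustration only: `sample_fn` stands in for a node power query (for example, one built on Variorum) and is not a Variorum API. A background thread samples, then sleeps via `threading.Event.wait()`, which returns early as soon as `stop()` is called, so stopping does not have to wait out a full interval.

```python
import threading
import time
from typing import Callable, List, Optional, Tuple


class BackgroundMonitor:
    """Hypothetical user-level sampler: a background thread records one sample
    every `interval` seconds until stop() is called."""

    def __init__(self, sample_fn: Callable[[], float], interval: float = 1.0):
        # ASSUMPTION: sample_fn is a placeholder for a real node power query;
        # it is not part of Variorum's API.
        self._sample_fn = sample_fn
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.samples: List[Tuple[float, float]] = []  # (timestamp, value)

    def _run(self) -> None:
        # Take a sample, then "sleep"; wait() wakes immediately once stop() sets the event.
        while not self._stop_event.is_set():
            self.samples.append((time.time(), self._sample_fn()))
            self._stop_event.wait(self._interval)

    def start(self) -> None:
        # Idempotent: a second start() while running does nothing.
        if self._thread is not None and self._thread.is_alive():
            return
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        # Signal the sampler and wait for it to exit, so no samples arrive afterwards.
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None


if __name__ == "__main__":
    mon = BackgroundMonitor(sample_fn=lambda: 0.0, interval=0.1)
    mon.start()
    time.sleep(0.35)  # the user's job would run here
    mon.stop()
    print(f"collected {len(mon.samples)} samples")
```

This sketch only shows the in-process control flow. A real per-node tool whose `start` and `stop` are separate invocations (for example, fanned out with pdsh) would also need some cross-process handle, such as a PID file plus a signal, which is left as an open design question here.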

Still brainstorming this with @kshoga1, but capturing the use case. Ideas welcome.

This came up in @kulnaman's and @hariharan-devarajan's research, where we run `var_monitor sleep` to capture node power while co-scheduling tasks from workflows.

tpatki, Jun 28 '24 20:06