Add `seff` to Toil Slurm?

Open adamnovak opened this issue 2 years ago • 5 comments
The `seff` command can get statistics about how much CPU and memory a job actually used.

It might be useful for Toil to expose this information in its logs somehow when running on Slurm, either to diagnose OOM or to diagnose excessively large job resource allocations.
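As a rough sketch of what collecting this could look like, the Python below runs `seff` on a Slurm job ID and pulls out the CPU and memory efficiency figures. The function names `parse_seff_output` and `seff_stats` are hypothetical and not part of Toil, and the sample text is illustrative only: it assumes the usual `Key: Value` layout that `seff` prints, which may differ between Slurm versions and sites.

```python
import re
import subprocess
from typing import Dict, Optional

# Illustrative sample only; real seff output may vary by Slurm version.
SAMPLE_SEFF_OUTPUT = """\
Job ID: 12345
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:01
CPU Efficiency: 50.00% of 00:00:02 core-walltime
Job Wall-clock time: 00:00:02
Memory Utilized: 1.23 MB
Memory Efficiency: 0.06% of 2.00 GB
"""


def parse_seff_output(text: str) -> Dict[str, Optional[float]]:
    """Extract efficiency percentages from seff's "Key: Value" lines."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so time values like 00:00:01 survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    stats: Dict[str, Optional[float]] = {}
    for name, key in (
        ("cpu_efficiency_percent", "CPU Efficiency"),
        ("memory_efficiency_percent", "Memory Efficiency"),
    ):
        match = re.match(r"([\d.]+)%", fields.get(key, ""))
        # None means the field was missing or not in the assumed format.
        stats[name] = float(match.group(1)) if match else None
    return stats


def seff_stats(slurm_job_id: str) -> Dict[str, Optional[float]]:
    """Run seff for one Slurm job ID and parse what it prints."""
    result = subprocess.run(
        ["seff", slurm_job_id], capture_output=True, text=True, check=True
    )
    return parse_seff_output(result.stdout)


print(parse_seff_output(SAMPLE_SEFF_OUTPUT))
```

Keeping the parsing separate from the `subprocess` call means the parser can be exercised without a Slurm cluster.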

┆Issue is synchronized with this Jira Story ┆Epic: Improve debugging experience ┆Issue Number: TOIL-1349

adamnovak avatar Jun 15 '23 17:06 adamnovak

Maybe Toil could have an alert system for when resource usage is too high? We would need to make sure the ratio of spurious to useful alerts isn't too high, though, which may be difficult. We should discuss this later, but it might be low priority for now.

DailyDreaming avatar Jun 27 '23 17:06 DailyDreaming

Yes, this would be good to have, and to report as part of the RO-Crate WorkflowRun provenance.

mr-c avatar Jun 29 '23 00:06 mr-c

This might also be useful for Dockstore as part of a workflow analytics report-back feature/a bigger system.

adamnovak avatar Mar 21 '24 17:03 adamnovak

➤ Adam Novak commented:

And it would be good for Toil to warn you if your jobs are over-provisioned and wasting cluster CPU and memory, rather than you having to manually guess which Slurm job is which and ask Slurm about them all.
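A minimal sketch of such a warning, assuming Toil already had per-job efficiency percentages on hand (for example from `seff`). The function name `overprovision_warnings`, its parameters, and the 25% threshold are all made up for illustration and are not existing Toil options.

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def overprovision_warnings(
    job_name: str,
    cpu_efficiency_percent: Optional[float],
    memory_efficiency_percent: Optional[float],
    threshold_percent: float = 25.0,  # arbitrary illustrative cutoff
) -> List[str]:
    """Return (and log) a warning for each resource used below the threshold."""
    messages = []
    for resource, efficiency in (
        ("CPU", cpu_efficiency_percent),
        ("memory", memory_efficiency_percent),
    ):
        # Skip resources with no measurement rather than guessing.
        if efficiency is not None and efficiency < threshold_percent:
            messages.append(
                f"Job {job_name} used only {efficiency:.1f}% of its requested {resource}"
            )
    for message in messages:
        logger.warning(message)
    return messages


# Hypothetical Toil job name with good CPU use but mostly idle memory.
print(overprovision_warnings("align_reads", 90.0, 10.0))
```

Reporting by Toil job name rather than Slurm job ID is the point here: the user would not have to map Slurm jobs back to workflow steps by hand.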

unito-bot avatar Mar 21 '24 17:03 unito-bot

➤ Adam Novak commented:

We should put this into `toil stats` or figure out if it is redundant with stuff `toil stats` already does.

unito-bot avatar Mar 21 '24 17:03 unito-bot