HPCPerfStats

Support multiple jobs on the same node

Open stephenlienharrell opened this issue 2 years ago • 2 comments

Currently we collect everything at the node level. We need to examine which metrics can be split out (on a per-core or per-socket basis), which cannot, and whether splitting them is useful.

stephenlienharrell, Jun 13 '23 15:06

For CPU: need core affinity matched to job ID.
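A minimal sketch of the core-to-job matching, assuming each job's allowed cores are available as a Linux CPU list string (the format used by `Cpus_allowed_list` in `/proc/<pid>/status`, e.g. `0-3,8`). How the per-job list is obtained is left open here; the function names and the input mapping are hypothetical and not part of HPCPerfStats.

```python
def parse_cpu_list(text):
    """Expand a Linux CPU list string such as "0-3,8" into a set of core IDs."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return cores


def cores_to_jobs(job_cpu_lists):
    """Invert {job_id: cpu_list_string} into {core_id: [job_id, ...]}.

    The input mapping is an assumption: it stands in for whatever source
    ends up providing each job's core affinity.
    """
    owners = {}
    for job_id, cpu_list in job_cpu_lists.items():
        for core in sorted(parse_cpu_list(cpu_list)):
            owners.setdefault(core, []).append(job_id)
    return owners
```

A core that appears under more than one job ID in the result is shared, so its per-core counters could not be attributed to a single job.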

For memory: need to find all memory usage from the primary job starter programmatically. Find the job starter, then get the memory of all child processes: ps -o pid,ppid,pgid,comm,%cpu,%me

Snapshot this at the same time as the rest of the metrics. Find out whether there is a way to get the job ID, then match the job ID to specific processes on the node to get a snapshot of memory usage.
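The parent/child walk described above could look roughly like this. It is a sketch, assuming the snapshot comes from `ps -e -o pid=,ppid=,rss=` (RSS in KiB, headers suppressed) and that the job starter's PID is already known; `parse_ps` and `descendant_rss_kib` are hypothetical helper names.

```python
def parse_ps(output):
    """Parse the text of `ps -e -o pid=,ppid=,rss=` into (pid, ppid, rss) int tuples."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            rows.append(tuple(int(f) for f in fields))
    return rows


def descendant_rss_kib(rows, root_pid):
    """Sum RSS (KiB) of root_pid and every process descended from it.

    rows: iterable of (pid, ppid, rss_kib) tuples taken from one snapshot.
    """
    children = {}
    rss = {}
    for pid, ppid, rss_kib in rows:
        children.setdefault(ppid, []).append(pid)
        rss[pid] = rss_kib

    total = 0
    seen = set()
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        if pid in seen:  # guard against malformed input with cycles
            continue
        seen.add(pid)
        total += rss.get(pid, 0)
        stack.extend(children.get(pid, []))
    return total
```

Summing RSS counts pages shared between processes more than once, so the total is an upper bound rather than an exact figure.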

Can we do this programmatically for any other statistics?

stephenlienharrell, Jun 20 '23 15:06

Regarding the approach above, we need to make sure we can capture detached processes.
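One possible way to catch detached processes is to identify a process's job from its cgroup membership rather than its parent PID, since a process that has been reparented to init still stays in its cgroup. The sketch below assumes the scheduler places each job in a cgroup whose path contains a `job_<id>` component (as Slurm can do when its cgroup plugins are enabled); that layout is an assumption about site configuration, and `job_id_from_cgroup` is a hypothetical helper.

```python
import re

# Assumed layout: a path component of the form "job_<digits>".
_JOB_RE = re.compile(r"/job_(\d+)(?:/|$)")


def job_id_from_cgroup(cgroup_text):
    """Return the job ID found in the contents of /proc/<pid>/cgroup, or None."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _JOB_RE.search(parts[2])
        if match:
            return match.group(1)
    return None
```

Reading `/proc/<pid>/cgroup` for every PID on the node and grouping by the returned ID would then cover processes that a parent/child walk misses.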

stephenlienharrell, Jun 21 '23 18:06

Duplicate of #46

sanga1999, Aug 20 '24 15:08