How many GPUs used counter

Open stephenlienharrell opened this issue 2 years ago • 2 comments

We need a counter on the new version that says how many GPUs were used for a job.

stephenlienharrell, Aug 07 '23 15:08

We need to separate the GPU counter data in order to implement this correctly.

stephenlienharrell, Jan 23 '24 16:01

A preliminary implementation is done and online for LS6. Limitations:
1) Raw data for individual GPUs are merged in the database when imported, so only the total percentage is available.
2) A few nodes in gpu-a100-small and gpu-dev do not seem to have GPU recording enabled in the monitor, so no GPU data is recorded for them, e.g.: https://ls6-stats.tacc.utexas.edu/machine/job/1473810/

Possible workaround without changing the database structure: name the "event" "utilization_$gpunumber" instead of "utilization" when importing, then extract "$gpunumber" in views.py.
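A minimal sketch of that workaround, assuming each imported row can be reduced to a (host, event, value) tuple. The helper names (`event_name`, `gpu_index`, `gpus_used`, `per_gpu_mean`), the row shape, and the "used" threshold are hypothetical and are not part of the HPCPerfStats code. Only the "utilization_$gpunumber" naming comes from the comment above. Counting distinct (host, GPU index) pairs gives the number of GPUs for the job, and keeping the index also recovers per-GPU values instead of only the merged total.

```python
import re
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Hypothetical row shape: (host, event name, utilization value).
Row = Tuple[str, str, float]

EVENT_PREFIX = "utilization"
EVENT_RE = re.compile(rf"^{EVENT_PREFIX}_(\d+)$")


def event_name(gpu: int) -> str:
    """Import side: tag each GPU's sample with its index, e.g. utilization_0."""
    return f"{EVENT_PREFIX}_{gpu}"


def gpu_index(event: str) -> Optional[int]:
    """views.py side: recover the GPU index, or None for other events
    (including legacy rows stored under the plain "utilization" name)."""
    m = EVENT_RE.match(event)
    return int(m.group(1)) if m else None


def gpus_used(rows: Iterable[Row], threshold: float = 0.0) -> int:
    """Count distinct (host, GPU) pairs whose utilization ever exceeds threshold.
    The threshold is a hypothetical definition of "used"."""
    used = set()
    for host, event, value in rows:
        idx = gpu_index(event)
        if idx is not None and value > threshold:
            used.add((host, idx))
    return len(used)


def per_gpu_mean(rows: Iterable[Row]) -> dict:
    """Mean utilization per (host, GPU) instead of one merged total."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for host, event, value in rows:
        idx = gpu_index(event)
        if idx is None:
            continue
        sums[(host, idx)] += value
        counts[(host, idx)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

For example, with the hypothetical rows `[("c301-001", "utilization_0", 95.0), ("c301-001", "utilization_1", 0.0), ("c301-002", "utilization_0", 12.5)]`, `gpus_used` counts two GPUs, because GPU 1 on the first node never reports nonzero utilization. Nodes that record no GPU data at all, as in limitation 2, would simply contribute nothing to the count.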

nicejunjie, Jan 23 '24 20:01