watchme Monitor nvidia-smi output to see GPU resource consumption

Open samhodge-aiml opened this issue 11 months ago • 2 comments

Is your feature request related to a problem? Please describe.

I need to see how much VRAM and GPU compute a process in a container is using, and keep a historical record in a SQL table, so I can keep narrowing the gap between resources allocated and resources consumed.

Describe the solution you'd like

I would like to be able to wrap the output of nvidia-smi and have it come out in the same dictionary as the rest of the watchme metrics, or in a sidecar-style structure alongside them.

Describe alternatives you've considered

Use https://github.com/petronny/nvsmi and dump its output into a dictionary at the same time as the watchme decorator runs.
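For illustration, here is a minimal sketch of that kind of workaround. It does not use nvsmi or watchme. Instead it assumes you call the `nvidia-smi` CLI directly with its `--query-gpu` and `--format=csv,noheader,nounits` options and parse the result into plain dictionaries. The function names, the `gpu_usage` table, and its column names are all hypothetical, chosen only for this example.

```python
import csv
import io
import sqlite3
import subprocess
import time

# Fields requested from nvidia-smi via --query-gpu (order matters for parsing).
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:  # skip blank lines
            continue
        values = [value.strip() for value in record]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once; needs an NVIDIA driver on the host or in the container."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(output)


def _to_float(value):
    """nvidia-smi can report placeholders such as "[N/A]"; store those as NULL."""
    try:
        return float(value)
    except ValueError:
        return None


def record_samples(conn, samples, timestamp=None):
    """Append samples to a hypothetical `gpu_usage` table (memory values in MiB)."""
    timestamp = time.time() if timestamp is None else timestamp
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gpu_usage ("
        "ts REAL, gpu_index INTEGER, name TEXT, "
        "util_pct REAL, mem_used_mib REAL, mem_total_mib REAL)"
    )
    conn.executemany(
        "INSERT INTO gpu_usage VALUES (?, ?, ?, ?, ?, ?)",
        [
            (
                timestamp,
                int(sample["index"]),
                sample["name"],
                _to_float(sample["utilization.gpu"]),
                _to_float(sample["memory.used"]),
                _to_float(sample["memory.total"]),
            )
            for sample in samples
        ],
    )
    conn.commit()
```

On a machine with a GPU, calling `record_samples(sqlite3.connect("gpu.db"), query_gpus())` at intervals would build up the historical record. Note that this samples whole-GPU figures; per-process numbers would need a different query.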

Additional context

Getting the resources consumed to closely match the resources allocated is a problem with commercial value. Anyone who makes use of GPUs should be interested in how heavily these resources are occupied, because buying and renting them is not cheap.

Mar 13 '24 06:03 samhodge-aiml

Sorry, I found the correct documentation:

https://github.com/vsoch/watchme/blob/f209d3d4bf99a25cd2dcaeaa2431cf3ecfe68585/docs/_docs/watcher-tasks/gpu.md#use-as-a-decorator

Mar 14 '24 06:03 samhodge-aiml

hey @samhodge-aiml ! This seems like a cool idea (and simple to implement) but I'm not sure I'll have time to work on it soon - too many cool things going on <3

Mar 15 '24 05:03 vsoch
