
Is it possible to query just one GPU?

Open skat00sh opened this issue 2 years ago • 6 comments

nvidia-smi has a flag to query and monitor just one GPU: nvidia-smi --id=<id>. I don't see any such flag or option for gpustat.

It is usually helpful when, say, one out of 4 or 5 GPUs has a device-driver issue: querying with nvidia-smi fails, but if we query the GPUs individually, we get regular results for the ones that are healthy.

It would be even better if gpustat did this by default, i.e.:

  1. Calls nvidia-smi
  2. Checks if there's an error
  3. Then sequentially checks all the available GPUs individually and produces a result (something like the sketch after this list)
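Roughly like this (just a sketch on my side using plain nvidia-smi calls; the 8-GPU upper bound is an assumption, not something gpustat knows about):

# Hypothetical sketch of the proposed fallback: if the full nvidia-smi query
# fails, query each GPU individually and report the healthy ones.
import subprocess

def smi(*args):
    # run nvidia-smi with the given arguments and capture its output
    return subprocess.run(["nvidia-smi", *args], capture_output=True, text=True)

full = smi()
if full.returncode == 0:
    print(full.stdout)
else:
    # full query failed; fall back to querying each GPU individually
    for index in range(8):  # assumed upper bound on the number of GPUs
        single = smi("-i", str(index))
        if single.returncode == 0:
            print(single.stdout)
        else:
            print(f"GPU {index}: query failed ({single.stderr.strip()})")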

skat00sh avatar Jun 10 '22 08:06 skat00sh

Have you tried this with the latest development version? I guess when one device is failing, the latest version of gpustat would still report the results for the other devices. If not, we can ignore such errors and have a fallback mode (rather than querying GPUs individually and merging the results).

wookayin avatar Jun 14 '22 14:06 wookayin

I'm on version 0.6.0 and I get this output from gpustat:
Error on querying NVIDIA devices. Use --debug flag for details

Using nvidia-smi -L, I can see that one of the GPUs has a driver issue: Unable to determine the device handle for gpu 0000:08:00.0: Unknown Error

Also, I didn't exactly understand what "fallback mode" means here.

skat00sh avatar Jun 15 '22 12:06 skat00sh

@wookayin Tried the latest version from the master branch as well. It still fails: one out of the 5 GPUs on the server has the error I described above. Any quick workarounds?

skat00sh avatar Jun 27 '22 12:06 skat00sh

@skat00sh Can you please provide the full output of gpustat --debug (please install the dev version, or more conveniently, 1.0.0rc1)? I'd like to see which particular exception/error is raised in your specific case.

wookayin avatar Jul 06 '22 12:07 wookayin

Sure! Here's the output with the suggested version:

(handcrafted-dp-opt) vyas@fe-computenode-2:/opt/sperl/students/devendra/projects/dp-adversarial (dev) 
$ gpustat --version
gpustat 1.0.0rc1
(handcrafted-dp-opt) vyas@fe-computenode-2:/opt/sperl/students/devendra/projects/dp-adversarial (dev) 
$ gpustat --debug
Error on querying NVIDIA devices. Use --debug flag for details
Traceback (most recent call last):
  File "/opt/sperl/students/devendra/miniconda3/envs/handcrafted-dp-opt/lib/python3.7/site-packages/gpustat/cli.py", line 20, in print_gpustat
    gpu_stats = GPUStatCollection.new_query(debug=debug)
  File "/opt/sperl/students/devendra/miniconda3/envs/handcrafted-dp-opt/lib/python3.7/site-packages/gpustat/core.py", line 537, in new_query
    handle = N.nvmlDeviceGetHandleByIndex(index)
  File "/opt/sperl/students/devendra/miniconda3/envs/handcrafted-dp-opt/lib/python3.7/site-packages/pynvml.py", line 1655, in nvmlDeviceGetHandleByIndex
    _nvmlCheckReturn(ret)
  File "/opt/sperl/students/devendra/miniconda3/envs/handcrafted-dp-opt/lib/python3.7/site-packages/pynvml.py", line 765, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_Unknown: Unknown Error

skat00sh avatar Jul 06 '22 20:07 skat00sh

@skat00sh Thanks for the information. It is strange that pynvml throws Unknown Error.

This is a special case of #81, so I reworked #81 so that when one GPU is failing, gpustat displays an error for that device instead of throwing. Example:

[1] GeForce GTX TITAN 1 | 36°C,   0 % |  9000 / 12189 MB | user1(3000M) user3(6000M)
[2] ((Unknown Error))   |  ?°C,   ? % |     ? /     ? MB | (Not Supported)
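Roughly, the per-device handling is now tolerant of such failures. A minimal sketch of the idea with pynvml (simplified and hypothetical, not the actual code from #81):

# Minimal sketch: a failing device becomes a placeholder row instead of
# aborting the whole query.
import pynvml
pynvml.nvmlInit()
for index in range(pynvml.nvmlDeviceGetCount()):
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"[{index}] ok | {mem.used // 2**20} / {mem.total // 2**20} MB")
    except pynvml.NVMLError as e:
        # the broken GPU (e.g. NVMLError_Unknown) gets an error row like above
        print(f"[{index}] (({e})) |  ?°C,   ? % |     ? /     ? MB")
pynvml.nvmlShutdown()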

wookayin avatar Sep 14 '22 00:09 wookayin

The (Not Supported) case was fixed by #81. We may want to add an --id option nonetheless.

wookayin avatar Oct 12 '22 03:10 wookayin

Added a new option --id.

e.g.

gpustat --id 0
gpustat --id 0,1,2

wookayin avatar Mar 02 '23 14:03 wookayin

Thanks! It'd be really helpful!

skat00sh avatar Mar 06 '23 00:03 skat00sh