Some low-level errors (like `pynvml.nvml.NVMLError_LibRmVersionMismatch`) result in nothing printed (std or diagnostic)

Open munael opened this issue 3 years ago • 1 comment

Describe the bug

Something caused a version mismatch somewhere and I can no longer use gpustat. Nothing at all is printed on stdout or stderr, and running with `--debug` prints nothing either. I launched it as `python -m pdb -m gpustat` and stepped through until I noticed an error raised in:

`/opt/conda/lib/python3.8/site-packages/pynvml/nvml.py(718)`

of type `pynvml.nvml.NVMLError_LibRmVersionMismatch`.
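For illustration only, here is a minimal sketch of how a low-level call could be wrapped so that a failure is always reported on stderr instead of vanishing silently. The helper name `call_and_report` is hypothetical and is not part of gpustat or pynvml; it only shows the general pattern.

```python
import sys


def call_and_report(func, *args, stream=None, **kwargs):
    """Call func(*args, **kwargs) and return (result, None) on success.

    On any exception, write "<module>.<type>: <message>" to `stream`
    (stderr by default) and return (None, exception) instead of staying silent.
    """
    out = stream if stream is not None else sys.stderr
    try:
        return func(*args, **kwargs), None
    except Exception as exc:  # NVML errors are ordinary Exception subclasses
        exc_type = type(exc)
        print(f"{exc_type.__module__}.{exc_type.__name__}: {exc}", file=out)
        return None, exc


# Hypothetical usage, assuming pynvml is installed:
#   import pynvml
#   _, err = call_and_report(pynvml.nvmlInit)
#   if err is None:
#       pynvml.nvmlShutdown()
```

With a wrapper like this, a driver/library mismatch raised during initialization would at least show up as a one-line diagnostic naming the exception type.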

Environment information:

  • OS: Ubuntu 20.04
  • NVIDIA Driver version: 510.73.08
  • The name(s) of GPU card: Tesla V100-SXM2
  • gpustat version: 1.0.0
  • pynvml version: 11.495.46

munael, Dec 20 '22 01:12

Can you please provide a full stacktrace from `gpustat --debug` (or with pdb)? On your side nothing is printed, right? I'd like to know which `nvml...` call throws the error.

In pdb you can run `bt` at the `(Pdb)` prompt to obtain the full stacktrace in post-mortem mode.
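As an alternative to stepping through pdb, a stack trace can also be captured programmatically with the standard `traceback` module. The sketch below uses hypothetical helper names (`full_trace`, `innermost_frame`); it shows one way to get the full trace as text and to find the frame where the exception was actually raised, which is the information requested above.

```python
import traceback


def full_trace(exc):
    """Return the complete formatted stack trace of a caught exception."""
    return "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )


def innermost_frame(exc):
    """Return (filename, lineno, function) for the frame that raised exc,
    or None if the exception carries no traceback."""
    frames = traceback.extract_tb(exc.__traceback__)
    if not frames:
        return None
    last = frames[-1]
    return last.filename, last.lineno, last.name


# Hypothetical usage, assuming pynvml is installed:
#   import pynvml
#   try:
#       pynvml.nvmlInit()
#   except pynvml.NVMLError as exc:
#       print(full_trace(exc))
#       print(innermost_frame(exc))
```

The text returned by `full_trace` can be pasted directly into an issue report.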

wookayin, Dec 24 '22 18:12