[BUG] UTF-8 Error during decoding device name on R555 driver

Open kangkannnng opened this issue 1 year ago • 9 comments

Required prerequisites

  • [X] I have read the documentation https://nvitop.readthedocs.io.
  • [X] I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
  • [X] I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

1.3.2

Operating system and version

Ubuntu 22.04 / WSL

NVIDIA driver version

555.42.03

NVIDIA-SMI

Sun May 26 15:50:45 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0  On |                  N/A |
| 36%   42C    P8             31W /  370W |    1544MiB /  24576MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        24      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

Python environment

3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0] linux
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105

Problem description

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte

Steps to Reproduce

nvitop

Traceback

Traceback (most recent call last):
  File "/home/kang/miniconda3/envs/pytorch/bin/nvitop", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/cli.py", line 353, in main
    ui = UI(
         ^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/ui.py", line 43, in __init__
    self.main_screen = MainScreen(
                       ^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/screens/main/__init__.py", line 38, in __init__
    self.device_panel = DevicePanel(self.devices, compact, win=win, root=root)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/screens/main/device.py", line 61, in __init__
    self.snapshots = self.take_snapshots()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/cachetools/__init__.py", line 702, in wrapper
    v = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/screens/main/device.py", line 142, in take_snapshots
    snapshots = [device.as_snapshot() for device in self.all_devices]
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/library/device.py", line 72, in as_snapshot
    self._snapshot = super().as_snapshot()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/api/device.py", line 2146, in as_snapshot
    **{key: getattr(self, key)() for key in self.SNAPSHOT_KEYS},
            ^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/api/device.py", line 868, in name
    self._name = libnvml.nvmlQuery('nvmlDeviceGetName', self.handle)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/api/libnvml.py", line 433, in nvmlQuery
    retval = func(*args, **kwargs)  # type: ignore[operator]
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/pynvml.py", line 1921, in wrapper
    return res.decode()
           ^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
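
The last frame of the traceback is a strict UTF-8 decode of the bytes returned for the device name. As a minimal sketch of why that fails and how a lenient decode behaves, the snippet below uses a made-up buffer: only the leading 0xf8 byte comes from the error message, and `raw_name`, `decode_strict`, and `decode_lenient` are hypothetical names, not part of pynvml or nvitop.

```python
# Hypothetical device-name buffer: the report only shows that byte 0 was 0xf8,
# so the remaining bytes here are made up for illustration.
raw_name = b"\xf8NVIDIA GeForce RTX 3090"


def decode_strict(data: bytes) -> str:
    """Decode the way the failing frame does: strict UTF-8."""
    return data.decode()


def decode_lenient(data: bytes) -> str:
    """Decode without raising; undecodable bytes become U+FFFD."""
    return data.decode("utf-8", errors="replace")


try:
    decode_strict(raw_name)
except UnicodeDecodeError as exc:
    # 0xf8 can never start a valid UTF-8 sequence, so strict decoding raises.
    print(f"strict decode failed: {exc}")

print(repr(decode_lenient(raw_name)))
```

A lenient decode only hides the symptom: if the driver returns garbage for the name, the displayed string will still be wrong.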

Logs

No response

Expected behavior

No response

Additional context

No response

kangkannnng avatar May 26 '24 07:05 kangkannnng

Similar issues on other repos:

  • wookayin/gpustat#170
  • gpuopenanalytics/pynvml#53

I cannot reproduce this on native Linux with the 555.42.02 driver (the latest driver shipped with CUDA Toolkit 12.5 at the time this comment was posted).

$ nvidia-smi
Sun May 26 22:40:20 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        On  |   00000000:01:00.0 Off |                  N/A |
| 53%   45C    P8             14W /  170W |       2MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

This appears to be a bug that only occurs in WSL with the 555.85 driver.

XuehaiPan avatar May 26 '24 14:05 XuehaiPan

Same problem

$ nvidia-smi
Fri Jun 14 20:48:14 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01              Driver Version: 555.99         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
| 71%   72C    P0            279W /  285W |    6457MiB /  16376MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     32210      C   /python3.8                                  N/A      |
+-----------------------------------------------------------------------------------------+

ssjjrrr avatar Jun 14 '24 12:06 ssjjrrr

Same issue for me as well on Windows 11, WSL2. nvitop used to work fine, then stopped working with the UnicodeDecodeError. When I found this issue, I remembered that I had also upgraded the NVIDIA driver.

Saya47 avatar Jul 03 '24 12:07 Saya47

Same problem on Windows 11, WSL2: NVIDIA-SMI 555.58.02, Driver Version: 556.12, CUDA Version: 12.5

winkeylucky avatar Jul 04 '24 07:07 winkeylucky

[image] Based on the error, I modified this file (~/anaconda3/lib/python3.12/site-packages/pynvml.py, line 1921), and this was the result: [image]

winkeylucky avatar Jul 04 '24 08:07 winkeylucky

Could you try to use the latest version of nvitop and downgrade the nvidia-ml-py version?

pip3 install git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop
pip3 install nvidia-ml-py==11.515.48
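
To confirm which versions actually ended up in the environment after running the commands above, something like the following sketch can help. It uses only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as they appear in the pip commands above.
for name in ("nvitop", "nvidia-ml-py"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```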

XuehaiPan avatar Jul 04 '24 09:07 XuehaiPan

install nvidia-ml-py==11.515.48

I tried it here with nvidia-ml-py 11.515.48 and nvitop 1.3.3.dev20+g6bc8a8b. [image] Same problem, only the location has changed.

winkeylucky avatar Jul 04 '24 16:07 winkeylucky

I was able to confirm that the problem is fixed with NVIDIA driver version 560.70.

https://github.com/wookayin/gpustat/issues/170#issuecomment-2241108111
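
As a rough way to compare a machine against the driver version mentioned above, the sketch below parses the "Driver Version" field from an `nvidia-smi` header line and compares it with 560.70. Treating 560.70 as the reference point is an assumption based on this comment alone; the thread does not say which release first contained the fix. `parse_driver_version` and `CONFIRMED_FIXED` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Assumption: 560.70 is the version reported as working in this thread,
# not necessarily the first version that contains the fix.
CONFIRMED_FIXED = (560, 70)


def parse_driver_version(header_line: str) -> Optional[Tuple[int, ...]]:
    """Extract the 'Driver Version' field from an nvidia-smi header line."""
    match = re.search(r"Driver Version:\s*([0-9]+(?:\.[0-9]+)*)", header_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Header line copied from the original report in this issue.
header = "| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |"
driver = parse_driver_version(header)
print(driver, driver is not None and driver >= CONFIRMED_FIXED)
```

Under WSL the "Driver Version" field shows the Windows driver, which is the one the reports in this thread point to. The comparison is not meaningful for native Linux drivers, where the problem could not be reproduced.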

Gh0stExp10it avatar Jul 20 '24 12:07 Gh0stExp10it

The latest NVIDIA driver has fixed this issue. Simply download the latest Windows driver from https://www.nvidia.cn/drivers/lookup/

kenvix avatar Aug 11 '24 08:08 kenvix

FYI, a new release has been made.

XuehaiPan avatar Dec 29 '24 14:12 XuehaiPan