[BUG] UTF-8 Error during decoding device name on R555 driver
Required prerequisites
- [X] I have read the documentation https://nvitop.readthedocs.io.
- [X] I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
- [X] I have tried the latest version of nvitop in a new isolated virtual environment.
What version of nvitop are you using?
1.3.2
Operating system and version
Ubuntu 22.04 / WSL
NVIDIA driver version
555.42.03
NVIDIA-SMI
Sun May 26 15:50:45 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 On | N/A |
| 36% 42C P8 31W / 370W | 1544MiB / 24576MiB | 6% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 24 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
Python environment
3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0] linux
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
Problem description
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
Steps to Reproduce
nvitop
Traceback
Traceback (most recent call last):
  File "/home/kang/miniconda3/envs/pytorch/bin/nvitop", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/cli.py", line 353, in main
    ui = UI(
         ^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/ui.py", line 43, in __init__
    self.main_screen = MainScreen(
                       ^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/screens/main/__init__.py", line 38, in __init__
    self.device_panel = DevicePanel(self.devices, compact, win=win, root=root)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/screens/main/device.py", line 61, in __init__
    self.snapshots = self.take_snapshots()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/cachetools/__init__.py", line 702, in wrapper
    v = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/screens/main/device.py", line 142, in take_snapshots
    snapshots = [device.as_snapshot() for device in self.all_devices]
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/library/device.py", line 72, in as_snapshot
    self._snapshot = super().as_snapshot()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/api/device.py", line 2146, in as_snapshot
    **{key: getattr(self, key)() for key in self.SNAPSHOT_KEYS},
            ^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/api/device.py", line 868, in name
    self._name = libnvml.nvmlQuery('nvmlDeviceGetName', self.handle)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/api/libnvml.py", line 433, in nvmlQuery
    retval = func(*args, **kwargs)  # type: ignore[operator]
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/pynvml.py", line 1921, in wrapper
    return res.decode()
           ^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
Logs
No response
Expected behavior
No response
Additional context
No response
Similar issues on other repos:
- wookayin/gpustat#170
- gpuopenanalytics/pynvml#53
I cannot reproduce this on native Linux with the 555.42.02 driver (the latest driver shipped with CUDA Toolkit 12.5 at the time of this comment).
$ nvidia-smi
Sun May 26 22:40:20 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 Off | N/A |
| 53% 45C P8 14W / 170W | 2MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
It seems this bug only occurs in WSL with the 555.85 driver.
Same problem
$ nvidia-smi
Fri Jun 14 20:48:14 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01 Driver Version: 555.99 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 ... On | 00000000:01:00.0 On | N/A |
| 71% 72C P0 279W / 285W | 6457MiB / 16376MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 32210 C /python3.8 N/A |
+-----------------------------------------------------------------------------------------+
Same issue for me as well, on Windows 11 with WSL2. nvitop used to work fine, then stopped working with the UnicodeDecodeError. When I found this issue, I remembered that I had also upgraded the NVIDIA driver.
Same problem, Windows 11, WSL2 | NVIDIA-SMI 555.58.02 Driver Version: 556.12 CUDA Version: 12.5
Based on the error message, I modified this file (~/anaconda3/lib/python3.12/site-packages/pynvml.py, line 1921), and then got this result.
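For context, the traceback ends in `res.decode()` inside pynvml.py, which raises because the bytes returned for the device name are not valid UTF-8 (byte 0xf8 at position 0). Below is a minimal sketch of a tolerant decode. The helper name `safe_decode` is hypothetical: it is not part of pynvml or nvitop, and it is not necessarily the change that was made to the file above.

```python
def safe_decode(raw: bytes) -> str:
    """Decode bytes without raising on invalid UTF-8 (hypothetical helper)."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Invalid bytes are replaced with U+FFFD instead of crashing the caller.
        return raw.decode('utf-8', errors='replace')


# A well-formed name decodes unchanged.
print(safe_decode(b'NVIDIA GeForce RTX 3090'))
# The byte from the error message no longer raises.
print(repr(safe_decode(b'\xf8')))
```

This only hides the symptom: the name shown would still be garbage, because the driver returns invalid bytes in the first place.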
Could you try to use the latest version of nvitop and downgrade the nvidia-ml-py version?
pip3 install git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop
pip3 install nvidia-ml-py==11.515.48
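To confirm which versions actually ended up installed after running the commands above, something like the following sketch may help. It uses only the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as used in the pip commands above.
for name in ('nvitop', 'nvidia-ml-py'):
    print(name, installed_version(name))
```

Run it with the same interpreter that launches nvitop, so the check looks at the same environment.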
I tried this on my side:
nvidia-ml-py 11.515.48
nvitop 1.3.3.dev20+g6bc8a8b
Same problem, only the location changed.
I was able to confirm that the problem is fixed with NVIDIA driver version 560.70.
https://github.com/wookayin/gpustat/issues/170#issuecomment-2241108111
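As a rough way to compare a driver version against the one reported fixed here, the sketch below parses the "Driver Version" value from an nvidia-smi report and compares it numerically. The function names are hypothetical. Treating 560.70 as a threshold is an assumption: this thread only confirms that 555.85, 555.99, and 556.12 are affected under WSL and that 560.70 works; versions in between are not covered.

```python
from typing import Tuple

# Assumption: 560.70 is the first version confirmed fixed in this thread.
# Versions between 556.12 and 560.70 were not reported either way.
FIRST_CONFIRMED_FIXED = (560, 70)


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '555.85' into a tuple like (555, 85)."""
    return tuple(int(part) for part in text.strip().split('.'))


def is_confirmed_fixed(driver_version: str) -> bool:
    """True if the version is at or above the one confirmed fixed in this thread."""
    return parse_driver_version(driver_version) >= FIRST_CONFIRMED_FIXED


# Driver versions taken from the reports in this thread.
for reported in ('555.85', '555.99', '556.12', '560.70'):
    print(reported, is_confirmed_fixed(reported))
```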
The latest NVIDIA driver has fixed this issue. Simply download the latest Windows driver from https://www.nvidia.cn/drivers/lookup/
FYI, a new release has been made.