`pcie_bw` issue with AMDGPU support
nvtop calculates PCIe bandwidth usage as if the value were in KiB/s, but the value it derives from the AMDGPU driver is in B/s.
rocm_smi_lib calculates PCIe bandwidth usage (MiB/s) as number_of_received * max_packet_size (max_payload_size) / 1024.0 / 1024.0, and likewise with number_of_sent for the transmit direction:
https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/python_smi_tools/rocm_smi.py#L1862-L1883
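For reference, a minimal C sketch of that calculation (the function name and example numbers are mine, not rocm_smi_lib's):

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of the rocm_smi_lib formula: the driver reports a packet count and
 * the maximum payload size in bytes, so over its 1 s counting window the
 * bandwidth in MiB/s is count * max_payload_size / 1024.0 / 1024.0. */
static double pcie_mib_per_s(uint64_t packet_count, uint64_t max_payload_size)
{
    return (double)packet_count * (double)max_payload_size / 1024.0 / 1024.0;
}

int main(void)
{
    /* Example: 100000 packets of 256 bytes counted in one second. */
    printf("%.2f MiB/s\n", pcie_mib_per_s(100000, 256));
    return 0;
}
```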
Also, reading the pcie_bw file takes at least 1 s, because the AMDGPU driver uses msleep(1000) to count packets on its side:
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/vi.c#L1379
So the -d, --delay option of nvtop cannot work as intended on GPUs where pcie_bw is supported.
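For example, a quick sketch like this makes the stall visible by timing a single read (the sysfs path is an assumption and varies per system):

```c
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Time one read of the pcie_bw sysfs file; on AMDGPU the read itself takes
 * roughly 1 s because the driver sleeps while counting packets. */
int main(void)
{
    const char *path = "/sys/class/drm/card0/device/pcie_bw"; /* assumed path */
    char buf[128];
    struct timespec t0, t1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    if (n < 0) {
        perror("read");
        return 1;
    }
    buf[n] = '\0';
    printf("pcie_bw: %s", buf);
    printf("read took %.3f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}
```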
I think we should have a separate thread for pcie_bw if possible.
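Something along these lines, a rough sketch rather than nvtop's actual code, where one thread keeps reading pcie_bw and the interface only loads cached values (the sysfs path is assumed; the three fields are what the driver prints):

```c
#include <inttypes.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Bytes transferred during the last 1 s window, published by the reader
 * thread; the interface thread only loads these atomics, so it never
 * blocks on the slow sysfs read. */
static _Atomic uint64_t latest_rx_bytes;
static _Atomic uint64_t latest_tx_bytes;

static void *pcie_bw_reader(void *arg)
{
    const char *path = arg;
    uint64_t received, sent, max_payload;

    for (;;) {
        FILE *f = fopen(path, "r");
        if (!f)
            break;
        /* The driver prints "received sent max_payload_size"; the read
         * itself blocks for about 1 s while packets are counted. */
        if (fscanf(f, "%" SCNu64 " %" SCNu64 " %" SCNu64,
                   &received, &sent, &max_payload) == 3) {
            atomic_store(&latest_rx_bytes, received * max_payload);
            atomic_store(&latest_tx_bytes, sent * max_payload);
        }
        fclose(f);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    /* The sysfs path is an assumption and differs between systems. */
    pthread_create(&tid, NULL, pcie_bw_reader,
                   "/sys/class/drm/card0/device/pcie_bw");

    /* Interface loop: refresh as fast as wanted, independent of the 1 s read. */
    for (int i = 0; i < 10; i++) {
        printf("RX %" PRIu64 " B/s, TX %" PRIu64 " B/s\n",
               atomic_load(&latest_rx_bytes), atomic_load(&latest_tx_bytes));
        usleep(200 * 1000);
    }
    return 0;
}
```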
Nvtop shows B/KiB/MiB depending on how much data is being transferred. The data is gathered from the pcie_bw interface and scaled accordingly: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/pm/amdgpu_pm.c#L1579
Umm, NVML returns the value in KiB/s, but the AMDGPU driver returns it in B/s (packet_count * max_payload_size[Byte]).
https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1gd86f1c74f81b5ddfaa6cb81b51030c72
Does nvtop convert KiB/s to B/s (for NVIDIA GPU) or B/s to KiB/s (for AMD GPU)?
P.S. nvtop currently does not detect devices correctly in APU+dGPU environments, so I cannot test this, sorry.
Sorry I did not get what you meant the first time. Indeed the code was missing a division by 1024 to get in the kilobyte range, thanks. I pushed 04721e38f9b87bc640f68332d49e6473ede45e9f to fix it.
Could you please elaborate on what is wrong with APU+dGPU? Are one, the other or both GPUs not found or missing info?
Thanks.
nvtop detects both GPUs but uses the wrong index.
As a result, the processes on Device1 (RX 560) will be displayed as the processes on Device0 (APU).
https://github.com/Syllo/nvtop/issues/209
Fixed by https://github.com/Syllo/nvtop/commit/3e9ddef02d47a5aa0be1ab78d818284dd7c91cd1
But the pcie_bw problem still remains.
PCIe RX/TX will always be 0, because maxPayloadSize (256) is divided by 1024 first and that integer division evaluates to 0.
https://github.com/Syllo/nvtop/commit/04721e38f9b87bc640f68332d49e6473ede45e9f
- received *= maxPayloadSize;
- transmitted *= maxPayloadSize;
+ // Compute received/transmitter in KiB
+ received *= maxPayloadSize / 1024;
+ transmitted *= maxPayloadSize / 1024;
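Here is a small sketch of the precedence problem and one way to avoid it (multiply first, then divide); the example numbers are mine:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t received = 100000;    /* example packet count over 1 s */
    uint64_t maxPayloadSize = 256; /* bytes per packet */

    /* Equivalent to the committed `received *= maxPayloadSize / 1024;`:
     * 256 / 1024 is integer division, evaluates to 0, result is always 0. */
    uint64_t broken = received * (maxPayloadSize / 1024);

    /* Multiplying first keeps the product in bytes before converting to
     * KiB, so nothing is truncated to zero. */
    uint64_t fixed = received * maxPayloadSize / 1024;

    printf("broken = %" PRIu64 " KiB, fixed = %" PRIu64 " KiB\n", broken, fixed);
    return 0;
}
```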
Also, the pcie_bw sysfs file causes a 1 s sleep on each read, during which the nvtop thread stops.
With multiple AMDGPUs that support pcie_bw, nvtop probably stalls for 1 s per GPU.
Oh my, I did not think hard enough about operator precedence in that case, thanks!
So is reading the file pcie_bw blocking when nvtop reads it faster than the driver refresh rate (1sec)?
I've been thinking about separating the data gathering and the interface logic into two threads (and frankly should have done that from the start), but I unfortunately have little time to allocate to that right now.
I'm not sure about blocking, but pcie_bw sysfs reads are synchronous, so the thread waits, and both user input and interface updates stop for 1s.
This makes nvtop terribly difficult to use.
I am not confident in safely using multithreading in C.
I think it would be reasonable to remove pcie_bw sysfs support or allow pcie_bw sysfs reading to be disabled from the configuration.
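Purely as an illustration of the opt-out idea (the NVTOP_DISABLE_PCIE_BW environment variable is made up, not an existing nvtop option), the gathering code could be gated like this:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opt-out: skip the blocking pcie_bw sysfs read when the user
 * sets NVTOP_DISABLE_PCIE_BW=1. The variable name is illustrative only;
 * nvtop has no such option today. */
static bool pcie_bw_enabled(void)
{
    const char *env = getenv("NVTOP_DISABLE_PCIE_BW");
    return env == NULL || strcmp(env, "1") != 0;
}
```

The data-gathering loop would then only read the pcie_bw file when pcie_bw_enabled() returns true.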