
Lag when scrolling through applications in the ncurses UI

Open • IcyAlmond opened this issue 3 years ago • 16 comments

For example, when I scroll through the list of applications using the GPU with the arrow keys, it stutters after three or so entries: the selection stops moving even though I keep pressing the up/down arrow keys, and then it jumps to where it should be after around a second.

It happens in the setup menu too. Edit: it also happens with mouse scrolling.

Also, is there a way to separate/distinguish which application is running on which GPU?

IcyAlmond, Apr 09 '22

Hello, how many GPUs do you have on your system? Are they AMD ones?

Syllo, Apr 10 '22

Yes, one AMD and one NVIDIA.

IcyAlmond, Apr 10 '22

Could you please try something? Compile with each of these configurations:

  • cmake .. -DNVIDIA_SUPPORT=ON -DAMDGPU_SUPPORT=OFF
  • cmake .. -DNVIDIA_SUPPORT=OFF -DAMDGPU_SUPPORT=ON

In both cases, run nvtop and see whether you can reproduce the slowdown with only one vendor enabled.

Syllo, Apr 10 '22

It doesn't happen when only NVIDIA support is compiled in. It happens when AMDGPU support is compiled in.

IcyAlmond, Apr 10 '22

Could scanning all the fds in /proc be causing the lag? htop does this too, so I wasn't concerned.
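To illustrate why such a scan scales with the total number of open fds, here is a minimal C sketch, assuming Linux and glibc, of one way to find DRM device users: walk every numeric /proc/<pid>/fd directory and fstatat() each entry, counting character devices with major number 226 (the Linux DRM major). The function name count_drm_fds is hypothetical, and this is not nvtop's actual implementation (the strace output in this thread shows it also uses calls such as readlink and kcmp).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

/* Linux assigns character major 226 to DRM devices (/dev/dri/card*, renderD*). */
#define DRM_CHAR_MAJOR 226

static int is_numeric(const char *s) {
  if (*s == '\0') return 0;
  for (; *s; ++s)
    if (!isdigit((unsigned char)*s)) return 0;
  return 1;
}

/* Count fds, across all processes we may inspect, that refer to a DRM device.
 * Returns -1 if proc_root cannot be opened. Every fd of every readable process
 * costs one fstatat() call, so the cost grows with the total number of fds. */
long count_drm_fds(const char *proc_root) {
  assert(proc_root != NULL);
  DIR *proc = opendir(proc_root);
  if (!proc) return -1;

  long count = 0;
  struct dirent *pid_ent;
  while ((pid_ent = readdir(proc)) != NULL) {
    if (!is_numeric(pid_ent->d_name)) continue; /* skip "self", "sys", ... */

    char fd_dir_path[512];
    snprintf(fd_dir_path, sizeof fd_dir_path, "%s/%s/fd", proc_root,
             pid_ent->d_name);
    int fd_dir_fd = open(fd_dir_path, O_RDONLY | O_DIRECTORY);
    if (fd_dir_fd < 0) continue; /* e.g. EACCES for other users' processes */
    DIR *fd_dir = fdopendir(fd_dir_fd);
    if (!fd_dir) {
      close(fd_dir_fd);
      continue;
    }

    struct dirent *fd_ent;
    while ((fd_ent = readdir(fd_dir)) != NULL) {
      if (!is_numeric(fd_ent->d_name)) continue; /* skip "." and ".." */
      struct stat st;
      /* flags = 0 follows the /proc/<pid>/fd/<n> link to the opened file. */
      if (fstatat(fd_dir_fd, fd_ent->d_name, &st, 0) != 0) continue;
      if (S_ISCHR(st.st_mode) && major(st.st_rdev) == DRM_CHAR_MAJOR) ++count;
    }
    closedir(fd_dir); /* also closes fd_dir_fd */
  }
  closedir(proc);
  return count;
}
```

Calling count_drm_fds("/proc") as an unprivileged user only inspects that user's processes, because the other fd directories fail with EACCES; running it as root covers every process and therefore makes proportionally more fstatat() calls.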

Could you run $ time strace -c path/to/nvtop, wait a few seconds, and then exit nvtop? It should show something like:

$ time strace -c nvtop/build/src/nvtop 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 64.39    0.380565           6     58393       259 newfstatat
 14.45    0.085425           6     13474      4269 openat
 10.44    0.061728          19      3205           getdents64
  6.28    0.037105           4      9201           close
  2.07    0.012231           3      3328         2 fcntl
  1.16    0.006847          12       566           read
  0.53    0.003108          37        82         1 ioctl
  0.29    0.001725           4       400           kcmp
  0.15    0.000871           4       214           write
  0.07    0.000396           8        49           poll
  0.05    0.000268           3        76        60 readlink
  0.04    0.000246         246         1           execve
  0.03    0.000201           4        47           mmap
  0.02    0.000097           3        31           rt_sigaction
  0.01    0.000082           4        18           lseek
  0.01    0.000049           4        12           mprotect
  0.01    0.000036           4         8           munmap
  0.00    0.000026           2        11           pread64
  0.00    0.000017           8         2         1 access
  0.00    0.000015           0        19           brk
  0.00    0.000003           3         1           getrandom
  0.00    0.000002           2         1           arch_prctl
  0.00    0.000002           2         1           set_tid_address
  0.00    0.000002           2         1           set_robust_list
  0.00    0.000002           2         1           prlimit64
  0.00    0.000002           2         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.591051           6     89143      4592 total

real	0m10.809s
user	0m0.353s
sys	0m1.749s

zhuyifei1999, Apr 10 '22

Also, approximately how many processes are running and how many fds are open? That is, what's the output of $ ls -d /proc/{1..9}*/fd/* | wc -l and $ ls /proc/{1..9}*/fd/ | wc -l? (If you run nvtop as root, run the second command as root too.)

zhuyifei1999, Apr 10 '22

$ time strace -c /usr/bin/nvtop
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.03    0.090864           3     24608         1 newfstatat
 14.18    0.021468           4      5342      1621 openat
 10.68    0.016164          13      1205           getdents64
  7.12    0.010783           2      3716           close
  3.04    0.004606           6       659           read
  2.35    0.003555           2      1342           fcntl
  0.86    0.001304           2       490           kcmp
  0.65    0.000989           3       276           write
  0.42    0.000640           3       164       132 readlink
  0.31    0.000464           7        61         1 ioctl
  0.10    0.000144           1        78           poll
  0.07    0.000111           2        40           lseek
  0.05    0.000074           2        28           mmap
  0.05    0.000072           1        47           rt_sigaction
  0.03    0.000047           2        22           brk
  0.02    0.000027           4         6           munmap
  0.02    0.000025           2        10           mprotect
  0.01    0.000014           7         2         2 connect
  0.01    0.000011           5         2           socket
  0.01    0.000008           4         2         1 access
  0.00    0.000000           0         4           pread64
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
  0.00    0.000000           0         1           getrandom
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.151370           3     38112      1759 total

real	0m9.998s
user	0m0.047s
sys	0m0.519s
$ ls -d /proc/{1..9}*/fd/* | wc -l
[... redacted "cannot open directory: Permission denied" errors from ls ...]
4416
$ ls /proc/{1..9}*/fd/ | wc -l
[... redacted "cannot open directory: Permission denied" errors from ls ...]
4658

Here's a video, just so we're sure we're talking about the same issue:

https://user-images.githubusercontent.com/67372293/162766862-25c48c08-d099-408f-8c16-662ed06c709b.mp4

The lag/stutter happens where the highlight jumps: I keep pressing the up/down arrow keys, and the highlight catches up once the screen updates.

IcyAlmond, Apr 11 '22

real	0m9.998s
user	0m0.047s
sys	0m0.519s

Just to make sure: was the lag happening during this run? (0.519 + 0.047) / 9.998 ≈ 5.7% busy, and I don't think that is high enough to cause major lag just from being busy.

zhuyifei1999, Apr 11 '22

Yes, it had the same stutter as in the video.

IcyAlmond, Apr 11 '22

When it lags, is the entire screen laggy, or just nvtop? I'm wondering whether nvtop itself is laggy, or whether nvtop is doing something to the GPU that makes the GPU laggy.

zhuyifei1999, Apr 11 '22

Just nvtop

IcyAlmond, Apr 12 '22

I have no idea what's wrong, then. My user has 5443 fds open, with 8248 fds in total (sudo ls /proc/{1..9}*/fd/ | wc -l), and I'm experiencing no lag at all.

Let's see if @Syllo has a better idea. (I haven't read much of the UI code of nvtop)

zhuyifei1999, Apr 12 '22

From what I see in the video, it freezes while gathering the information, every second or so (the default update interval). The interface freezes because everything runs in the same thread.

I don't see that behavior on my system either, even when I increase the load with more processes/fds than @Latrolage reported. I can observe a very slight slowdown while strace is running.

This might be exacerbated on systems with many AMD GPUs, in which case we go through /proc many times.

I will consider refactoring the /proc traversal at some point, and maybe moving the info gathering into its own thread, but I don't see how to avoid the fstat calls.
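Moving the info gathering into its own thread could look roughly like the following minimal C sketch using POSIX threads: a background thread performs the slow gathering and publishes a snapshot under a mutex, while the UI thread only copies the latest snapshot, so a blocking read no longer freezes key handling. All names here (struct gatherer, gatherer_start, gatherer_peek, gatherer_stop, fake_slow_read, the pcie_rx field) are hypothetical and not part of nvtop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical snapshot of what the UI draws; a real one would hold much more. */
struct gpu_snapshot {
  unsigned long generation; /* incremented after every completed refresh */
  double pcie_rx;           /* placeholder metric */
};

struct gatherer {
  pthread_t thread;
  pthread_mutex_t lock;      /* protects `latest` and `stop` */
  struct gpu_snapshot latest;
  bool stop;
  double (*slow_read)(void); /* blocking probe, e.g. a sysfs read */
  long interval_ms;          /* pause between refreshes */
};

static void sleep_ms(long ms) {
  struct timespec ts = {ms / 1000, (ms % 1000) * 1000000L};
  nanosleep(&ts, NULL);
}

/* Stand-in for a read that blocks for a while (hypothetical). */
double fake_slow_read(void) {
  sleep_ms(200);
  return 42.0;
}

static void *gather_loop(void *arg) {
  struct gatherer *g = arg;
  for (;;) {
    pthread_mutex_lock(&g->lock);
    bool stop = g->stop;
    pthread_mutex_unlock(&g->lock);
    if (stop) break;

    /* The slow call runs with no lock held, so readers never wait on it. */
    double value = g->slow_read();

    pthread_mutex_lock(&g->lock);
    g->latest.pcie_rx = value;
    g->latest.generation++;
    pthread_mutex_unlock(&g->lock);

    sleep_ms(g->interval_ms);
  }
  return NULL;
}

int gatherer_start(struct gatherer *g, double (*slow_read)(void),
                   long interval_ms) {
  assert(g != NULL && slow_read != NULL && interval_ms >= 0);
  g->latest = (struct gpu_snapshot){0, 0.0};
  g->stop = false;
  g->slow_read = slow_read;
  g->interval_ms = interval_ms;
  if (pthread_mutex_init(&g->lock, NULL) != 0) return -1;
  if (pthread_create(&g->thread, NULL, gather_loop, g) != 0) {
    pthread_mutex_destroy(&g->lock);
    return -1;
  }
  return 0;
}

/* UI side: copy the latest snapshot without waiting for an in-flight read. */
struct gpu_snapshot gatherer_peek(struct gatherer *g) {
  pthread_mutex_lock(&g->lock);
  struct gpu_snapshot s = g->latest;
  pthread_mutex_unlock(&g->lock);
  return s;
}

/* Ask the thread to exit and wait for it (at most one slow read + interval). */
void gatherer_stop(struct gatherer *g) {
  pthread_mutex_lock(&g->lock);
  g->stop = true;
  pthread_mutex_unlock(&g->lock);
  pthread_join(g->thread, NULL);
  pthread_mutex_destroy(&g->lock);
}
```

In a UI loop, one would call gatherer_peek() before each redraw and keep handling keys in between; the mutex is held only for the short copy, never during the slow read itself, which is the property that would keep scrolling responsive.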

Syllo, Apr 16 '22

I also see this issue with an AMD GPU. Here are my GPU and driver details. The problem also shows up in btop.

26:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev ef)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Radeon RX 580 ARMOR 8G OC
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

Disr0, Mar 10 '24

This issue is caused by pcie_bw reads not being threaded.
Each pcie_bw read sleeps for 1 s, during which the nvtop thread is blocked.

https://github.com/Syllo/nvtop/issues/208
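A read that sleeps inside the kernel consumes wall-clock time but almost no CPU time, which would be consistent with the low busy percentage computed from the time output earlier in this thread despite the visible stutter. The minimal C sketch below, assuming Linux/POSIX clocks, times one whole-file read with CLOCK_MONOTONIC (wall time) and CLOCK_PROCESS_CPUTIME_ID (CPU time); the function name measure_read is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

static double ts_seconds(struct timespec ts) {
  return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Read the whole file at `path` once and report the elapsed wall-clock time
 * and the CPU time this process consumed meanwhile.
 * Returns the number of bytes read, or -1 on error. */
long measure_read(const char *path, double *wall_s, double *cpu_s) {
  assert(path != NULL && wall_s != NULL && cpu_s != NULL);
  struct timespec w0, w1, c0, c1;
  clock_gettime(CLOCK_MONOTONIC, &w0);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);

  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;
  char buf[4096];
  long total = 0;
  ssize_t n;
  while ((n = read(fd, buf, sizeof buf)) > 0) total += n;
  close(fd);
  if (n < 0) return -1;

  clock_gettime(CLOCK_MONOTONIC, &w1);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
  *wall_s = ts_seconds(w1) - ts_seconds(w0); /* includes time spent blocked */
  *cpu_s = ts_seconds(c1) - ts_seconds(c0);  /* excludes time spent blocked */
  return total;
}
```

On an affected card one might pass a path such as /sys/class/drm/card0/device/pcie_bw (the card0 index is an assumption and depends on the system). If the read sleeps as described above, the wall time should come out close to one second while the CPU time stays small.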

Umio-Yasuno, Mar 13 '24

@Syllo

I suggest disabling the pcie_bw read for amdgpu.
pcie_bw is not supported from Vega20.

 src/extract_gpuinfo_amdgpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/extract_gpuinfo_amdgpu.c b/src/extract_gpuinfo_amdgpu.c
index 39b20b9..3de1093 100644
--- a/src/extract_gpuinfo_amdgpu.c
+++ b/src/extract_gpuinfo_amdgpu.c
@@ -366,10 +366,12 @@ static void initDeviceSysfsPaths(struct gpu_info_amdgpu *gpu_info) {
 
   // Open the PCIe bandwidth file for dynamic info gathering
   gpu_info->PCIeBW = NULL;
+  /*
   int pcieBWFD = openat(sysfsFD, "pcie_bw", O_RDONLY);
   if (pcieBWFD) {
     gpu_info->PCIeBW = fdopen(pcieBWFD, "r");
   }
+  */
 
   // Open the power cap file for dynamic info gathering
   gpu_info->powerCap = NULL;

Umio-Yasuno, Mar 16 '24