xplr
xplr copied to clipboard
CPU Usage
The xlpr file manager results in intense CPU usage when the binary is executed. How can I avoid this? https://ibb.co/7j1Rm77
The version I am using is the xrpl-git from the aur.
Hey.Thanks for reporting this. I have a good idea of where the problems are and can definitely optimize this.
I've been delaying working on cleanups and optimizations to focus on the implementations. But now I think it's time I start looking into that area.
When I run your current version via cargo run --release
it still shows 50-60% CPU usage. I don't think this issue got fixed.
Reopening this.
I'll probably need to get a test suit up to measure the cpu usage in a vm environment.
I think profiling the app could help. e.g.: https://gist.github.com/KodrAus/97c92c07a90b1fdd6853654357fd557a
I tested on Ubuntu 20.04. Peeking into your code (I'm a Rust beginner) I see a lot of object cloning and threading going on.
Initial flamegraph:
CPU usage: 51%
Conclusion: We are reading the keyboard inputs wrong.
Try 1: https://github.com/sayanarijit/xplr/pull/26/commits/1bfb01140d8cf3e5933b41da01e2fbdf3a5ff133
Flamegraph:
CPU usage: 17% (for other reasons)
Conclusion: Reading keys in a blocking way partly fixes the load issue, but we get error when we invoke shell :!
[+ 73879 suspended (tty input) cargo run
Probably because bash tried to capture the tty which was being read in a blocking way.
I'll continue working on it after https://github.com/sayanarijit/xplr/issues/39 to make sure we don't break anything in order to optimize things.
Related: https://github.com/crossterm-rs/crossterm/issues/397
The suggested solution was to rewrite the codebase in async
rust which I don't think is a good idea for this project as of now since we're still in the exploring phase. async
runtime might add many limitations to how we can use and extend xplr
because I have plans for xplr
to be also used as a library, not just a bin.
So, this is my 2nd attempt to fix the issue without changing much of the code. @IcyFlamingArrow @amalic @maximbaz can you please test if this works for you?
Cranking up sleep time will definitely be helpful, but like the author in the other thread said, it could mess up rrsponsiveness.
Will give it another try later, and let you know asap.
It doesn't seem to have much impact in my machine since the sleep time should apply only for the first click. When holding the key, subsequent events should be faster.
To me it looks like a significant improvement, it dropped from 50% cpu usage in idle to 2.5%. Of course in an ideal world I would prefer to get to 0% when idle, but this is definitely a good move, and perhaps sufficient for now.
I don't feel any decrease in responsiveness (remember to build in release mode when testing, just in case).
Awesome. Thanks.
Can confirm. It uses about 2 - 2.5% CPU time, which is still more than necessary but 20 to 25 times better than 50%.
It will definitely get better as xplr
stabilizes.
Just to keep everyone in the know, a lot of CPU and performance optimizations were made. The latest version uses about 0.1% - 0.7% CPU when idle and also feels a lot snappier.
scroll coordinates: y = 1/275 (tasks), x = 1/12 (fields)
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
854 root 20 0 1398192 100436 66536 S 4.3 1.3 3:24.68 Xorg
19985 sayanar+ 20 0 2822512 295844 138324 R 3.0 3.9 0:36.73 Web Content
1382 sayanar+ 20 0 2836484 277496 153912 S 2.0 3.7 0:37.69 plasmashell
314 root -51 0 0 0 0 D 1.7 0.0 0:15.99 irq/71-MSFT0002
2115 sayanar+ 20 0 15.1g 421448 148976 S 1.7 5.6 2:33.93 WebExtensions
1305 sayanar+ 20 0 2207360 181156 117912 S 1.3 2.4 0:39.46 kwin_x11
1875 sayanar+ 20 0 3985168 578012 248472 S 1.3 7.6 14:47.54 firefox
798 root 20 0 406804 20676 17344 S 1.0 0.3 0:03.14 NetworkManager
1557 sayanar+ 20 0 245224 34020 28060 S 1.0 0.4 0:16.41 ksystemstats
796 dbus 20 0 14004 7564 5152 S 0.7 0.1 0:03.24 dbus-daemon
1301 sayanar+ 20 0 1171892 79360 59388 S 0.7 1.0 0:03.48 kded5
21287 sayanar+ 20 0 1812768 97704 69684 S 0.7 1.3 0:00.80 alacritty
25 root 20 0 0 0 0 S 0.3 0.0 0:00.71 ksoftirqd/1
32 root 20 0 0 0 0 S 0.3 0.0 0:00.63 ksoftirqd/2
428 root -51 0 0 0 0 S 0.3 0.0 0:00.53 irq/78-iwlwifi:
1226 sayanar+ 20 0 658608 26148 22508 S 0.3 0.3 0:02.34 appimagelaunche
2272 sayanar+ 20 0 2819244 212652 124308 S 0.3 2.8 0:46.70 Web Content
9728 root 20 0 0 0 0 I 0.3 0.0 0:00.19 kworker/u32:2-events_unbound
21342 sayanar+ 20 0 279196 7296 4948 S 0.3 0.1 0:00.36 xplr
Although, it still has plenty of potential for more optimization.
why should xplr
use cpu when nothing going on?
here is output of strace
:
Trace of process 16030 - /usr/bin/xplr
strace: Process 16030 attached
09:18:34.294368 futex(0x7f1db55fc4c8, FUTEX_WAIT_PRIVATE, 4294967295, NULL) = 0 <0.571250>
09:18:34.865964 open("/dev/tty", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000112>
09:18:34.866300 fcntl(7, F_SETFD, FD_CLOEXEC) = 0 <0.000046>
09:18:34.866496 ioctl(7, TIOCGWINSZ, {ws_row=36, ws_col=117, ws_xpixel=936, ws_ypixel=504}) = 0 <0.000032>
09:18:34.866666 close(7) = 0 <0.000149>
09:18:34.867119 madvise(0x7f1db548e000, 8192, MADV_FREE) = 0 <0.000144>
09:18:34.867463 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1db54b6000 <0.000032>
09:18:34.867732 mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1db54b2000 <0.000033>
09:18:34.867901 mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1db54ae000 <0.000034>
09:18:34.868150 mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1db54a5000 <0.000113>
09:18:34.868505 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1db54ac000 <0.000110>
09:18:34.870132 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1db54a3000 <0.000143>
09:18:34.870689 mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1db5489000 <0.000076>
09:18:34.870960 munmap(0x7f1db5489000, 16384) = 0 <0.000064>
09:18:34.871163 munmap(0x7f1db54a3000, 8192) = 0 <0.000095>
09:18:34.871460 madvise(0x7f1db548e000, 8192, MADV_FREE) = 0 <0.000038>
09:18:34.871632 munmap(0x7f1db54b6000, 8192) = 0 <0.000038>
09:18:34.871756 munmap(0x7f1db54ae000, 16384) = 0 <0.000041>
09:18:34.871865 munmap(0x7f1db54ac000, 8192) = 0 <0.000032>
09:18:34.872023 munmap(0x7f1db54b2000, 16384) = 0 <0.000038>
09:18:34.872119 munmap(0x7f1db54a5000, 16384) = 0 <0.000035>
09:18:34.872338 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1db54b6000 <0.000027>
09:18:34.872623 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1db54b4000 <0.000040>
09:18:34.872830 munmap(0x7f1db54b6000, 8192) = 0 <0.000034>
09:18:34.873211 munmap(0x7f1db54b4000, 8192) = 0 <0.000036>
09:18:34.873690 write(6, "\33[", 2) = 2 <0.000054>
09:18:34.873887 write(6, "39", 2) = 2 <0.000025>
09:18:34.873989 write(6, "m", 1) = 1 <0.000104>
09:18:34.874252 write(6, "\33[", 2) = 2 <0.000032>
09:18:34.874371 write(6, "49", 2) = 2 <0.000024>
09:18:34.874464 write(6, "m", 1) = 1 <0.000021>
09:18:34.874560 write(6, "\33[", 2) = 2 <0.000022>
09:18:34.874657 write(6, "0", 1) = 1 <0.000023>
09:18:34.874751 write(6, "m", 1) = 1 <0.000022>
09:18:34.874839 write(6, "\33[?25l", 6) = 6 <0.000021>
09:18:34.874988 open("/dev/tty", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000044>
09:18:34.875177 fcntl(7, F_SETFD, FD_CLOEXEC) = 0 <0.000020>
09:18:34.875265 ioctl(7, TIOCGWINSZ, {ws_row=36, ws_col=117, ws_xpixel=936, ws_ypixel=504}) = 0 <0.000020>
09:18:34.875362 close(7) = 0 <0.000025>
and here xplr
constantly opens and writes something to tty
also comparing to xplr
lf uses 0% of CPU and in strace only futex
which is waiting for user input
in previous message were strace
of main process but as you can see from pic there also thread which uses CPU
strace
of this thread:
Trace of process 16035 - xplr
strace: Process 16035 attached
09:35:15.103256 epoll_pwait(3, [], 3, 10, NULL, 8) = 0 <0.010103>
09:35:15.113581 epoll_pwait(3, [], 3, 49, NULL, 8) = 0 <0.049194>
09:35:15.162911 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000030>
09:35:15.163056 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000029>
09:35:15.163169 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000017>
09:35:15.163266 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000016>
09:35:15.163349 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000015>
09:35:15.163430 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000015>
09:35:15.163511 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000013>
09:35:15.163593 epoll_pwait(3, [], 3, 49, NULL, 8) = 0 <0.049171>
09:35:15.212896 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000030>
09:35:15.213041 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000019>
09:35:15.213146 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000014>
09:35:15.213229 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000015>
09:35:15.213312 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000015>
09:35:15.213393 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000013>
09:35:15.213471 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000013>
09:35:15.213550 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000013>
09:35:15.213632 epoll_pwait(3, [], 3, 49, NULL, 8) = 0 <0.049189>
09:35:15.262941 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000019>
09:35:15.263084 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000028>
09:35:15.263197 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000019>
09:35:15.263296 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000017>
09:35:15.263392 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000016>
09:35:15.263474 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000015>
09:35:15.263555 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000013>
09:35:15.263638 epoll_pwait(3, [], 3, 49, NULL, 8) = 0 <0.049180>
09:35:15.312939 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000030>
09:35:15.313086 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000028>
09:35:15.313198 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000017>
09:35:15.313289 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000016>
09:35:15.313372 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000016>
09:35:15.313454 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000015>
09:35:15.313535 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000015>
09:35:15.313615 epoll_pwait(3, [], 3, 0, NULL, 8) = 0 <0.000013>
here also constantly happens epoll_pwait
Thanks for reporting. I see it went up again.
As for why:
- Reading inputs
- Watch pwd for modifications. When you create, delete, or rename files using another program, lf (r22) won't auto load the changes.
- Auto refresh UI every sec. It renders the ui every sec which I agree could be improved and made more reactive. I added it when we needed to manually refresh after passing each batch of messages. But now the manual refresh messages will be added to each batch automatically. So there shouldn't be any need for a service refreshing the UI now.
- but reading input should not consume much CPU
- i think for this, at least in UNIX, can be used
inotify(7)
but reading input should not consume much CPU
As you can see from the previous comments, it's due to an open crossterm issue.
i think for this, at least in UNIX, can be used inotify(7)
It's more or less the same, some simple code running on a thread. I'm not sure if it's optimal, but comparing modification times shouldn't cause any major overhead.
here is similar issue about watching directory in lf
It's more or less the same
no, it should be faster and it's not take so much CPU, because it's build in kernel
you can check this with inotify-tools:
$ inotifywait -rm /test/dir/
and watch this process in htop
and strace
Hmm wow didn't know. Regardless, as I understand inotify will make it Linux dependent.
I did a simple experiment.
- Removed the probably unnecessary
auto_refresher
here. It brought down the CPU usage by about 50%. - Commented out the
pwd_watcher::keep_watching
. No noticeable changes. - Commented out
event_reader::keep_reading
(input reader). And it reduced the CPU usage drastically. Basically removed xplr from thetop
page.
So, I'd first try to optimize reading inputs. Maybe switching the input reading logic to termion might help.
To consider, cross-platform notify library, uses inotify under the hood for Unix: https://crates.io/crates/notify
To consider, cross-platform notify library, uses inotify under the hood for Unix: https://crates.io/crates/notify
Looks good. But from the experiments in my last comment, we know that the real issue isn't with the watcher. It's the input reading mechanism. I think something changed when I last upgraded crossterm. I'll have a closer look.
NOTE: keeping this issue open to implement notify/inotify for Linux with test cases for all platforms.
This is an uneducated question, so please take it for what it is, but:
Why is polling for input needed at all? I haven't tried the Windows side of things, but can't we just wait for input? (example reader in C); After all, the application usually exits on user input / command, I thought. The problem, I assume, comes if input is read, but parsed/actions executed in a separate thread.
Edit: I just noticed just going up and down through contents of a directory uses a lot of CPU (45% when holding, vs 16% in ranger), even in a small dir (24 files). I assume xplr doesn't reload the directory on every action/move, so could all that be caused just by updating the relative item numbers or all the extra file information?