performance regression with big log
Today i tried to use pantheon terminal as primary (its ability to open dir in default file manager is amazing). I noticed, that when log growes, it begins to work slower and slower.
For test i used this command: $ time base64 $(which alacritty)
where...
$ base64 `which alacritty` | wc -l
75155
Comparing with st and alacritty and xfce4-terminal, first two do it nearly in 0.4s, xfce4-terminal - in ≈1.27s, without regression. Below are results of above time command in pantheon terminal, invoked several times after terminal reset.
real 0m2.879s user 0m0.031s sys 0m0.170s
real 0m7.013s user 0m0.032s sys 0m0.155s
real 0m10.240s user 0m0.031s sys 0m0.163s
real 0m11.058s user 0m0.038s sys 0m0.149s
At this point terminal FPS was visually between 3 and 2. Even just pressing enter (or echo something) showes result with roughly 0.4s.
cpu/gpu - intel b950 (sandybridge)
This might be happening due to the fact that pantheon-terminal keeps a log of an unlimited size, while other terminal emulators usually default to something around 10,000 lines.
Could you install gnome-system-monitor (sudo apt install gnome-system-monitor) and check swap usage when pantheon-terminal starts being noticeably slow? If swap usage goes up when pantheon-terminal becomes slow, this means your machine is running out of RAM.
If that is not the case, try the following: sudo apt install dconf-editor, then open dconf-editor, navigate to org.pantheon.terminal.settings and set scrollback-lines to something other that 1, e.g. 100000
I have xfce4-terminal with enabled infinite scroll log, but it doesn't suffer from this problem. Doesn't PT store pixmap for all log, trying to scroll it? I know, some terminals support pixel graphics, and terminalogy is such, but i did not try such feature anywhere yet, and now have nothing from enlightenment stuff.
Interesting. Could you provide a specific command that triggers the bug that we could run? alacritty is not available in elementary OS repositories.
It's all above. Instead of alacritty it may be any other binary, including libraries, archives, whose content is good to pass to base64. As for alacritty binary, it was used in proposed stress in one alacritty github issue, also dedicated to performance problem.
So, replace which alacritty with full path to any file of enough size.
I am unable to reproduce this even with much longer files. I've pumped base64 files with 20,000,000 lines into pantheon-terminal repeatedly and observed no subjective degradation.
Then I've put together a file of base64 nonsense with 75155 lines and run time cat testfile.txt repeatedly; similarly no degradation, I consistently get 0.00user 0.14system 0:00.81elapsed 17%CPU output.
I'm attaching the file I've used so the result could be replicated: testfile.txt
I'm on elementary OS Loki with pantheon-terminal version 0.4.3-0~r995+pkg50+r1~ubuntu0.4.1.1
I tried fresh git version (it was a bit lame to report for stable release, before trying git). Also rebooted, to exclude fact, that some libs or their footprint are outdated. And changed background to opaque. Problem still appears. Last result: 3,7,12,14.
Setting log size to limited value, like 1000000, doesn't affect issue. Setting xfwm4 to non-compositing mode, as well as trying to use gala instead - also no effect.
Btw, speaking about FPS, i meaned update speed for terminal area on scrolling or text typing. Other widgets, like tabbar/headerbar are ok.
Yet, on my system all vte terminals use same vte version. If it makes sence: vte-0.48.4, gtk+-3.22.19, glib-2.25.3, vala-0.36
-- And about your test example. Result it slightly worse, when doing it with base64 instead of cat (though it self looks like base64 :) ).
But simple cat from even home dir also causes performance fall:
Here is my table, for convenience made in monospace style, each string for its run, in order
real user sys
0m2.178s 0m0.000s 0m0.119s
0m3.306s 0m0.000s 0m0.096s
0m4.176s 0m0.000s 0m0.081s
0m6.312s 0m0.000s 0m0.085s
0m6.154s 0m0.000s 0m0.075s
0m6.461s 0m0.000s 0m0.070s
0m9.846s 0m0.000s 0m0.072s
0m13.529s 0m0.000s 0m0.092s
-- I made another test with blender binary - 60mb, whose base64 gives nearly 1 million lines. in xfce4-term it took 18s. In elementary term - first time was lucky - 23s; after i tried in no-compositing, then with gala wm, then again in xfwm4 composited: first took 6m13s, other two - i did not wait so long, but when i pressed Ctrl+C, it spent roughly 1,5min.
And another test - for pure draw speed, with log size set to 10. It took 13,6s - but it seems to be best for any vte-base terminal.
Well, that's weird. How about throwing science at the problem? If you could run perf before and after degradation, we'd know what exactly is taking up all that time. I'll provide instructions for elementary OS, since I don't know what distro you're on:
- Install perf:
sudo apt install linux-tools-generic-hwe-16.04 - Install https://github.com/Shnatsel/everything-dbg
- Run
sudo apt install $(everything-dbg pantheon-terminal)to fetch debugging symbols - Install https://github.com/brendangregg/FlameGraph
Then, to record where time is spent during execution:
- Start pantheon-terminal
- Run
sudo perf record --call-graph dwarf -p (pgrep pantheon-terminal) - Run the command that triggers the regression in pantheon-terminal
- Interrupt the perf command after the workload in pantheon-terminal completes
- Run
perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flamegraph.svg
That will generate an easily readable flame graph that you could share.
If you could make performance recordings before and after the degradation, we'd be able to easily see where time is spent and from there it should be easy to figure out what's going on.
I thought about profiling :) previous steps were necessary just to be sure, that problems are not in setup. I made 3 test:
- Recording all time, cating existing base64 for blender (total 84mb of text) as long as i could wait. (i build with debug support terminal, as well as closest dependencies, with deepest glib, so performance is lowered even more) :)
- cating your 5mb testfile.txt with log size 1 (to avoid perf degradation completely)
- same as 2, but after doing step 1 without recording - there i tried as short period as possible, trying to get at least 2 refreshes before to interrupt (each frame took nearly 1s)
flamegraph-alltime.svg.gz flamegraph-before.svg.gz flamegraph-after.svg.gz
Most of the time is spent in log compression/decompression and in _vte_ring_rewrap. Seeing how I cannot reproduce it on Loki with libvte 0.42.5, I suspect this is an issue either with libvte or with your window manager firing resize events excessively.
Automatic rewrapping is probably disabled on xfce4-terminal, so could you try recent gnome-terminal and see if it exhibits a similar issue? If your terminal does this, it has rewrap enabled.
Just to not wait, while i have to install gnome-terminal: i have it enabled in xfce4-terminal as well.
- I have vte built with gnutls support, could it be reason? probably, this is not so trivial, and xfdevs have a trick :)
In gnome terminal auto-wrap is enabled too
This is affecting me too on the latest Juno release (wasn't a prevalent issue on for me while using OS Loki).
Seems to only be an issue for me with long (2000+ character), non-controlled character sequences being sent to the console.
Easiest way for me to reproduce is with cat /dev/urandom | base32 -w 0. After escaping, the performance of the terminal is completely reduced.
Might also be noteworthy, at-least for me using base64 does not cause this to happen. Only base32 seems to be able to reproduce this outside of my development env.
Edit: Another observation, this seems to only affect scrolling with the scrollwheel. Scrolling by clicking the scrollbar avoids the issue.