wayst
Performance vs other terminals
This isn't the best way to benchmark terminal performance, but it will suffice. When I find a more reliable method, I'll post those results. For now, this method uses:
time find / -xdev
You must first run this command once before starting the benchmarks. The first run seems to cache a lot of the files, so all runs thereafter should be accurate as long as you're not stressing your I/O somehow.
wayst:
0:09.04elapsed
0:08.97elapsed
0:09.07elapsed
foot:
0:05.51elapsed
0:05.09elapsed
0:05.10elapsed
alacritty:
0:03.25elapsed
0:03.22elapsed
0:03.22elapsed
So the question here is how can wayst be up to par with alacritty?
Definitely a hotly discussed topic, and fairly difficult to do right. My personal opinion: it is too early for any kind of benchmarks. 91861 still seems to classify this as alpha quality, and as such it would be too early to squeeze out every bit of performance :D And regarding benchmarks, since you already mentioned foot: there are some interesting docs in that repo on this topic: https://codeberg.org/dnkl/foot/wiki/Performance and https://codeberg.org/dnkl/foot/src/branch/master/doc/benchmark.md
So the question here is how can wayst be up to par with alacritty?
It probably can't.
- Rendering: Alacritty uses a completely different strategy when redrawing. It always repaints the whole screen each frame and is optimized to do that. This makes it faster when the entire screen content changes, as in most artificial benchmarks. In typical use, only a few cells change per frame. Wayst caches each line to a texture and tracks changes, trying to redraw as few cells as possible (you can pass the --debug-gfx flag to see what is redrawn; a rough sketch of this approach appears after this list), prioritizing efficiency when redrawing small portions of the screen over raw speed. This results in relatively low input latency.
- Multithreading: Alacritty (and I'm pretty sure foot too) uses separate threads for rendering and parsing, meaning that its interpreter can run while rendering. This is a massive advantage in benchmarks that output large amounts of data. You can try playing with the --io-chunk-delay timeout option to force wayst to skip frames and improve performance. During normal use (like scrolling in vim) wayst rarely spends over 0.5 ms of frame time on anything other than rendering (and its interpreter isn't well optimized), so I decided it's not worth the added complexity.
- OpenGL versions: Alacritty (and most other GL terminals) uses at least OpenGL 3.1 and gets access to more advanced features. Right now wayst has only one renderer implementation, the gfxOpenGL21 module. I'm planning to eventually write other renderers, but only once GL21 is feature-complete.
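To give a rough picture of the per-line caching described under "Rendering" above, here is a heavily simplified sketch of that kind of damage tracking. This is not wayst's actual code (the real logic lives in the gfxOpenGL21 module); the struct and function names are made up purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-line cache entry: each screen line owns a texture with
 * its last rendered contents plus a "damaged" flag. */
typedef struct {
    uint32_t texture; /* GL texture holding the line's rendered glyphs */
    bool     damaged; /* set whenever any cell on this line changes    */
} LineCache;

/* Called by the interpreter whenever it writes to a cell on a line. */
static void mark_line_damaged(LineCache *lines, int row)
{
    lines[row].damaged = true;
}

/* Called once per frame: only damaged lines get re-rasterized; every other
 * line is drawn straight from its cached texture. */
static void draw_frame(LineCache *lines, int nrows)
{
    for (int row = 0; row < nrows; ++row) {
        if (lines[row].damaged) {
            /* ... re-render this line's glyphs into lines[row].texture ... */
            lines[row].damaged = false;
        }
        /* ... blit lines[row].texture to the window at this row's offset ... */
    }
}
```

When only a cursor blink or a single prompt line changes, a scheme like this touches one texture instead of repainting the whole grid, which is why it trades raw full-screen throughput for low latency on small updates.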
It is too early for any kind of benchmarks. 91861 still seems to classify this as alpha quality
The rendering is still pretty naive. There are many performance improvements we can do even in GL 2.1:
- Async buffer transfers in larger chunks (this will probably help a ton since streaming vertex data is the bottleneck right now; see the sketch after this list)
- More 'damage models' and better tracking
- Partial swap (swapping buffers takes surprisingly long!)
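To make the first item a bit more concrete: one classic way to stream vertex data in larger chunks on GL 2.x is "buffer orphaning". The sketch below is purely illustrative; the buffer size, function names, and loader setup are assumptions, not wayst's actual renderer code.

```c
/* Assumes a GL loader (e.g. libepoxy or GLEW) exposes the GL 1.5+ buffer
 * entry points; all names and sizes here are illustrative only. */
#include <GL/gl.h>
#include <stddef.h>

#define STREAM_VBO_CAPACITY (512 * 1024) /* bytes reserved for one frame */

static GLuint stream_vbo;

static void stream_vbo_init(void)
{
    glGenBuffers(1, &stream_vbo);
    glBindBuffer(GL_ARRAY_BUFFER, stream_vbo);
    /* Allocate once with a streaming usage hint. */
    glBufferData(GL_ARRAY_BUFFER, STREAM_VBO_CAPACITY, NULL, GL_STREAM_DRAW);
}

/* Upload the whole frame's accumulated quad vertices in one large chunk. */
static void stream_vbo_upload(const float *vertices, size_t bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, stream_vbo);
    /* "Orphan" the old storage so the driver doesn't stall on draws that
     * still reference last frame's data... */
    glBufferData(GL_ARRAY_BUFFER, STREAM_VBO_CAPACITY, NULL, GL_STREAM_DRAW);
    /* ...then copy this frame's data into the fresh allocation. */
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, vertices);
}
```

The point of batching like this is that one big glBufferSubData call per frame tends to be far cheaper than many small uploads interleaved with draw calls.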
Wow, after reading all of this and the links provided, I've gained a whole new perspective on terminals. Terminal rendering is dang magic, its own art even. It also made me realize how unimportant it is whether a terminal can render the output of "ls" faster than another one.
My personal reasons for using wayst are minimalism and small dependencies. It looks beautiful and performs on par with other GPU-accelerated terminals IMO. Most users won't notice the few seconds of difference in these benchmarks, which are unrealistic compared to actual terminal use anyway.
Knowing this, wayst begins to truly shine. It is, literally, the GPU-accelerated version of st. That's how I've always thought of this project, and with how minimal it is, who could think otherwise? It's nothing but a compliment.
Multithreading
While Alacritty's renderer and parser may live in separate threads (do they? I'm not sure), I'm fairly sure the grid is locked. I.e. only one of them can access it at a time. However, Alacritty has a separate PTY reader thread, with a fairly large input buffer. This allows it to consume (but not parse) a large amount of client output while e.g. rendering.
As for foot, it should be seen as a single-threaded application. The renderer and parser both live in the main thread. Now, the renderer will offload rendering tasks to worker threads, but nothing else in foot executes while this is happening. I.e. compared to Alacritty, foot cannot consume any client output while it's rendering.
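For anyone unfamiliar with the pattern being described (a dedicated PTY reader thread that only buffers client output, while parsing stays on the main thread), here is a minimal C sketch of the idea. It is not Alacritty's or foot's actual code; the buffer size and names are assumptions.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define INBUF_CAP (1 << 20) /* large input buffer, roughly in Alacritty's spirit */

static char            inbuf[INBUF_CAP];
static size_t          inbuf_len;
static pthread_mutex_t inbuf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reader thread: drains the PTY as fast as possible so the client is never
 * blocked, but does NOT interpret any escape sequences. */
static void *pty_reader(void *arg)
{
    int  pty_fd = *(int *)arg;
    char chunk[4096];

    for (;;) {
        ssize_t n = read(pty_fd, chunk, sizeof(chunk));
        if (n <= 0)
            break;
        pthread_mutex_lock(&inbuf_lock);
        if (inbuf_len + (size_t)n <= INBUF_CAP) { /* this toy just drops overflow */
            memcpy(inbuf + inbuf_len, chunk, (size_t)n);
            inbuf_len += (size_t)n;
        }
        pthread_mutex_unlock(&inbuf_lock);
    }
    return NULL;
}

/* Main thread, between frames: take whatever was buffered and parse it.
 * While this (or rendering) runs, the reader thread keeps consuming output. */
static void drain_and_parse(void)
{
    pthread_mutex_lock(&inbuf_lock);
    /* vt_parse(inbuf, inbuf_len);  -- hypothetical parser entry point */
    inbuf_len = 0;
    pthread_mutex_unlock(&inbuf_lock);
}
```

The key property is that read() keeps emptying the PTY even while the main thread is busy rendering, so the client program isn't blocked on a full pipe; parsing only happens whenever the main thread gets around to draining the buffer.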
So I've been a little busy, but I managed to upload a vtebench-git package to the AUR. Anyone on Arch can give vtebench-git a try. I had vtebench confused with another program I maintain on the AUR, vttest, which wasn't the program I was looking for to benchmark with.
https://aur.archlinux.org/packages/vtebench-git/
I also uploaded a git version of notcurses if anyone on Arch is interested.
https://aur.archlinux.org/packages/notcurses-git/
Using vtebench, I ran the "dense_cells" benchmark on the three terminals.
wayst:
dense_cells (2 samples @ 5.35 MiB): 9128ms avg (90% < 9130ms) +-2.83ms
dense_cells (2 samples @ 5.35 MiB): 9038ms avg (90% < 9045ms) +-9.9ms
dense_cells (2 samples @ 5.35 MiB): 9151.5ms avg (90% < 9189ms) +-53.03ms
foot:
dense_cells (13 samples @ 5.35 MiB): 773.46ms avg (90% < 801ms) +-34.84ms
dense_cells (14 samples @ 5.35 MiB): 760ms avg (90% < 789ms) +-22.71ms
dense_cells (13 samples @ 5.35 MiB): 770.15ms avg (90% < 795ms) +-23.66ms
alacritty:
dense_cells (17 samples @ 6.62 MiB): 614.06ms avg (90% < 627ms) +-11.81ms
dense_cells (17 samples @ 6.62 MiB): 620.94ms avg (90% < 672ms) +-29.17ms
dense_cells (17 samples @ 6.62 MiB): 622.94ms avg (90% < 646ms) +-15.58ms
I totally forgot to mention my specs this whole time. This is on my poopy laptop.