zellij
zellij copied to clipboard
Improve scrollbuffer performance
Summary
This PR improves Zellij performance by up to ~145x for scrollbuffer search operations through multiple optimizations, making it usable with millions of lines in the scrollbuffer.
Changes
- Replace Vec with VecDeque for
lines_below- Eliminates O(n^2) vector copying due to insertion at the start - Optimize
split_to_rows_of_length- reduce memory allocations by checking for separation index first and then usingsplit_off - Convert search needle to
Vec<char>- improves search performance as currently we convert it toVec<char>for each row separately - Use
Vecfor row columns - improvessplit_to_rows_of_lengthperformance - Check if needle exists in the buffer before moving viewport - Avoids very expensive moving operation for the case of search string not being in the scrollbuffer
- Use
.width_cached()when calculating scrollbuffer length - This operation is very expensive for large buffers, the cache is also used in search so the changes work together - Remove regex from line truncation logic - building a regex engine with a capture group for every line was very expensive
Description
I recently switched from tmux to Zellij in my day to day workflow but at some point I had to perform search on the scrollbuffer which effectively froze my session. I often need to look through thousands or millions lines of logs so this was a deal breaker. Out of curiosity though I profiled Zellij with cargo flamegraph and found that the vast majority of the time is spent in vector insertion to lines_below, which happens when moving the viewport when performing search. The current approach of using Vec means that on each iteration (the number of lines) we copy the whole vector (up to the number of lines) to shift it by one element. Changing to VecDeque yielded an order of magnitude greater performance and scaling, this is the first commit.
After that I looked a bit deeper into the performance and found a few more bottlenecks which I addressed in later commits.
Most of those are simple but the next larger one is the one that adds searching the buffer before moving viewport. I noticed that we can improve performance of searching for nonexistent strings in buffer by first checking if it's there as moving the whole viewport millions of times back and forth is very expensive. This adds some cost to the case of the string actually being in the buffer but I think it's worth it as it's very noticeable when actually using Zellij and typing one character at a time.
Other than that I also checked what slows down resizing panes and dumping the buffer into an editor and I applied 2 small improvements that yield a huge difference when using interactively (I didn't find a simple way to measure it in criterion but I invite you to test it yourself).
To summarize, the changes result in the best case of ~145x speedup when searching through the scrollbuffer as well as major speedups when resizing panes and dumping screen to editor. I'm now able to use Zellij with the scrollbuffer length set to 10_000_000 compared to the default 10_000.
I'm attaching the benchmark result of each commit affecting scroll/search using criterion-rs as well as the patch that adds the benchmark for reproducibility.
Results
| Title | search_up_down_10000 | search_nonexistent_10000 | scroll_up_down_10000 | scroll_up_down_resize_10000 | search_up_down_50000 | search_nonexistent_50000 | scroll_up_down_50000 | scroll_up_down_resize_50000 |
|---|---|---|---|---|---|---|---|---|
| main | 75.906 ms | 76.836 ms | 44.312 ms | 49.855 ms | 1.8612 s | 1.8607 s | 1.0572 s | 1.3009 s |
lines_below to VecDeque |
22.910 ms (−70.069%) | 23.606 ms (−69.251%) | 14.771 ms (−67.434%) | 12.946 ms (−74.777%) | 134.63 ms (−92.877%) | 133.69 ms (−92.850%) | 84.880 ms (−92.047%) | 77.886 ms (−94.059%) |
Optimize split_rows |
8.8096 ms (−61.549%) | 9.1905 ms (−61.039%) | 3.2742 ms (−77.793%) | 3.0612 ms (−76.494%) | 53.413 ms (−60.326%) | 54.580 ms (−59.175%) | 24.007 ms (−71.885%) | 27.580 ms (−64.457%) |
needle to Vec<char> |
7.6804 ms (−11.903%) | 7.6053 ms (−18.393%) | 3.2618 ms (+2.7840%) | 2.9833 ms (−0.8761%) | 47.006 ms (−11.995%) | 46.628 ms (−14.569%) | 24.278 ms (+2.0716%) | 27.143 ms (−1.8064%) |
Row columns Vec |
6.1644 ms (−20.327%) | 6.5310 ms (−13.480%) | 2.6892 ms (−20.589%) | 2.4158 ms (−20.219%) | 41.347 ms (−12.039%) | 41.078 ms (−11.904%) | 21.573 ms (−11.384%) | 24.023 ms (−11.311%) |
| Buffer check | 8.6934 ms (+41.420%) | 1.1766 ms (−81.828%) | 2.5617 ms (−2.7890%) | 2.3870 ms (−1.5338%) | 66.928 ms (+61.868%) | 11.845 ms (−71.243%) | 21.615 ms (+0.4972%) | 24.175 ms (+0.8149%) |
Resize width_cached() |
7.9386 ms (−10.832%) | 1.0203 ms (−13.914%) | 2.6246 ms (+2.0906%) | 2.3938 ms (+0.5943%) | 62.561 ms (−6.5252%) | 11.567 ms (−2.3464%) | 22.153 ms (+1.6318%) | 22.506 ms (−7.3558%) |
| Total | 9.1388 ms (−87.916%) | 1.3683 ms (−98.214%) | 2.6254 ms (−94.062%) | 2.2742 ms (−95.446%) | 67.443 ms (−96.376%) | 12.749 ms (−99.314%) | 21.821 ms (−97.950%) | 22.385 ms (−98.265%) |
I very much appreciate the work you put into this. Right now I'm working on refactoring that whole area to separate the viewport logic from Grid and changing the internal representation to a flat one dimensional vector with rows/scrollback being represented as indices into that vector. Which should fully address all performance problems.
I don't think it would be a good idea to take this intermediate step, as it would provide less bang and complicate things for the refactor. If my efforts become stalled or I hit some sort of wall, I'd be happy to look into this again.
Understood, glad to hear that this part of the user experience is being worked on.
Would you be open to a separate PR with a subset of the changes? I believe the following small changes would not conflict with your refactor to the core data representation of the buffer but improve performance nonetheless:
- 9b55251d7f54fe161c0488200011a0de50fb6168 -
dump_screenoptimization - big improvement when editing buffer in editor - 27fc577b5e45757e442d9de791ded3d692a830e2 - search optimization - small change but noticeable improvement in search performance on large buffers
- e7301426b283baef445c85da3a3a066681551c3e -
width_cached()when recalculating scrollback - big improvement in resize performance - 8439bbff4761c89f765b1a8bf1e65031cdb278ec - Row column type to
Vec- big improvements all around
I understand that changes to the lines_below data structure would make no sense after the refactor, but if split_to_rows_of_length is supposed to be used after the changes then a1e6f9ae9e43ddff071eee4a27f571dd59fbf20a could also be merged.
Hi @DawidPietrykowski and thanks for your patience. I would very much rather work toward the proper fix for this entire area of code rather than adding small fixes to the current implementation - even if they do improve performance as I'm sure this fix does (the numbers look great!)
This area of the code is unfortunately quite patchy and I fear that the more we add to it, the farther it will take us from cleaning it up. What needs to be done is to separate the viewport struct from the grid struct entirely - namely moving the "viewport", "lines_above" and "lines_below" to their own struct and having the grid interact with it. I started working on it, but this is quite an arduous task. If this is something you'd be interested in working on, I will be willing to take on reviewing and guiding you through this (as long as you will be willing to commit to the long-term, because this will be a lot of work).
Meanwhile regarding search performance, I heartily recommend using the built-in method of searching through the scrollback with your $EDITOR (Ctrl s + e by default). While this strips the ANSI away, it should be very performant (editor-level performance).
Let me know if you'd like to take up this task.
Hi @imsnif, scrollbuffer/search performance is the only thing I wish would be improved in Zellij so I'd gladly take on the task on rewriting the code responsible for managing it.
I would start by preparing a general idea of the implementation. I'll draft a PR based on your proposed approach and get back to you.
Closing this PR as the rewrite will be submitted separately.