zellij
zellij copied to clipboard
Zellij hangs with 100% CPU when opening a new pane with a very long line on screen
~ $ zellij --version
zellij 0.37.2
~ $ stty size
55 211
~ $ uname -av
Linux a-winthinkpad 5.19.0-45-generic #46~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 15:06:04 UTC 20 x86_64 x86_64 x86_64 GNU/Linux
~ $ gnome-terminal --version
# GNOME Terminal 3.44.0 using VTE 0.68.0 +BIDI +GNUTLS +ICU +SYSTEMD
Reproduction:
$ zellij
$ cat x.c
#include <stdio.h>
int main() { while (1) { printf("/this/is/a/quite/long/path/for/printing"); } }
$ gcc -O2 x.c
$ ./a.out
Wait for 5s then hit ctrl+c. Now attempt to open a new pane with ctrl+p, n. Zellij will be unresponsive and taking 100% of a CPU.
Yes, very annoying bug.
For watchers of this issue, my two PRs linked above improve the situation by at least 10x, more for longer lines (as some of the behavior was quadratic). Up to 10MB lines the performance is now tolerable.
If you're building for yourself, take the second PR - it includes all commits from the first.
Edit: it's now a series of three, take the third i.e. #3043 if you're building for yourself.
The three PRs have been merged and presumably will be in the next release. In the original test case, trying to split to a new pane might respond in about 5 mins (sometimes less, sometimes more). After the changes, it takes <10s for me - not fast, but at least you can slowly navigate back to the original pane and clear
or reset
to fix the issue.
I am ok with this performance for now. Not ideal, but good enough that I don't need to force close sessions - I don't mind if this issue is closed.
For future people wanting to look at this issue, I'd start by looking at reducing memory copies. If you open zellij-server/src/panes/grid.rs
and take a look at struct Grid
, based on my profiling it's my belief that the majority of the time is being taken by shifting terminal characters in the rows in lines_above
/viewport
/lines_below
. You'll be interested in transfer_rows_from_lines_above_to_viewport
, transfer_rows_from_viewport_to_lines_above
and transfer_rows_from_lines_below_to_viewport
- all of these are called from change_size
. Transferring rows out of the viewport combines canonical rows and any subsequent wrapped rows into a single canonical row. Transferring rows in to the viewport requires splitting the row based on the current width. For extremely long lines, this can be slow.
I suspect the eventual solution is to have all characters in a single block in memory, and then perform re-splitting and combining of rows by indexing into this big block of memory and shifting around the index points. This eliminates all terminal characters copies when performing row split/merge. You'll want a wrapper datastructure that does the bookkeeping for you and has a special 'index' type rather than a plain usize. The trickiest part of this is managing memory correctly to allow adding/removing characters to the grid:
- zellij (like many terminal multiplexers) configures scrollback based on lines rather than bytes [0], so you can't allocate a fixed buffer per pane [1]. So you'll need to allow resizing of this large memory block, but ideally not too often
- character removal typically happens at the beginning of the block, when the scrollback is exceeded. You really don't want to be shifting the rest of the characters whenever the scrollback is exceeded, so you need some kind of ring buffer
2a.
VecDeque
is the classic choice here, but it operates on individual elements because the split can be at any point...so returning a slice of terminal characters may involve allocation. Always allocating defeats the point of this work, so you need a wrapper around aVecDeque
that returns aCow
that only allocates if necessary. This will require reaching intoVecDeque::as_slices
, rather than using the element-based interface ofVecDeque
2b. the harder alternative if you can't achieve the performance with 2a is to use virtual memory trickery to map the same region of memory in multiple places (nice diagram from magic-buffer), so you can always take a contiguous slice to represent a row. This apparently can be done in a cross-platform way (per vmap and vmcircbuffer). Unfortunately, I don't know the quality/maintenance status of either of these crates (or any others like vmcircbuf)
[0] though honestly I think a configurable character limit upper bound for sanity reasons is reasonable - I don't think any user seriously wants 100 million character scrollback in a single pane [1] nor would you want to - always allocating the worst case is very wasteful
Finally, here's the script I was using for benchmarking/profiling for the record
Script for profiling
#!/bin/bash
set -o errexit
set -o pipefail
set -o nounset
export TMPDIR=$(pwd)/tmp
export XDG_CACHE_HOME=$(pwd)/tmp/cache
export XDG_CONFIG_HOME=$(pwd)/tmp/config
export XDG_DATA_HOME=$(pwd)/tmp/data
export XDG_RUNTIME_DIR=$(pwd)/tmp/runtime
export XDG_STATE_DIR=$(pwd)/tmp/state
zellij="./target/debug/zellij --data-dir $(pwd)/tmp"
zellij="./target/release/zellij --data-dir $(pwd)/tmp"
recreate_tmpdir() {
rm -rf $TMPDIR
mkdir -p $TMPDIR
}
if [ "$1" = zellij ]; then
shift
recreate_tmpdir
$zellij --data-dir=$TMPDIR setup --clean
elif [ "$1" = grind-zellij ]; then
shift
recreate_tmpdir
valgrind --tool=callgrind --vgdb-prefix=$(pwd)/tmp/vgdb-pipe --instr-atstart=no --collect-atstart=yes --trace-children=yes $zellij --data-dir=$TMPDIR/zellij-data setup --clean
elif [ "$1" = time-zellij ]; then
shift
#$zellij & # can't do this because it can't find a tty
mkdir -p $TMPDIR/time
export TMPDIR=$(pwd)/tmp/time
#$zellij run -- flock mylock -c 'sleep 3'
#$zellij run -- flock mylock -c 'cat longline1M'
$zellij run -- flock mylock -c 'cat longline10M'
sleep 1 && flock mylock echo 'command done'
time (
$zellij action toggle-fullscreen
$zellij action toggle-fullscreen
$zellij action close-pane
$zellij run -c -- true
)
#$zellij action close-pane &
#wait %1
#kill %2
#time ($zellij run -c -- flock mylock -c 'sleep 3' && sleep 1 && flock mylock true)
#time ($zellij run -c -- flock mylock -c 'sleep 3' && sleep 1 && flock mylock true)
elif [ "$1" = start-grind ]; then
shift
onoff=$1
if [ "$onoff" != on -a "$onoff" != off ]; then
echo pass on or off
exit 1
fi
pid=$(
callgrind_control --vgdb-prefix=$(pwd)/tmp/vgdb-pipe |
grep -- 'zellij --server' |
sed 's/^PID \([0-9]*\):.*/\1/g'
)
echo "Found server at $pid"
callgrind_control --vgdb-prefix=$(pwd)/tmp/vgdb-pipe -i $onoff $pid
fi