[RFC] New 2x faster escape code parser using vector CPU instructions
Hi all,
I've spent the last few months working, on and off, on a new escape code parser for kitty that uses vector CPU instructions (AVX2 or SSE 4.2 or their ARM equivalents, whichever is available) to greatly speed up parsing of the input byte stream. This has led to throughput speedups of between 50% and 400%, depending on the kind of input data being parsed.
There is also a new benchmarking kitten that can be used to benchmark terminal performance. Here are the results from running kitten __benchmark__ on my system with the kitty master branch and the vt branch (the latter has the new code):
master branch:
Results:
Only ASCII chars : 4.55s @ 43.9 MB/s
Unicode chars : 2.73s @ 64.9 MB/s
CSI codes with few chars : 3.56s @ 28.1 MB/s
Long escape codes : 8.12s @ 96.5 MB/s
Images : 10.06s @ 53.0 MB/s
vt branch with SIMD acceleration:
Results:
Only ASCII chars : 1.73s @ 115.7 MB/s
Unicode chars : 1.78s @ 99.5 MB/s
CSI codes with few chars : 1.76s @ 56.7 MB/s
Long escape codes : 2.39s @ 327.9 MB/s
Images : 1.97s @ 270.8 MB/s
A table comparing the results from the vt branch with other terminal emulators I have on my system is at https://github.com/kovidgoyal/kitty/blob/vt/docs/performance.rst#throughput and shows that, with the new code, kitty is much faster than the rest.
There are, of course, some downsides. Most importantly, the new code required fairly invasive changes to various parts of kitty, so it's likely some edge cases are broken. In particular, kitty no longer supports input data in non-UTF-8 encodings and also does not support C1 control codes. The latter are supported out of the box only by VTE-based terminals and WezTerm, so they are not widely used.
I'd appreciate it if some of you could build kitty from source and help test things for regressions. Building kitty from source is very easy: you need only C and Go compilers, and then you run a single command. Instructions are at: https://sw.kovidgoyal.net/kitty/build/
Please report any issues you find here.
Thanks, and enjoy!
P.S. For the curious, the things that have been sped up are:
- Searching for the start and end of escape codes in the input byte stream
- UTF-8 decoding via SIMD instructions
- base64 decoding via SIMD instructions (base64 is used for various escape codes such as images, copy/paste, etc.)
Currently I have restricted the SIMD implementations to AVX2 at most, not AVX-512, as that tends to have a major energy/warm-up penalty and prevents running on "economy cores". Something to revisit in the future.
In what cases is this change perceivable?
On Tue, Jan 16, 2024 at 11:56:58PM -0800, Egor Zvorykin wrote:
In what cases is this change perceivable?
It depends on what you do in your terminal. Anything that requires processing lots of data fast will be faster.
Will this improve Vim redrawing speed?
It will; indeed, my original motivation for making kitty was the horrible performance of vim in the terminals of the time. However, whether you notice this particular change as a human depends on your vim config, terminal window size, how sensitive you are to noticing such things, etc.
Impressive numbers! I have tried a few basic tests and it is definitely faster. Up to 30% with the craziest pixel pushing test case I tried and around 10% when just scrolling large amounts of text. Nice!
In regular use kitty is never really the bottleneck in anything I do, but it should give a nice little battery/CPU saving.
I have been using a build from the vt branch all afternoon and have not seen any kind of regression so far. 👍
This is awesome. Should this build on macOS?
ProductName: macOS
ProductVersion: 14.2.1
BuildVersion: 23C71
I am receiving the following error:
./dev.sh build
[1/1] Compiling kitty/simd-string.c ... done
Compiling kitty/simd-string.c ...
clang -MMD -DNDEBUG -DPRIMARY_VERSION=4000 -DSECONDARY_VERSION=31 -DXT_VERSION="0.31.0" -DGL_SILENCE_DEPRECATION -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include -Wextra -Wfloat-conversion -Wno-missing-field-initializers -Wall -Wstrict-prototypes -std=c11 -pedantic-errors -Werror -O3 -fwrapv -fstack-protector-strong -pipe -fvisibility=hidden -D_FORTIFY_SOURCE=2 -mbranch-protection=standard -fno-plt -flto -pthread -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include/libpng16 -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include -I/var/folders/wj/cqy1nsyn4wn94d96btkldt0r0000gn/T/t/openssl,x86_64,-5kbzhdrp/include -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include/harfbuzz -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/python/Python.framework/Versions/3.9/include/python3.9 -c kitty/simd-string.c -o build/fast_data_types-kitty-simd-string.c.o
In file included from kitty/simd-string.c:13:
kitty/simd-string-impl.h:18:10: fatal error: 'simde/x86/avx2.h' file not found
#include <simde/x86/avx2.h>
^~~~~~~~~~~~~~~~~~
1 error generated.
The following build command failed: /Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/python/Python.framework/Versions/Current/bin/python3 setup.py develop
exit status 1
It will work on macOS; you just need to update your copy of the dependencies with ./dev.sh deps
thanks, that worked
I'm getting broken rendering when running the vt branch on my M1 Mac
with the following zshrc:
PS1="%% "
export KEYTIMEOUT=1
bindkey -v
autoload -Uz compinit
compinit
zmodload zsh/complist
zstyle ':completion:*' menu select
bindkey -M menuselect '^[' send-break
and the following kitty.conf
font_family Inconsolata Medium
bold_font Inconsolata SemiBold
italic_font Inconsolata Light
bold_italic_font Inconsolata SemiBold
font_size 20
Create a window with 50x20 cells, cd into the kitty repository and run ls to somewhat fill the entire screen, then type cd <Tab>, navigate to any menu item and press Escape to see some items from the ls output disappear.
https://github.com/kovidgoyal/kitty/assets/90276965/220fe2da-af1a-48f3-b772-65508ebebadc
And sometimes all output is immediately cleared on my terminal, though I was unable to make a reproducer for it.
https://github.com/kovidgoyal/kitty/assets/90276965/6256fe3f-0069-4a4b-9221-99a058692c2a
@ad-chaos Can you please run kitty with --dump-bytes, do the minimal steps to reproduce the problem, and post the dump along with the window size in cells you reproduced at? Your steps aren't reproducing on my M2 Mac.
@ad-chaos: Thanks, with that I can see a difference between master and vt, will investigate more when I have a moment.
@ad-chaos Fixed by https://github.com/kovidgoyal/kitty/commit/12b684edee136cc1ad63221efd0fc2d76ac8fa85
I can confirm the problem no longer exists on the latest vt branch.
@kovidgoyal I imagine this improvement would have no impact if you are in a tmux session?
On Tue, Jan 23, 2024 at 01:15:40PM -0800, Dimitar Haralanov wrote:
@kovidgoyal I imagine this improvement would have no impact if you are in a tmux session?
Speed-wise, no, since your performance will be limited by tmux, not kitty. But kitty will still use less energy to parse whatever tmux sends it.
I'm seeing pretty high idle cpu usage on my build of this branch. It constantly hovers around 3-4% on an empty terminal prompt, even if minimized (as opposed to 0.00% for the master branch).
The number of open tabs doesn't seem to affect this cpu usage by a meaningful amount.
On Tue, Jan 23, 2024 at 09:54:16PM -0800, Carlos Esparza wrote:
I'm seeing pretty high idle cpu usage on my build of this branch. It constantly hovers around 3-4% on an empty terminal prompt (as opposed to 0.00% for the master branch).
OK, will look at it when I have a moment. It's probably unrelated to the SIMD code; some of the render loop was rewritten for additional speedups as well.
Love to see that you're regularly improving this!
On Tue, Jan 23, 2024 at 09:54:16PM -0800, Carlos Esparza wrote:
I'm seeing pretty high idle cpu usage on my build of this branch. It constantly hovers around 3-4% on an empty terminal prompt (as opposed to 0.00% for the master branch).
Fixed by https://github.com/kovidgoyal/kitty/commit/8c0607b8e09b97376491dd5c151d3dd04b190cd2
Every TUI program should benefit from vt, shouldn't it?
I use bottom with a 0.5 s refresh rate and 8 widgets (quite enough load for this case, I think). Should it consume less CPU now?
yes.
A further 5% improvement in the benchmark with https://github.com/kovidgoyal/kitty/commit/f1d70622af7d97ace534336d714cefbf604914ba
Here's a bug: write just "⚠️" into a text file. Opening that file with vim (9.0) or neovim (latest master) then shows what looks like a file containing just one space.
On Thu, Feb 01, 2024 at 07:33:44PM -0800, Carlos Esparza wrote:
Here's a bug: write just "⚠️" into a text file. Then opening the file with vim (9.0) or neovim (lastest master) shows what looks like a file with just one space.
Nice catch: https://github.com/kovidgoyal/kitty/commit/b02f3f6231e8ff97f4066a8608c44ca29b06a2d1
Technically, though, this was a bug in both the vt branch and vim. Vim doesn't realize that the character is wide because it is a default-narrow character followed by the emoji variation selector codepoint, which presumably vim doesn't support. So vim was incorrectly positioning the cursor on the second cell of the wide character and then issuing more commands, which in turn exposed the regression in the vt branch.
Heads up: the current build of the vt branch is not working (pulled from the repo just now). Here's what I get when I run ./dev.sh build:
In file included from kitty/simd-string-256.c:9:
kitty/simd-string-impl.h:260:5: error: call to undeclared function '_mm256_zeroupper'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
zero_upper();
^
kitty/simd-string-impl.h:92:20: note: expanded from macro 'zero_upper'
#define zero_upper _mm256_zeroupper
^
kitty/simd-string-impl.h:645:5: error: call to undeclared function '_mm256_zeroupper'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
zero_upper();
^
kitty/simd-string-impl.h:92:20: note: expanded from macro 'zero_upper'
#define zero_upper _mm256_zeroupper
^
2 errors generated.
The following build command failed: /....../kitty/dependencies/darwin-arm64/python/Python.framework/Versions/Current/bin/python3 setup.py develop
exit status 1
Should be fixed by: https://github.com/kovidgoyal/kitty/commit/e1e932e56b0cddac336ceac00298bbf7b8c408a7
https://github.com/kovidgoyal/kitty/commit/40e8ade9a2f5140a124ceda76f207ee3ef9f9103 causes my bash to not load the profile.
On Sun, Feb 11, 2024 at 01:45:53AM -0800, Erik Olofsson wrote:
https://github.com/kovidgoyal/kitty/commit/40e8ade9a2f5140a124ceda76f207ee3ef9f9103 causes my bash to not load the profile.
Don't see how that's possible, unless you have KITTY_RUNNING_BASH_INTEGRATION_TEST set in your environment?