
[RFC] New 2x faster escape code parser using vector CPU instructions

Open kovidgoyal opened this issue 5 months ago • 52 comments

Hi all,

I've spent the last few months working on and off on a new escape code parser for kitty that uses vector CPU instructions (AVX2 or SSE 4.2 or their ARM equivalents, whatever is available) to greatly speed up parsing of the input byte stream. This has led to throughput speedups of between 50% and 400% for parsing different kinds of input data.

There is a new benchmarking kitten that can be used to benchmark terminal performance as well. Here are the benchmark results from running kitten __benchmark__ on my system with the kitty master branch and the vt branch (the latter has the new code):

master branch:

Results:
  Only ASCII chars         : 4.55s      @ 43.9    MB/s
  Unicode chars            : 2.73s      @ 64.9    MB/s
  CSI codes with few chars : 3.56s      @ 28.1    MB/s
  Long escape codes        : 8.12s      @ 96.5    MB/s
  Images                   : 10.06s     @ 53.0    MB/s

vt branch with SIMD acceleration:

Results:
  Only ASCII chars         : 1.73s      @ 115.7   MB/s
  Unicode chars            : 1.78s      @ 99.5    MB/s
  CSI codes with few chars : 1.76s      @ 56.7    MB/s
  Long escape codes        : 2.39s      @ 327.9   MB/s
  Images                   : 1.97s      @ 270.8   MB/s
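
For reference, the relative speedup per category can be worked out from the two result tables above. A minimal sketch in Python: the throughput figures are copied from the tables, while the dictionary and function names are only illustrative and not part of kitty or the benchmark kitten.

```python
# Throughput in MB/s, copied from the two result tables above.
master = {
    "Only ASCII chars": 43.9,
    "Unicode chars": 64.9,
    "CSI codes with few chars": 28.1,
    "Long escape codes": 96.5,
    "Images": 53.0,
}
vt = {
    "Only ASCII chars": 115.7,
    "Unicode chars": 99.5,
    "CSI codes with few chars": 56.7,
    "Long escape codes": 327.9,
    "Images": 270.8,
}


def speedup_percent(old: float, new: float) -> float:
    """Percentage increase in throughput going from old to new."""
    return (new / old - 1.0) * 100.0


for name in master:
    print(f"{name:<25}: {speedup_percent(master[name], vt[name]):6.1f}% faster")
```

With these figures the smallest gain is for Unicode text (a bit over 50%) and the largest is for images (a bit over 400%), which matches the range quoted at the top of the post.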

A table comparing the results from the vt branch with other terminal emulators I have on my system is at https://github.com/kovidgoyal/kitty/blob/vt/docs/performance.rst#throughput showing kitty is much faster than the rest with the new code.

There are of course some downsides. Most importantly, the new code required fairly invasive changes to various parts of kitty, so it's likely some edge cases are broken. In particular, kitty no longer supports input data in non-UTF-8 encodings and also does not support C1 control codes. The latter are supported only by VTE based terminals and WezTerm out of the box, so they are not widely used.
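
To illustrate why raw C1 control codes and a UTF-8-only input stream do not mix well, here is a small sketch in Python. It only shows the byte-level difference between the 7-bit form of the CSI introducer (ESC followed by `[`) and its single-byte C1 form (0x9b); the helper name `is_valid_utf8` is made up for this example and nothing here reflects how kitty's parser is actually written.

```python
CSI_7BIT = b"\x1b["  # ESC [ : the two-byte, 7-bit form of the CSI introducer
CSI_C1 = b"\x9b"     # the single-byte C1 form of the same introducer


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


red_7bit = CSI_7BIT + b"31m"  # "set foreground to red", 7-bit form
red_c1 = CSI_C1 + b"31m"      # the same request using the raw C1 byte

print(is_valid_utf8(red_7bit))  # True: plain ASCII is valid UTF-8
print(is_valid_utf8(red_c1))    # False: 0x9b alone is a stray continuation byte
```

The 7-bit form is plain ASCII, so it passes through a UTF-8 decoder untouched, whereas the raw C1 byte is not valid UTF-8 on its own.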

I'd appreciate it if some of you could build kitty from source and help test things for regressions. Building kitty from source is very easy: you need only C and Go compilers and to run a single command. Instructions are at: https://sw.kovidgoyal.net/kitty/build/

Please report any issues you find here.

Thanks, and enjoy!

P.S. For the curious, the things that have been sped up are:

  1. Searching for the start and end of escape codes in the input byte stream
  2. UTF-8 decoding via SIMD instructions
  3. base64 decoding via SIMD instructions (base64 is used for various escape codes such as images, copy/paste, etc.)
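
The three steps in the list above can be sketched in scalar form as follows. This is only an illustration in Python, not kitty's implementation: the real code is vectorized C, and the function names here are hypothetical. The example uses the OSC 52 clipboard escape code terminated by BEL as a sample of a base64-carrying escape code.

```python
import base64

ESC = 0x1B  # every escape code starts with this byte


def split_at_escape(data: bytes) -> tuple[bytes, bytes]:
    """Return (plain_text, rest), where rest starts at the first ESC byte, if any."""
    pos = data.find(bytes([ESC]))  # scalar stand-in for the vectorized search
    if pos < 0:
        return data, b""
    return data[:pos], data[pos:]


def decode_osc52_payload(code: bytes) -> bytes:
    """Decode the base64 payload of an OSC 52 clipboard code terminated by BEL."""
    prefix, terminator = b"\x1b]52;c;", b"\x07"
    if not (code.startswith(prefix) and code.endswith(terminator)):
        raise ValueError("not an OSC 52 clipboard escape code")
    return base64.b64decode(code[len(prefix):-len(terminator)])


stream = "héllo".encode("utf-8") + b"\x1b]52;c;" + base64.b64encode(b"copied") + b"\x07"
text, rest = split_at_escape(stream)  # step 1: find where the escape code starts
print(text.decode("utf-8"))           # step 2: UTF-8 decode the plain-text run
print(decode_osc52_payload(rest))     # step 3: base64 decode the payload
```

Searching for ESC directly in the raw bytes is safe because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so 0x1b can never appear inside an encoded character.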

Currently I have restricted the SIMD implementations to AVX2 at max, not AVX512 as that tends to have a major energy/warm up penalty and prevents running on "economy cores". Something to revisit in the future.

kovidgoyal avatar Jan 17 '24 03:01 kovidgoyal

In what cases is this change perceivable?

EgZvor avatar Jan 17 '24 07:01 EgZvor

On Tue, Jan 16, 2024 at 11:56:58PM -0800, Egor Zvorykin wrote:

In what cases is this change perceivable?

Depends on what you do in your terminal. Anything that requires processing lots of data fast will be faster.

kovidgoyal avatar Jan 17 '24 09:01 kovidgoyal

Will this improve Vim redrawing speed?

EgZvor avatar Jan 17 '24 09:01 EgZvor

It will. Indeed, my original motivation for making kitty was the horrible performance of vim in the terminals at the time. However, whether you as a human notice this particular change depends on your vim config, terminal window size, how sensitive you are to noticing such things, etc.

kovidgoyal avatar Jan 17 '24 09:01 kovidgoyal

Impressive numbers! I have tried a few basic tests and it is definitely faster. Up to 30% with the craziest pixel pushing test case I tried and around 10% when just scrolling large amounts of text. Nice!

In regular use kitty is really never the bottleneck in anything I do, but it should be a nice little bit of battery/CPU saving though.

I have been using a build from the vt branch all afternoon and have not seen any kind of regression so far. 👍

neurocyte avatar Jan 17 '24 14:01 neurocyte

This is awesome. Should this build on macOS?

ProductName: macOS
ProductVersion:	14.2.1 
BuildVersion: 23C71

I am receiving the following error:

./dev.sh build
[1/1] Compiling kitty/simd-string.c ... done
Compiling kitty/simd-string.c ...
clang -MMD -DNDEBUG -DPRIMARY_VERSION=4000 -DSECONDARY_VERSION=31 -DXT_VERSION="0.31.0" -DGL_SILENCE_DEPRECATION -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include -Wextra -Wfloat-conversion -Wno-missing-field-initializers -Wall -Wstrict-prototypes -std=c11 -pedantic-errors -Werror -O3 -fwrapv -fstack-protector-strong -pipe -fvisibility=hidden -D_FORTIFY_SOURCE=2 -mbranch-protection=standard -fno-plt -flto -pthread -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include/libpng16 -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include -I/var/folders/wj/cqy1nsyn4wn94d96btkldt0r0000gn/T/t/openssl,x86_64,-5kbzhdrp/include -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/include/harfbuzz -I/Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/python/Python.framework/Versions/3.9/include/python3.9 -c kitty/simd-string.c -o build/fast_data_types-kitty-simd-string.c.o
In file included from kitty/simd-string.c:13:
kitty/simd-string-impl.h:18:10: fatal error: 'simde/x86/avx2.h' file not found
#include <simde/x86/avx2.h>
         ^~~~~~~~~~~~~~~~~~
1 error generated.
The following build command failed: /Users/mike/go/src/github.com/kovidgoyal/kitty/dependencies/darwin-arm64/python/Python.framework/Versions/Current/bin/python3 setup.py develop
exit status 1

mikesmithgh avatar Jan 18 '24 21:01 mikesmithgh

It will work on macOS, you just need to update your copy of the dependencies, with ./dev.sh deps

kovidgoyal avatar Jan 19 '24 02:01 kovidgoyal

It will work on macOS, you just need to update your copy of the dependencies, with ./dev.sh deps

thanks, that worked

mikesmithgh avatar Jan 19 '24 02:01 mikesmithgh

I'm getting broken rendering when running the vt branch on my M1 mac.

with the following zshrc

PS1="%% "

export KEYTIMEOUT=1
bindkey -v

autoload -Uz compinit
compinit
zmodload zsh/complist

zstyle ':completion:*' menu select
bindkey -M menuselect '^[' send-break

and the following kitty.conf

font_family      Inconsolata Medium
bold_font        Inconsolata SemiBold
italic_font      Inconsolata Light
bold_italic_font Inconsolata SemiBold
font_size 20

Create a window with 50x20 cells, cd into the kitty repository and run ls to somewhat fill the entire screen. Then type cd <Tab>, navigate to any menu item and press escape to see some items from the ls output disappear.

https://github.com/kovidgoyal/kitty/assets/90276965/220fe2da-af1a-48f3-b772-65508ebebadc

And sometimes all output is immediately cleared on my terminal, though I was unable to make a reproducer for it.

https://github.com/kovidgoyal/kitty/assets/90276965/6256fe3f-0069-4a4b-9221-99a058692c2a

ad-chaos avatar Jan 19 '24 16:01 ad-chaos

@ad-chaos Can you please run kitty with --dump-bytes, do the minimal steps to reproduce the problem, and post the dump along with the window size in cells you reproduced at? Your steps aren't reproducing on my M2 mac.

kovidgoyal avatar Jan 19 '24 16:01 kovidgoyal

dumped-bytes.log

The window size was 50x20 cells

ad-chaos avatar Jan 19 '24 16:01 ad-chaos

@ad-chaos: Thanks, with that I can see a difference between master and vt, will investigate more when I have a moment.

kovidgoyal avatar Jan 19 '24 16:01 kovidgoyal

@ad-chaos Fixed by https://github.com/kovidgoyal/kitty/commit/12b684edee136cc1ad63221efd0fc2d76ac8fa85

kovidgoyal avatar Jan 20 '24 04:01 kovidgoyal

I can confirm the problem no longer exists on the latest vt branch.

ad-chaos avatar Jan 20 '24 06:01 ad-chaos

@kovidgoyal I imagine this improvement would have no impact if you are in a tmux session?

dalizard avatar Jan 23 '24 21:01 dalizard

On Tue, Jan 23, 2024 at 01:15:40PM -0800, Dimitar Haralanov wrote:

@kovidgoyal I imagine this improvement would have no impact if you are in a tmux session?

Speed-wise, no, since your performance will be limited by tmux, not kitty. But kitty will still use less energy to parse whatever tmux sends it.

kovidgoyal avatar Jan 24 '24 01:01 kovidgoyal

I'm seeing pretty high idle cpu usage on my build of this branch. It constantly hovers around 3-4% on an empty terminal prompt, even if minimized (as opposed to 0.00% for the master branch).

The number of open tabs doesn't seem to affect this cpu usage by a meaningful amount.

ces42 avatar Jan 24 '24 05:01 ces42

On Tue, Jan 23, 2024 at 09:54:16PM -0800, Carlos Esparza wrote:

I'm seeing pretty high idle cpu usage on my build of this branch. It constantly hovers around 3-4% on an empty terminal prompt (as opposed to 0.00% for the master branch).

OK, will look at it when I have a moment. Probably unrelated to the SIMD code, some of the render loop was re-written for additional speedups as well.

kovidgoyal avatar Jan 24 '24 14:01 kovidgoyal

Love to see that you're regularly improving this!

p7r0x7 avatar Jan 24 '24 19:01 p7r0x7

On Tue, Jan 23, 2024 at 09:54:16PM -0800, Carlos Esparza wrote:

I'm seeing pretty high idle cpu usage on my build of this branch. It constantly hovers around 3-4% on an empty terminal prompt (as opposed to 0.00% for the master branch).

Fixed by https://github.com/kovidgoyal/kitty/commit/8c0607b8e09b97376491dd5c151d3dd04b190cd2

kovidgoyal avatar Jan 25 '24 09:01 kovidgoyal

Fixed by 8c0607b

can confirm!

ces42 avatar Jan 25 '24 17:01 ces42

Everything that is a TUI should benefit from vt, shouldn't it? I use bottom with a 0.5 sec refresh rate and 8 widgets (quite enough load, I think, for this case). Should it consume less CPU now?

up-to-you avatar Jan 26 '24 21:01 up-to-you

yes.

kovidgoyal avatar Jan 27 '24 03:01 kovidgoyal

A further 5% improvement in the benchmark with https://github.com/kovidgoyal/kitty/commit/f1d70622af7d97ace534336d714cefbf604914ba

kovidgoyal avatar Feb 01 '24 13:02 kovidgoyal

Here's a bug: write just "⚠️" into a text file. Then opening the file with vim (9.0) or neovim (latest master) shows what looks like a file with just one space.

ces42 avatar Feb 02 '24 03:02 ces42

On Thu, Feb 01, 2024 at 07:33:44PM -0800, Carlos Esparza wrote:

Here's a bug: write just "⚠️" into a text file. Then opening the file with vim (9.0) or neovim (latest master) shows what looks like a file with just one space.

Nice catch: https://github.com/kovidgoyal/kitty/commit/b02f3f6231e8ff97f4066a8608c44ca29b06a2d1

Although technically, this was both a bug in the vt branch and in vim. vim doesn't realize that the character is wide because it is a default narrow character + emoji variation selector codepoint, which presumably vim doesn't support. So vim was incorrectly positioning the cursor on the second cell of the wide character and then issuing more commands, which in turn exposed the regression in the vt branch.
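
To make the "narrow character + emoji variation selector" point concrete, here is a rough sketch in Python. The `cell_width` function and its `honor_vs16` flag are hypothetical and deliberately naive (combining marks, for instance, are ignored); this is not how kitty or vim actually compute widths. It only shows how the same two codepoints come out as one cell or two depending on whether the variation selector is taken into account.

```python
import unicodedata

WARNING_EMOJI = "\u26a0\ufe0f"  # WARNING SIGN followed by VARIATION SELECTOR-16
VS16 = "\ufe0f"                 # requests emoji presentation for the previous character


def cell_width(text: str, honor_vs16: bool) -> int:
    """Naive width in terminal cells for a string of codepoints."""
    width = 0
    last = 0  # width of the most recent base character
    for ch in text:
        if ch == VS16:
            if honor_vs16 and last == 1:
                width += 1  # promote the previous narrow character to two cells
                last = 2
            continue  # the selector itself occupies no cell
        last = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        width += last
    return width


print([unicodedata.name(ch) for ch in WARNING_EMOJI])
print(cell_width(WARNING_EMOJI, honor_vs16=False))  # width when the selector is ignored
print(cell_width(WARNING_EMOJI, honor_vs16=True))   # width when the selector is honored
```

An application that ignores the selector and a terminal that honors it will disagree by one cell about where the cursor ends up, which is the kind of mismatch described above.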

kovidgoyal avatar Feb 02 '24 10:02 kovidgoyal

Heads up that the current build on the vt branch is not working (pulled from the repo just now). Here's what I get when I run ./dev.sh build:

In file included from kitty/simd-string-256.c:9:
kitty/simd-string-impl.h:260:5: error: call to undeclared function '_mm256_zeroupper'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    zero_upper();
    ^
kitty/simd-string-impl.h:92:20: note: expanded from macro 'zero_upper'
#define zero_upper _mm256_zeroupper
                   ^
kitty/simd-string-impl.h:645:5: error: call to undeclared function '_mm256_zeroupper'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    zero_upper();
    ^
kitty/simd-string-impl.h:92:20: note: expanded from macro 'zero_upper'
#define zero_upper _mm256_zeroupper
                   ^
2 errors generated.
The following build command failed: /....../kitty/dependencies/darwin-arm64/python/Python.framework/Versions/Current/bin/python3 setup.py develop
exit status 1

brandonfouts avatar Feb 05 '24 03:02 brandonfouts

Should be fixed by: https://github.com/kovidgoyal/kitty/commit/e1e932e56b0cddac336ceac00298bbf7b8c408a7

kovidgoyal avatar Feb 05 '24 05:02 kovidgoyal

https://github.com/kovidgoyal/kitty/commit/40e8ade9a2f5140a124ceda76f207ee3ef9f9103 causes my bash to not load the profile.

erikolofsson avatar Feb 11 '24 09:02 erikolofsson

On Sun, Feb 11, 2024 at 01:45:53AM -0800, Erik Olofsson wrote:

https://github.com/kovidgoyal/kitty/commit/40e8ade9a2f5140a124ceda76f207ee3ef9f9103 causes my bash to not load the profile.

Don't see how that's possible unless you have KITTY_RUNNING_BASH_INTEGRATION_TEST set in your environment?

kovidgoyal avatar Feb 11 '24 09:02 kovidgoyal