delta
delta is slow (noticeable lag) compared to default diff (11ms -> 76ms) in VSCode
I am using Ubuntu (WSL on Windows). The default git diff is immediate, while delta has a noticeable lag. As a rule of thumb, anything under 30ms feels immediate, while anything over 30ms produces a noticeable lag.
I don't know whether the issue is with delta itself or whether the extra 65ms of overhead comes from git launching the delta program. Either way, it makes delta too slow for me: I want these command line tools to feel immediate, and the extra lag is distracting.
In VSCode console, default git diff:
$ time git diff
diff --git a/src/lfortran/parser/preprocessor.re b/src/lfortran/parser/preprocessor.re
index 1f39b3c58..67643a220 100644
--- a/src/lfortran/parser/preprocessor.re
+++ b/src/lfortran/parser/preprocessor.re
@@ -433,7 +433,7 @@ std::string CPreprocessor::run(const std::string &input, LocationManager &lm,
interval_end_type_0(lm, output.size(), cur-string_start);
continue;
}
- "#" whitespace? "include" whitespace '"' @t1 [^"\x00]* @t2 '"' [^\n\x00]* newline {
+ "#" whitespace? "include" whitespace ('"' | '<') @t1 [^"\x00]* @t2 ('"' | '>') [^\n\x00]* newline {
if (!branch_enabled) continue;
std::string filename = token(t1, t2);
std::vector<std::filesystem::path> include_dirs;
real 0m0.011s
user 0m0.013s
sys 0m0.000s
And delta:
$ time git diff
Δ src/lfortran/parser/preprocessor.re
────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────┐
• 433: std::string CPreprocessor::run(const std::string &input, LocationManager &lm, │
─────────────────────────────────────────────────────────────────────────────────────┘
interval_end_type_0(lm, output.size(), cur-string_start);
continue;
}
"#" whitespace? "include" whitespace '"' @t1 [^"\x00]* @t2 '"' [^\n\x00]* newline {
"#" whitespace? "include" whitespace ('"' | '<') @t1 [^"\x00]* @t2 ('"' | '>') [^\n\x00]* newline {
if (!branch_enabled) continue;
std::string filename = token(t1, t2);
std::vector<std::filesystem::path> include_dirs;
real 0m0.076s
user 0m0.034s
sys 0m0.004s
I also tried the same thing directly in Terminal, and there I get 8ms with git diff and 27ms with delta, which is still a huge overhead, but fortunately it is under 30ms, and so it feels immediate.
So there are two separate issues:
- large overhead of delta compared to native git diff (3x slower in Terminal, 6x slower in VSCode)
- VSCode seems a tiny bit slower for git diff, but a lot slower for delta
If you have any tips that I could try, let me know, I am happy to help debug.
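For reference, the overheads quoted above can be recomputed from the four wall-clock figures in this report. This is only a minimal sketch: the 30 ms cutoff is the rule of thumb stated earlier rather than a measured perception threshold, and the function and variable names are illustrative. With these inputs the slowdown factors come out at roughly 3.4x (Terminal) and 6.9x (VSCode).

```python
# Wall-clock times reported above, in milliseconds.
timings_ms = {
    "Terminal": {"git diff": 8, "delta": 27},
    "VSCode": {"git diff": 11, "delta": 76},
}

# Rule-of-thumb threshold from the report: under 30 ms feels immediate.
LAG_THRESHOLD_MS = 30


def summarize(env, plain_ms, delta_ms):
    """Return the added latency, the slowdown factor, and whether it should lag."""
    return {
        "env": env,
        "overhead_ms": delta_ms - plain_ms,
        "slowdown": round(delta_ms / plain_ms, 1),
        "noticeable": delta_ms > LAG_THRESHOLD_MS,
    }


summaries = [
    summarize(env, t["git diff"], t["delta"]) for env, t in timings_ms.items()
]
for s in summaries:
    print(s)
```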
Hi @certik,
It could well be time to do some profiling and optimization. I haven't profiled properly but from very quick ad-hoc experimenting it looks like I'm getting results that are similar to yours in some ways:
This is on a MacOS M2, testing a one-line diff via git diff like you:
| | Without Delta | With Delta |
|---|---|---|
| Alacritty | ~20ms | ~35-40ms |
| VSCode | ~20ms | ~65-70ms |
So at minimum, it looks like we have one concrete question: why is delta slower in VSCode? Perhaps simply because delta emits more ANSI escape sequences and the VSCode terminal isn't performing as well on them as other terminal emulators?
@dandavison awesome, I am glad you can reproduce it on macOS also.
> why is delta slower in VSCode? Perhaps simply because delta emits more ANSI escape sequences and the VSCode terminal isn't performing as well on them as other terminal emulators?
Let's figure it out: we can test this hypothesis by making delta emit output without ANSI escape sequences, to either confirm or rule it out. This would be useful even for Alacritty / Terminal to see if they get faster.
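One way to put numbers on the escape-sequence hypothesis is to count how much of a tool's output consists of escape sequences, and to strip them for an A/B rendering test. The sketch below is an illustration, not part of delta: the regular expression follows the generic CSI grammar (ESC, `[`, parameter bytes, intermediate bytes, one final byte) and deliberately ignores other escape families such as OSC hyperlinks, and the sample string is made up.

```python
import re

# Generic CSI sequence: ESC [ , parameter bytes 0x30-0x3F, intermediate
# bytes 0x20-0x2F, then one final byte 0x40-0x7E. SGR color codes end in "m".
# Assumption: OSC sequences (e.g. hyperlinks) are not matched or counted.
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def ansi_stats(text):
    """Count CSI sequences and how many characters they occupy."""
    sequences = CSI_RE.findall(text)
    escape_chars = sum(len(seq) for seq in sequences)
    return {
        "sequences": len(sequences),
        "escape_chars": escape_chars,
        "visible_chars": len(text) - escape_chars,
    }


def strip_ansi(text):
    """Return the text with all CSI sequences removed."""
    return CSI_RE.sub("", text)


# Made-up sample: one removed and one added line, colored red and green.
sample = "\x1b[31m-old line\x1b[0m\n\x1b[32m+new line\x1b[0m\n"
stats = ansi_stats(sample)
print(stats)
print(strip_ansi(sample), end="")
```

Feeding the captured output of git diff and of delta through `ansi_stats` would show how many more sequences delta emits for the same diff, and printing the stripped text in each terminal would show whether the rendering cost disappears.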
Note: I installed delta using conda-forge, here is the build command: https://github.com/conda-forge/git-delta-feedstock/blob/565fea2fbdf4aea4f40a88d260445be75b3b7d62/recipe/build.sh#L13, I think this builds with Rust in Release mode (enables all optimizations)?
Here's another experiment. This experiment does not involve the terminal emulator rendering anything, and it doesn't involve git invoking delta; instead we invoke delta explicitly via a shell pipe. This seems to suggest that (a) delta's execution costs around 6ms and (b) much of the total execution time of delta is due to terminal emulator activity, especially in VSCode.
(You probably know this, but beware of things like git diff > /dev/null -- git will not invoke delta even if it's configured to do so.)
Alacritty
$ hyperfine --warmup 500 'git diff | delta > /dev/null'
Benchmark 1: git diff | delta > /dev/null
Time (mean ± σ): 14.2 ms ± 2.4 ms [User: 9.5 ms, System: 7.7 ms]
Range (min … max): 10.0 ms … 21.1 ms 117 runs
$ hyperfine --warmup 500 'git diff > /dev/null'
Benchmark 1: git diff > /dev/null
Time (mean ± σ): 8.7 ms ± 2.0 ms [User: 3.4 ms, System: 3.3 ms]
Range (min … max): 5.6 ms … 14.9 ms 146 runs
VSCode
$ hyperfine --warmup 500 'git diff | delta > /dev/null'
Benchmark 1: git diff | delta > /dev/null
Time (mean ± σ): 14.9 ms ± 2.8 ms [User: 10.3 ms, System: 7.5 ms]
Range (min … max): 10.1 ms … 30.1 ms 109 runs
$ hyperfine --warmup 500 'git diff > /dev/null'
Benchmark 1: git diff > /dev/null
Time (mean ± σ): 8.0 ms ± 3.2 ms [User: 3.4 ms, System: 3.6 ms]
Range (min … max): 2.8 ms … 21.6 ms 132 runs
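The same kind of measurement (process run time with rendering excluded) can be sketched without hyperfine. The helper below is hypothetical and only illustrates the idea: it launches the command directly, with no shell in between, feeds it prepared input on stdin, and discards its output so that no terminal emulator is involved. The placeholder command is the Python interpreter itself so that the sketch runs anywhere; the absolute numbers include Python's own process-spawning overhead and are only meaningful relative to each other.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, stdin_data=b"", runs=20, warmup=3):
    """Run argv repeatedly with output discarded; return wall-clock times in ms."""
    samples = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            input=stdin_data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:  # discard warmup runs, as hyperfine --warmup does
            samples.append(elapsed_ms)
    return samples


def stats(samples):
    """Summarize samples in the same terms hyperfine reports."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
        "min": min(samples),
        "max": max(samples),
    }


# Placeholder command so the sketch runs on any machine.
baseline = time_command([sys.executable, "-c", "pass"], runs=5, warmup=1)
print(stats(baseline))
```

To time delta this way, one could capture the diff once (for example the stdout of `git diff`) and pass it as `stdin_data` to `time_command(["delta"], ...)`, which separates delta's cost from git's.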
@dandavison thanks for the benchmarks, this looks very promising. It looks like both in VSCode and Alacritty git diff takes 8ms, and delta takes additional 6ms or so, so 14ms total. I guess it depends on the diff size too, but this is completely usable.
Somehow there is a large penalty that happens after delta is done, but it's weird. Terminals in my experience are slow compared to how fast they could be, but they are not that slow: for small output like I did it should not take 20 - 50ms to render.
In WSL + Terminal on another (slower) laptop, on a larger (several pages) diff:
$ hyperfine --warmup 50 'git diff > /dev/null'
Benchmark 1: git diff > /dev/null
Time (mean ± σ): 16.4 ms ± 3.2 ms [User: 8.1 ms, System: 10.4 ms]
Range (min … max): 11.1 ms … 28.9 ms 165 runs
$ hyperfine --warmup 50 'git diff | delta > /dev/null'
Benchmark 1: git diff | delta > /dev/null
Time (mean ± σ): 200.7 ms ± 20.6 ms [User: 57.1 ms, System: 28.6 ms]
Range (min … max): 177.5 ms … 237.6 ms 16 runs
I then did the simplest one line diff:
$ hyperfine --warmup 50 'git diff > /dev/null'
Benchmark 1: git diff > /dev/null
Time (mean ± σ): 12.9 ms ± 2.4 ms [User: 5.7 ms, System: 9.2 ms]
Range (min … max): 8.5 ms … 23.4 ms 168 runs
$ hyperfine --warmup 50 'git diff | delta > /dev/null'
Benchmark 1: git diff | delta > /dev/null
Time (mean ± σ): 180.6 ms ± 12.3 ms [User: 25.7 ms, System: 21.6 ms]
Range (min … max): 162.9 ms … 205.7 ms 17 runs
We can then time everything via:
$ time git diff
diff --git a/build1.sh b/build1.sh
index 65c7ea56c..5ae61dbd2 100755
--- a/build1.sh
+++ b/build1.sh
@@ -14,4 +14,4 @@ cmake \
-DCMAKE_INSTALL_PREFIX=`pwd`/inst \
-DCMAKE_INSTALL_LIBDIR=share/lfortran/lib \
.
-cmake --build . -j16 --target install
+#cmake --build . -j16 --target install
real 0m0.015s
user 0m0.008s
sys 0m0.009s
$ time git diff | delta
build1.sh
──────────────────────────────────────────────────────────────────────────
────────────┐
14: cmake \ │
────────────┘
-DCMAKE_INSTALL_PREFIX=`pwd`/inst \
-DCMAKE_INSTALL_LIBDIR=share/lfortran/lib \
.
cmake --build . -j16 --target install
#cmake --build . -j16 --target install
real 0m0.203s
user 0m0.034s
sys 0m0.024s
I ran it a couple of times and took the smallest number. It seems git diff takes about 15ms, delta about 160ms and the terminal about 20ms. I also tried VSCode, but I am getting similar numbers there.
> I then did the simplest one line diff:
Your results seem to show delta taking ~185ms to compute a multi-page diff and then ~170ms to compute a one-line diff. That can't be right? Or is that laptop just very slow to start delta?
> seems git diff takes 15ms, delta takes 160ms
Can you post the diff which delta takes a long time on?
So, in summary do you think there's any delta development work indicated here? Or can we conclude that delta is fast enough, but some terminal emulators are slow at rendering its output?
Let's dig deeper to answer your questions. I am using https://github.com/lfortran/lfortran/ in WSL on a Surface 5 laptop. Here is the git diff:
$ time git diff
diff --git a/README.md b/README.md
index 166c893ba..34f093214 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,7 @@
LFortran is a modern open-source (BSD licensed) interactive Fortran compiler
built on top of LLVM. It can execute user's code interactively to allow
+
exploratory work (much like Python, MATLAB or Julia) as well as compile to
binaries with the goal to run user's code on modern architectures such as
multi-core CPUs and GPUs.
real 0m0.016s
user 0m0.004s
sys 0m0.013s
Here is delta with various options:
$ time git diff | delta
README.md
──────────────────────────────────────────────────────────────────────────
───┐
4: │
───┘
LFortran is a modern open-source (BSD licensed) interactive Fortran compiler
built on top of LLVM. It can execute user's code interactively to allow
exploratory work (much like Python, MATLAB or Julia) as well as compile to
binaries with the goal to run user's code on modern architectures such as
multi-core CPUs and GPUs.
real 0m0.247s
user 0m0.034s
sys 0m0.024s
I ran it a couple of times. I then ran:
$ time git diff | delta --color-only
diff --git a/README.md b/README.md
index 166c893ba..34f093214 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,7 @@
LFortran is a modern open-source (BSD licensed) interactive Fortran compiler
built on top of LLVM. It can execute user's code interactively to allow
+
exploratory work (much like Python, MATLAB or Julia) as well as compile to
binaries with the goal to run user's code on modern architectures such as
multi-core CPUs and GPUs.
real 0m0.045s
user 0m0.052s
sys 0m0.007s
That's much better!
I wonder if Windows takes forever to launch the program for some reason?
I ran it many times by hand; the fastest I was able to get with delta is 39ms (git diff is 16ms, so delta takes 23ms). Unfortunately that is still too slow: the total must get below 30ms.
To go from here, I could write a simple prototype that just colors the output a bit, and see if it can run faster. I would expect at most 6ms for simple diffs like that, not 23ms.
> I wonder if Windows takes forever to launch the program for some reason?
@th1000s may have thoughts for this thread.
I wonder whether it's some I/O that's being done at start up time. Can you try with --no-gitconfig? I believe that will prevent any attempt to read gitconfig files from disk.
Is it worth investigating whether not doing the calling process detection in https://github.com/dandavison/delta/blob/main/src/utils/process.rs#L67-L78 changes timings? @certik are you able to modify the Rust code and try things like that out?
I haven't seen a difference with --no-gitconfig.
The first time it runs it's always ~200ms, then it runs faster, but only sometimes. For example hyperfine is still slow:
$ hyperfine --warmup 50 'git diff | delta > /dev/null'
Benchmark 1: git diff | delta > /dev/null
Time (mean ± σ): 211.2 ms ± 31.1 ms [User: 30.4 ms, System: 34.5 ms]
Range (min … max): 160.3 ms … 278.1 ms 15 runs
$ time git diff | delta > /dev/null
real 0m0.040s
user 0m0.043s
sys 0m0.019s
Who knows what's causing this. However, other programs run fast, such as:
$ hyperfine --warmup 50 'git diff | cat > /dev/null'
Benchmark 1: git diff | cat > /dev/null
Time (mean ± σ): 14.4 ms ± 2.6 ms [User: 8.8 ms, System: 10.7 ms]
Range (min … max): 7.2 ms … 23.5 ms 179 runs
I'll try to create a minimal C++ or Rust program for diff processing and see if it runs fast, I bet it will.
Yes, I can try removing those lines in Rust, I'll do it later.
> I haven't seen a difference with --no-gitconfig
Hm. A testament to libgit2's good engineering I guess.
> hyperfine is still slow
Could this be your shell startup time? I'd try -N.
> Yes, I can try removing those lines in Rust, I'll do it later.
Thanks! I am curious about that.
> I'll try to create a minimal C++ or Rust program for diff processing and see if it runs fast, I bet it will.
If you're sure it's worth it! But I'd be inclined not to spend time on that if you don't think it will help solve the problem here.
Automatic dark/light detection also contributes to startup times and depends a lot on the terminal emulator's speed in responding to queries. You can disable dark/light detection by passing either --dark or --light to delta.
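To make the latency source concrete: background detection of this kind is commonly done by writing an xterm-style OSC 11 query to the terminal and waiting for a reply of the form `ESC ] 11 ; rgb:RRRR/GGGG/BBBB` terminated by BEL or ST. Whether delta's detection works exactly this way is an assumption here. The sketch below covers only the parsing and classification step; the round trip to the terminal (and the timeout when no reply arrives) is the part that costs time and is not shown. The luminance weights and the 0.5 threshold are a rough heuristic chosen for illustration.

```python
import re

# Reply to an OSC 11 query: ESC ] 11 ; rgb:R/G/B followed by BEL or ST (ESC \).
# Each channel is 1 to 4 hex digits.
OSC11_RE = re.compile(
    r"\x1b\]11;rgb:([0-9a-fA-F]{1,4})/([0-9a-fA-F]{1,4})/([0-9a-fA-F]{1,4})"
    r"(?:\x07|\x1b\\)"
)


def parse_osc11(response):
    """Return (r, g, b) scaled to 0.0-1.0, or None if no valid reply is present."""
    match = OSC11_RE.search(response)
    if match is None:
        return None
    # Scale each channel by the maximum value for its digit count.
    return tuple(int(h, 16) / (16 ** len(h) - 1) for h in match.groups())


def is_dark(rgb, threshold=0.5):
    """Classify a background as dark using a rough luminance heuristic."""
    r, g, b = rgb
    luminance = 0.2126 * r + 0.7152 * g + 0.0722 * b
    return luminance < threshold


black = parse_osc11("\x1b]11;rgb:0000/0000/0000\x07")
white = parse_osc11("\x1b]11;rgb:ffff/ffff/ffff\x1b\\")
print(black, is_dark(black))
print(white, is_dark(white))
```

A `None` result models a terminal that never answers or answers in an unexpected form, which is the case where a tool has to fall back to a default after waiting for its timeout.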
Now I am on a desktop with Windows, which is a lot faster than my laptop. Still, unfortunately delta is too slow.
The best way to know what the ideal speed of delta should be is to have a reference implementation. Here it is: https://gist.github.com/certik/8e2270033a0bedbc2daca9b0e5ffd375
After compiling it as documented at the top of the file, let's do some benchmarking. I ran each command several times and took the fastest run, in WSL Ubuntu, in a Terminal. First, a single-line diff:
$ time git diff > /dev/null
real 0m0.006s
user 0m0.001s
sys 0m0.008s
$ time git diff | ./mydelta > /dev/null
real 0m0.008s
user 0m0.003s
sys 0m0.007s
$ time git diff | cat > /dev/null
real 0m0.008s
user 0m0.010s
sys 0m0.001s
$ time git diff | delta > /dev/null
real 0m0.015s
user 0m0.018s
sys 0m0.001s
So mydelta is as fast as cat, about 1-2ms. That's good, that's what I would expect and hope. delta is about 7ms.
Now let's try a larger diff:
$ time git diff > /dev/null
real 0m0.027s
user 0m0.016s
sys 0m0.008s
$ time git diff | ./mydelta > /dev/null
real 0m0.029s
user 0m0.027s
sys 0m0.010s
$ time git diff | cat > /dev/null
real 0m0.028s
user 0m0.017s
sys 0m0.015s
$ time git diff | delta > /dev/null
real 0m0.151s
user 0m0.140s
sys 0m0.043s
Here mydelta is still about 2ms, while delta is about 124ms.
Hyperfine seems to mirror the above timings:
$ hyperfine --warmup 50 'git diff > /dev/null'
Benchmark 1: git diff > /dev/null
Time (mean ± σ): 36.6 ms ± 10.5 ms [User: 26.0 ms, System: 11.2 ms]
Range (min … max): 19.5 ms … 62.0 ms 138 runs
$ hyperfine --warmup 50 'git diff | ./mydelta > /dev/null'
Benchmark 1: git diff | ./mydelta > /dev/null
Time (mean ± σ): 36.6 ms ± 10.0 ms [User: 34.0 ms, System: 11.0 ms]
Range (min … max): 20.8 ms … 59.2 ms 67 runs
$ hyperfine --warmup 50 'git diff | cat > /dev/null'
Benchmark 1: git diff | cat > /dev/null
Time (mean ± σ): 38.9 ms ± 10.3 ms [User: 30.0 ms, System: 12.3 ms]
Range (min … max): 20.4 ms … 58.3 ms 63 runs
$ hyperfine --warmup 50 'git diff | delta > /dev/null'
Benchmark 1: git diff | delta > /dev/null
Time (mean ± σ): 178.8 ms ± 10.8 ms [User: 193.1 ms, System: 41.3 ms]
Range (min … max): 160.3 ms … 207.5 ms 17 runs
Finally, I also tried VSCode and got similar timings, which is probably not a surprise since the output goes to /dev/null.
Now let's try an empty diff in a Terminal:
$ time git diff
real 0m0.012s
user 0m0.001s
sys 0m0.012s
$ time git diff | ./mydelta
real 0m0.013s
user 0m0.011s
sys 0m0.009s
$ time git diff | delta
real 0m0.023s
user 0m0.019s
sys 0m0.005s
and VSCode:
$ time git diff
real 0m0.017s
user 0m0.022s
sys 0m0.000s
$ time git diff | ./mydelta
real 0m0.016s
user 0m0.018s
sys 0m0.001s
$ time git diff | delta
real 0m0.067s
user 0m0.022s
sys 0m0.006s
Here delta is slower even in a Terminal, and it is too slow in VSCode, for whatever reason. mydelta has no measurable overhead beyond the noise in the timings.
All of the above is reproducible on my machine, I ran it many times.
From this, we can draw some conclusions:
- It is possible to run a 3rd party program (mydelta) that has less than 2ms overhead even on large diffs, it prints colors and it works in a Terminal and VSCode. It is fast even when showing the diff to the terminal, both Terminal and VSCode.
- The speed of mydelta is on the level of cat.
- Both manual timing and hyperfine show similar results
- delta is consistently slower, and in VSCode for any diff and Terminal for larger diffs it is slower than 30ms, thus showing noticeable lag
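To illustrate what "just reformat the diff" means, here is a minimal sketch of a line-by-line diff colorizer. It is not the code from the gist linked above, whose contents are not reproduced here; the color choices and function names are assumptions. It also says nothing about speed: a Python process pays interpreter startup on every run, so only the logic carries over, not the timings of a compiled tool.

```python
# ANSI SGR codes used for the illustration.
RED = "\x1b[31m"
GREEN = "\x1b[32m"
CYAN = "\x1b[36m"
BOLD = "\x1b[1m"
RESET = "\x1b[0m"


def colorize_line(line):
    """Wrap one unified-diff line in a color based on its prefix."""
    # File headers must be checked before the single-character +/- prefixes.
    # Caveat: a changed line whose own text starts with "--" or "++" would be
    # misclassified as a header; a real tool would track the hunk state.
    if line.startswith(("diff --git", "index ", "--- ", "+++ ")):
        return f"{BOLD}{line}{RESET}"
    if line.startswith("@@"):
        return f"{CYAN}{line}{RESET}"
    if line.startswith("+"):
        return f"{GREEN}{line}{RESET}"
    if line.startswith("-"):
        return f"{RED}{line}{RESET}"
    return line


def colorize(diff_text):
    """Colorize a whole diff; no files are read and no terminal is queried."""
    return "\n".join(colorize_line(line) for line in diff_text.split("\n"))


example = "@@ -14,4 +14,4 @@ cmake \\\n .\n-cmake --build .\n+#cmake --build ."
print(colorize(example))
```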
What is causing it? I don't know. Let me ask some questions:
- Is delta reading any files? For the fastest performance it should have a mode that just reformats the diff, without reading any files.
- Is it querying the terminal or system for some capabilities? I would turn it off.

- Delta queries the terminal for capabilities at startup; see @bash's post above. To disable it use dark = true or light = true.
- Delta reads config files at startup; to disable this, use --no-gitconfig.
- Delta queries for calling processes in a child thread at startup, and then waits for the result later in certain code paths. The only way to disable this currently is to disable the code linked above: https://github.com/dandavison/delta/blob/main/src/utils/process.rs#L67-L78
I think that's all the I/O / potentially expensive syscalls done at startup -- @th1000s / @bash did I miss anything?
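The third item describes a common pattern: start a potentially slow query on a background thread at startup and block on its result only in the code paths that need it. The sketch below illustrates that pattern in Python and is not a translation of delta's Rust code; `query_calling_process` is a hypothetical stand-in that merely returns the parent process ID, and the `enabled` switch models the opt-out discussed in this thread.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Single worker: the query runs off the main thread while startup continues.
_executor = ThreadPoolExecutor(max_workers=1)


def query_calling_process():
    """Hypothetical stand-in for an expensive inspection of the process table."""
    return {"ppid": os.getppid()}


class CallerInfo:
    """Start the query eagerly, wait for it lazily."""

    def __init__(self, enabled=True):
        # With enabled=False, no thread is started and no work is done.
        self._future = _executor.submit(query_calling_process) if enabled else None

    def get(self, timeout=None):
        """Block until the result is ready; return None if detection is disabled."""
        if self._future is None:
            return None
        return self._future.result(timeout=timeout)


caller = CallerInfo()              # query starts here, in the background
# ... config parsing and first paint would happen here ...
print(caller.get())                # only this call can block on the query
print(CallerInfo(enabled=False).get())
```

The cost only shows up as latency if a code path calls `get()` before the query has finished, which is why the item above says the result is waited for "in certain code paths".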
I came here trying to figure out why any git action is taking like 1 second... this was why:
time delta
real 0m1.050s
user 0m0.029s
sys 0m0.027s
vs
time delta --dark
real 0m0.067s
user 0m0.038s
sys 0m0.044s
@jb55 what platform is that on?
nixos
Thanks, and what's your terminal emulator (and delta --version)?
delta 0.18.2
rxvt-unicode (urxvt) v9.31
> I came here trying to figure out why any git action is taking like 1 second... this was why:
The 1 second difference suggests to me that you're probably running into the timeout for dark/light detection. What terminal emulator are you running this on?
With #1910 it is now possible to measure exactly which component is slow (a process opt-out is also needed). It seems that terminals which are slow to respond are the main culprit (on all my systems it is plenty fast, however). Once delta runs into that more than once, the user could be notified, or delta could use a globally cached value.
Some example output:
$ git show
delta timings (ms after start): tty setup: 2.3 ms, read configs: 6.0 ms, query processes: 26.1 ms, first paint: 10.1
$ git log -p
delta timings (ms after start): tty setup: 3.7 ms, read configs: 7.9 ms, query processes: 23.2 ms, first paint: 11.2
delta timings (ms after start): tty setup: 2.9 ms, read configs: 8.0 ms, query processes: 639.8 ms, first paint: 11.8
# ^ parent process not requested until much later, this value is not when the query finishes
$ git blame
delta timings (ms after start): tty setup: 2.9 ms, read configs: 8.3 ms, query processes: 13.2 ms, first paint: 12.6
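Lines in this format are easy to post-process when comparing many runs. The parser below is a small sketch that assumes exactly the layout shown in the sample output above (a fixed prefix, then comma-separated `name: value` pairs with an optional `ms` suffix); it is not an interface that delta documents. As the header says, the values are offsets from process start, so the largest one marks the stage that was reached last, not necessarily the one that cost the most.

```python
PREFIX = "delta timings (ms after start):"


def parse_timings(line):
    """Parse one timing line into {stage name: milliseconds after start}."""
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    timings = {}
    for part in line[len(PREFIX):].split(","):
        name, sep, value = part.partition(":")
        if not sep:
            continue
        # The last field in the sample output has no "ms" suffix, so it is optional.
        timings[name.strip()] = float(value.replace("ms", "").strip())
    return timings


sample = (
    "delta timings (ms after start): tty setup: 2.3 ms, read configs: 6.0 ms, "
    "query processes: 26.1 ms, first paint: 10.1"
)
timings = parse_timings(sample)
latest_stage = max(timings, key=timings.get)
print(timings)
print("latest timestamp:", latest_stage)
```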
I'm using urxvtd (daemon) with urxvtc. I wonder if that has anything to do with it? I will try #1910
delta timings (ms after start): tty setup: 1002.7 ms, read configs: 1009.4 ms, query processes: 0.0 ms, first paint: 0.0
looks like this is a terminal_colorsaurus issue?
bumping colorsaurus to 0.4.5 didn't seem to change anything:
delta timings (ms after start): tty setup: 1005.8 ms, read configs: 1015.7 ms, query processes: 0.0 ms, first paint: 0.0
> rxvt-unicode (urxvt) v9.31
Aha :) terminal-colorsaurus has a quirk for urxvt because the currently released version doesn't properly terminate responses (http://cvs.schmorp.de/rxvt-unicode/src/command.C?revision=1.600&view=markup).
I use the TERM env var to detect urxvt. Do you overwrite the TERM env var by any chance? If so, running TERM=rxvt-unicode delta should also be considerably faster.
On Tue, Nov 19, 2024 at 10:13:18AM GMT, Tau Gärtli wrote:
> rxvt-unicode (urxvt) v9.31
> I use the TERM env var to detect urxvt. Do you overwrite the TERM env var by any chance? If so, running TERM=rxvt-unicode delta should also be considerably faster.
that fixed it. TERM was rxvt for some reason, setting it to rxvt-unicode removed the delay
Awesome! Strange that TERM was rxvt though. Maybe I should add that to terminal-colorsaurus too.
looks like I had:
URxvt*termName: rxvt
in my ~/.Xresources for whatever reason. removing it defaults TERM to rxvt-unicode-256color
The latest version of terminal-colorsaurus should now work in urxvt regardless of what TERM is set to.
I now had a chance to test vscode's xterm.js - and indeed, it is slow, taking 40ms to respond (often longer), vs. at worst 15ms for konsole connected to the same host via ssh.
Having a lag-free startup is important, so if the tty detection takes longer the result should be cached at ~/.cache/delta/cache-$HOST.env (for ~2 weeks, or a newer delta version) - then print a notice once that a cache was created.
The less --version query -- unlikely to change much, and which can also take 20ms on a cold cache -- can also be moved there.
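The caching rule proposed above (reuse the stored result unless it is older than about two weeks or was written by a different delta version) can be sketched as follows. This only illustrates the proposal and is not code from delta: the key names `DELTA_VERSION`, `BACKGROUND` and `LESS_VERSION`, the simple `KEY=value` file layout, and the helper names are all assumptions. The cache root is a parameter so that the sketch can run against a temporary directory instead of `~/.cache`.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 14 * 24 * 60 * 60  # "~2 weeks", as proposed above


def cache_path(host, cache_root):
    """Per-host cache file, mirroring the proposed delta/cache-$HOST.env layout."""
    return Path(cache_root) / "delta" / f"cache-{host}.env"


def write_cache(path, version, values):
    """Store detected values together with the version that produced them."""
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"DELTA_VERSION={version}"]
    lines += [f"{key}={value}" for key, value in values.items()]
    path.write_text("\n".join(lines) + "\n")


def read_cache(path, current_version, now=None):
    """Return cached values, or None if missing, too old, or from another version."""
    try:
        mtime = path.stat().st_mtime
        text = path.read_text()
    except OSError:
        return None
    now = time.time() if now is None else now
    if now - mtime > MAX_AGE_SECONDS:
        return None
    entries = dict(line.split("=", 1) for line in text.splitlines() if "=" in line)
    if entries.pop("DELTA_VERSION", None) != current_version:
        return None
    return entries
```

A caller would try `read_cache` first and fall back to the slow terminal and `less --version` queries only on `None`, then call `write_cache` with the fresh results.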
> looks like I had:
> URxvt*termName: rxvt
> in my ~/.Xresources for whatever reason. removing it defaults TERM to rxvt-unicode-256color
https://github.com/dandavison/delta/pull/1936 fixed this, so I can keep the rxvt termname, which seems to be needed for things like tmux.