haskell-mode
haskell-mode copied to clipboard
Degrading scroll performance in terminal mode
The Problem
Since I switched to emacs I noticed how scrolling sporadically became worse. At times, it would take 3-4s to redraw 40 lines of code.
Apparent relevant factors
After doing a lot of analyzing I narrowed it down to the following combination of things:
- Emacs must run in terminal, in GUI either the problem is not as apparent or its non-existant.
- gc-cons-threshold must be set to a high value. LSP users likely have something like
(setq gc-cons-threshold 100000000)
-
haskell-mode
andfont-lock-mode
must be enabled - Do some text editing
Watch drawing lines become slower. The presence of constructor names seems especially bad. Note, that the entire line is drawn much slower, so if you vertically tile your emacs to have some unrelated buffer on the right hand side, then the entire line with many data constructors on the left hand window will be drawn slowly.
- If you then cause a
(garbage-collect)
, scrolling becomes fast again. test.hs.gz
Reproduce
Locally I used emacs 27.1 installed through the emacs nix overlay:
Click to see/use my nixos module
{ config, pkgs, ... }:
let
emacsOverlay = import (builtins.fetchTarball {
# Descriptive name to make the store path easier to identify
name = "emacs-overlay-2020-09-11";
# Commit hash for nixos-unstable as of 2018-09-12
url = https://github.com/nix-community/emacs-overlay/archive/a69588a3f7de6d68f20cea21562ab7f6f91a400a.tar.gz;
# Hash obtained using `nix-prefetch-url --unpack <url>`
sha256 = "180jp31zfrlhnz490x6hckz6z3bvfxwz1613cg4zmf0lrflrl3gp";
});
myEmacs = pkgs.emacs;
emacsWithPackages = (pkgs.emacsPackagesGen myEmacs).emacsWithPackages;
emacs = emacsWithPackages (epkgs: (with epkgs.melpaStablePackages; [
# melpa stable
ripgrep
magit
# dockerfile-mode
# markdown-mode
# yaml-mode
# json-mode
# go-mode
# nix-mode
use-package
projectile
# direnv
# swiper
# ox-gfm
# pdf-tools
]) ++ (with epkgs.melpaPackages; [
# melpa unstable
solarized-theme
gruvbox-theme
company
counsel
counsel-projectile
ivy
evil
evil-collection
evil-magit
haskell-mode
lsp-mode
lsp-ui
lsp-haskell
lsp-treemacs
treemacs
flycheck
]) ++ (with epkgs.elpaPackages; [
# elpa packages
undo-tree
]) ++ [
# main packages
]);
in
{
nixpkgs.overlays = [emacsOverlay];
services.emacs.enable = true;
services.emacs.install = true;
services.emacs.defaultEditor = true;
services.emacs.package = emacs;
}
- Download the attached test file and edit with
emacsclient -nw
- Ensure you do not have
undo-tree-mode
enabled. This minor mode causes GCs during the test case, leading to inconsistent behavior. - Set
(setq garbage-collection-messages t)
to have emacs tell you if a GC is run Note: This is just for your own sanity so you do know if for some reason there's an unexpected GC - Set
(setq gc-cons-threshold most-positive-fixnum)
Note: We use the highest value to ensure the GC doesn't interfere with our testcase due to some unforseen circumstances. It's an unreasonable value for the real world. - Repeatedly
cut+paste
the block below the definition ofinfo
until the end of the file. Ideally keep a small terminal (say 80x24), this lets you see the redisplay of the definitions ofinfo Attr ...
nicely duringcut+paste
. Repeating this 4-5 times is enough to make performance absolutely unbearable.
Watch the performance degrade with every cut+paste
.
- Run a
(garbage-collect)
and watch performance be fast again.
Notes of interest:
- If the block at the end is wrapped inside a comment, performance stays stable during
cut+paste
- ~~If the block at the end is not font locked (as far as I can tell), performance degrades during
cut+paste
~~ I'm beginning to think there's font-locking on whatever context is there - and the code behind it introduces markers, perhaps per-character or line. - The redisplay degradation is particularly linked to the font locking on data constructors.
If we add more data constructors in those definitions of
info
, then the performance degradation is much worse. - The redisplay performance extends to the entire frame, not just the buffer. Two ways to demonstrate this: If you resize the frame to full screen, watch how every line including toolbar and mode line is drawn slowly too. You can also create horizontal/vertical split windows, and watch how these get redrawn slowly too when the frame is switched between full screen and windowed.
- The problem is reproducible with
undo-tree-mode
present,(setq gc-cons-threshold 100000000)
and normal editing of text. Often writing a 60 line region can cause quite bad performance degradation already, it's just more inconsistent. This I want to point out before someone makes a comment about how repeatedlycut+paste
on 10k lines of code is not normal emacs usage. This is just a synthesized test case with a clear communicatable description how to reproduce.
- The emacs profiler does not help at all.
- During degraded performance, the CPU usage spikes to 100% while scrolling.
Profiler output
If the performance has degraded, running the profiler
while scrolling up and down for a minute gives these CPU and Memory reports
CPU Profiler output
Click me to expand!
Function CPU samples %
- command-execute 3417 99%
- call-interactively 3417 99%
- funcall-interactively 3284 96%
- execute-extended-command 2300 67%
- sit-for 2287 66%
- redisplay 2287 66%
+ redisplay_internal (C function) 1 0%
+ command-execute 12 0%
scroll-down-command 534 15%
- scroll-up-command 450 13%
+ scroll-up 1 0%
+ byte-code 133 3%
+ evil-repeat-post-hook 1 0%
evil--jump-hook 1 0%
+ timer-event-handler 1 0%
+ ... 0 0%
Memory profiler output
Click me to expand!
Function Bytes %
- command-execute 8,755,700 99%
- call-interactively 8,755,700 99%
- byte-code 4,469,695 50%
- read-extended-command 4,469,695 50%
- completing-read 4,469,695 50%
- ivy-completing-read 4,469,695 50%
- ivy-read 4,469,695 50%
- read-from-minibuffer 2,628,499 29%
+ ivy--queue-exhibit 2,068,301 23%
+ command-execute 49,096 0%
+ redisplay_internal (C function) 12,416 0%
+ minibuffer-inactive-mode 7,392 0%
+ #<compiled 0x472e8f9> 1,056 0%
+ undo-auto--add-boundary 1,056 0%
+ undo-auto--undoable-change 72 0%
+ ivy--reset-state 256,404 2%
+ ivy-call 1,056 0%
+ ivy--update-prompt 1,024 0%
- funcall-interactively 4,286,005 48%
- execute-extended-command 4,259,293 48%
- command-execute 3,791,263 43%
- call-interactively 3,791,263 43%
- funcall-interactively 3,791,247 43%
profiler-stop 3,790,592 43%
profiler-start 655 0%
+ sit-for 273,374 3%
+ scroll-down-command 15,960 0%
+ scroll-up-command 10,752 0%
+ redisplay_internal (C function) 23,224 0%
+ jit-lock--antiblink-post-command 8,360 0%
+ timer-event-handler 6,200 0%
evil--jump-handle-buffer-crossing 2,112 0%
+ global-font-lock-mode-check-buffers 1,056 0%
+ internal-echo-keystrokes-prefix 1,056 0%
... 0 0%
GC Output
I first did cut+paste
to degrade performance, then executed two (garbage-collect)
in a row
The first, which fixed performance, gave us the following output
First GC (After degraded performance): Click me to expand!
((conses 16 4979055 645610)
(symbols 48 93078 29)
(strings 32 158480 15678)
(string-bytes 1 8876291)
(vectors 16 92580)
(vector-slots 8 1958997 416166)
(floats 8 336 1527)
(intervals 56 1006222 9317)
(buffers 1000 42))
Second GC, immediately again. This is just to observe whether the first GC has collected huge amounts of memory. Note: It did not. The output is very similar to that of the run before.
First GC (After degraded performance): Click me to expand!
((conses 16 4947118 644997)
(symbols 48 93078 29)
(strings 32 158477 15588)
(string-bytes 1 8875403)
(vectors 16 92579)
(vector-slots 8 1958985 414138)
(floats 8 336 1527)
(intervals 56 1006220 9301)
(buffers 1000 42))
It seems that 75% of the time is spent in encode_coding_object
Attached you can find the perf_events report and flamegraph.
perf_report.gz out.svg.gz out.perf.gz
Samples: 4K of event 'cycles'
# Event count (approx.): 112485425315
#
# Children Self Samples Command Shared Object Symbol
# ........ ........ ............ ............... ..................... ..............................................
#
99.67% 0.00% 0 .emacs-27.1-wra [unknown] [k] 0x41fd89415541f689
|
---0x41fd89415541f689
__libc_start_main
main
:
recursive_edit_1
command_loop
internal_catch
internal_condition_case
command_loop_1
|
|--77.34%--read_key_sequence
| |
| --77.28%--read_char
| |
| --77.22%--redisplay_internal
| |
| |--75.83%--update_frame
| | update_frame_1
| | update_frame_line
| | tty_write_glyphs
| | |
| | --75.81%--encode_terminal_code
| | --75.76%--encode_coding_object
| |
gprof agrees:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
96.23 160.67 160.67 32275 4.98 4.98 encode_coding_object
5ms per call into encode_coding_object makes for absolute horrid performance I guess.
Underlying issue for the test-case appears to be a large accumulation of unreachable markers in the buffer. encode_coding_object
traverses the singly linked of markers at least once, potentially twice.
https://git.savannah.gnu.org/cgit/emacs.git/tree/src/coding.c?h=emacs-27.1&id=86d8d76aa36037184db0b2897c434cdaab1a9ae8#n8281
In the degenerate case this singly linked list can contain millions of elements, causing the slowdown.
What is haskell-font-lock.el
doing to make this problem so apparent, even when editing smaller code regions?
Hi. i was asked by the issue author to help with confirming/reproducing the issue.
i'd like to report back that, after following the explicit steps and conditions here i can reproduce the issue on my machine to degrade scroll performance to the point of being sluggish in terminal mode (emacs -nw) (did not test gui)
it seems to get a lot better / came back to normal after i ran (garbage-collect)
, which also goes with the expected observation in the issue.
my emacs is compiled from source.
i was also asked to provide the output of C-h v system-configuration-options
, which is as follows
system-configuration-options is a variable defined in ‘C source code’.
Its value is
"--with-native-compilation --with-json --with-pgtk --with-modules --with-mailutils --without-pop --without-xwidgets"
String containing the configuration options Emacs was built with.
hope it helps!
Edit:- My exact emacs version would be:-
This is GNU Emacs 28.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version 3.24.29, cairo version 1.17.4)
of 2021-06-04
Copyright (C) 2021 Free Software Foundation, Inc.
~~I suspect this to be the culprit:~~
https://github.com/haskell/haskell-mode/blob/b35e8547b67f3d2c4d38ec7e7e516e3c0925352f/haskell-lexeme.el#L86-L131
~~Is it possible that the frequent call of this during font locking generates a lot of match data with markers, left for GC? If that's the case, why don't I notice the scrolling problem in GUI mode?~~
It seems that in GUI mode there's never a call into encode_coding_object
.
So to summarize: It seems that we generate a lot of unreachable markers in haskell-lexeme.el
with ~~looking-at~~ haskell-lexeme-looking-at-qidsym
. These markers cause a slow-down in encoding_coding_object
which traverses the list of markers once or twice. That function is then called on redisplay on terminal mode, causing a delay on drawing each line. However, the function is never called in GUI mode.
A garbage collect would instantly bring the performance back to normal, but especially lsp-users are likely to want a high gc-cons-threshold
. Also, generating this much markers causes a noticeable increase in GC delay.
Normally this wouldn't bother me as much, but I regularly trip into this the problem while doing very normal code editing - granted the scrolling speed does not get as bad as in the text-case after 4-5 cut+paste
, but it's noticably sluggish nevertheless.
Assuming my hypothesis is correct:
As a user workaround I can see
- Reducing gc-cons-threshold
- Using something like
gcmh
- Switching to GUI mode
I can reproduce the slow-down, and the first thing to try would indeed be to cut down on the generation of markers – haskell-lexeme.el
generates way more markers than necessary. haskell-lexeme-looking-at-token
may be the worst of the lot.
Note that the match data is not stored as markers internally in Emacs, only as positions (integers) and an optional reference to the buffer to which they apply (unless the last match was against a string). This means that (match-data)
will reify each position as a marker if the last match was against a buffer, which is quite a waste if done frequently. Using a non-nil INTEGERS argument eliminates that (at the cost of getting the buffer as an extra element at the end, but conses are much cheaper).
It also means that set-match-data
can safely be called with a list of integers – there should be no need to create markers just to set the match data.
More fundamentally, I'm not sure if this obsessive use of mutating the match data everywhere to return information really is the best design. I'd just return a list like (TOKEN START END ...)
etc, and use pcase
to pattern-match on it. It would probably even be faster.
Thanks @mattiase, that makes sense. I'm probably the most active maintainer currently, but still not very active: I'm unlikely to have time to be able to integrate these suggestions myself, but I'd be happy to help shepherd a PR into the repo.
Two reasons why you shouldn't apply that PR I just wrote: it hasn't been tortured properly (leaving that to you), and by doing so we may never find and fix the performance bug in Emacs that is triggered by having large amounts of markers in a buffer.
For the record, the performance problem has largely been solved in Emacs 28. Since it won't be released for a while, it probably makes sense to do something along the lines of PR #1760 as well.