vimtex
about slow syntax highlighting
Description
vimtex's syntax highlighting is a bit slow at times. It's not terrible, but if I open a large tex file and scroll up and down with my touchpad, it is noticeably not smooth. I've tried to look at the output of `:syntime report` to see if there's anything that can be improved. Here's the output:
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.222296 81139 74810 0.000459 0.000003 texMathDelim [()[\]]\|\\[{}]
0.146481 21673 49 0.000507 0.000007 texLigature \v%(``|''|,,)
0.105568 25166 79 0.000556 0.000004 texSpecialChar \%(\\\@<!\)\@<=\~
0.090826 21627 0 0.000207 0.000004 texMathZoneLI \%(\\\@<!\)\@<=\\(
0.084873 70591 61790 0.000521 0.000001 texMathSuperSub [_^]
0.060983 74427 498 0.000408 0.000001 texMathZoneTI \\\\\|\\\$
0.060684 38720 30212 0.000448 0.000002 texMathOper [-+=/<>|]
0.039987 51544 45011 0.000128 0.000001 texMathCmd \\\a\+
0.022891 26430 6532 0.000241 0.000001 texComment %.*$
0.022842 25192 16306 0.000065 0.000001 texMathDelimMod \\\(left\|right\)\>
0.021043 23855 4193 0.000384 0.000001 texMathGroup \\\\\|\\}
0.020043 32248 17368 0.000079 0.000001 texCmd \\[a-zA-Z@]\+
0.020011 5365 209 0.000412 0.000004 texCommentAcronym \v<(\u|\d){3,}s?>
0.019337 5194 3 0.000303 0.000004 texCommentURL \w\+:\/\/[^[:space:]]\+
0.018543 21627 0 0.000066 0.000001 texCmdConditionalINC \\\w*@ifnextchar\>
0.016475 21627 0 0.000101 0.000001 texCmdLigature \v\\%([ijolL]|ae|oe|ss|AA|AE|OE)\ze[^a-zA-Z@]
0.016086 21627 0 0.000052 0.000001 texSynIgnoreZone ^\c\s*% VimTeX: SynIgnore\%( on\| enable\)\?\s*$
0.014915 16338 3614 0.000067 0.000001 texMathArg \\\\\|\\}
0.014743 21627 0 0.000047 0.000001 texCmdSpaceCode \v\\%(math|cat|del|lc|sf|uc)code`
0.014660 21627 38 0.000082 0.000001 texMathZoneEnv \\begin{\z(cd\*\?\)}
0.014640 14477 12771 0.000071 0.000001 texMathTextAfter \w\+
0.014227 22104 563 0.000064 0.000001 texCmdCRef \v\\%(%(label)?c%(page)?|C)ref>
0.014106 25136 0 0.000119 0.000001 texCmdRef \\\(page\|eq\)ref\>
0.014085 23605 6905 0.000066 0.000001 texCmdEnv \v\\%(begin|end)>
0.013869 21627 0 0.000103 0.000001 texCmdLigature \v\\%([ijolL]|ae|oe|ss|AA|AE|OE)$
0.013798 25136 0 0.000148 0.000001 texComment ^\s*\\iffalse\>
0.013527 25136 0 0.000438 0.000001 texCmdRef \\v\?ref\>
0.013382 25136 0 0.000124 0.000001 texComment ^\s*%\s*!.*
0.012572 21627 0 0.000061 0.000001 texCmdPart \\\(front\|main\|back\)matter\>
0.012235 25172 48 0.000079 0.000000 texSpecialChar \\[,;:!>]
0.011866 21654 103 0.000117 0.000001 texCmdConditional \\\(if[a-zA-Z@]\+\|fi\|else\)\>
0.011844 21627 0 0.000047 0.000001 texConditionalTrueZone ^\s*\\iftrue\>
First of all, I think the very slow ``` \v%(``|''|,,) ``` can be replaced by the equivalent `` \([`',]\)\1 ``, which was slightly faster for me, averaging 4us instead of 7us.
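The claimed equivalence can be sanity-checked outside Vim. Here is a quick sketch using Python's `re` module (Vim and Python regex syntax differ, so the patterns below are transliterations of the Vim ones, not the originals):

```python
import re

# Transliterations of the two candidate Vim patterns:
#   \v%(``|''|,,)  ->  plain alternation of the three digraphs
#   \([`',]\)\1    ->  one character from the class, then the same one again
alternation = re.compile(r"``|''|,,")
backref = re.compile(r"([`',])\1")

# Both should accept exactly the doubled characters and nothing else.
for s in ["``", "''", ",,", "`'", "',", "ab"]:
    assert bool(alternation.fullmatch(s)) == bool(backref.fullmatch(s))
```

Of course this only checks functional equivalence; the timing difference comes from how Vim's engine handles the two forms.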
I was very confused by `\%(\\\@<!\)\@<=\~`. Am I correct in understanding that

- it is equivalent to `\\\@<!\~`, and
- the point of making it more complicated is that it will match faster (with a naive regex engine)? After finding a `~`, it will only check once whether there's a backslash before the `~`, instead of trying to match every substring ending before the `~` against the regex `\\`.

If so, then the same behavior could be achieved with `\\\@1<!\~`, which looks simpler. Unfortunately, it doesn't seem to give a speedup.
Another point is that this regex will parse something like `a\\~b` wrongly. This is more relevant for parsing something like `\\\(a^2\)` -- this is valid LaTeX, but vimtex's highlighting currently doesn't recognize the math mode (OTOH, I don't know why anyone would ever write that). The regex `\%(\\\@<!\%(\\\\\)*\)\@<=\\(` would fix this by checking that there's an even number of backslashes before the `\(`. The same goes for detecting `~`. The performance of this seems to be slightly worse than `\%(\\\@<!\)\@<=\\(` though; I got 11us vs 9us.
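The `a\\~b` problem can be illustrated outside Vim as well. Python's `re` has the same kind of lookbehind, so the following is a rough transliteration of the patterns discussed (an illustration of the logic, not of Vim's engine):

```python
import re

s = r"a\\~b"  # 'a', an escaped backslash pair (a LaTeX line break), '~', 'b'

# Transliteration of \\\@<!\~ ("a '~' not preceded by a backslash"):
# it wrongly rejects this '~', because the character right before it is
# the second backslash of the '\\' pair.
assert re.search(r"(?<!\\)~", s) is None

# A parity-aware variant: consume an even number of backslashes that do
# not follow another backslash, then the '~'.  This one finds the tilde.
m = re.search(r"(?<!\\)(?:\\\\)*~", s)
assert m is not None and m.group().endswith("~")
```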
Do you use a latexmkrc file?
No
VimtexInfo
System info:
OS: Ubuntu 23.10
Vim version: NVIM v0.10.0-dev-2175+g85a041716
Has clientserver: true
Servername: /run/user/1000/nvim.242708.0
VimTeX project: m
base: m.tex
root: /home/ca/vim
tex: /home/ca/vim/m.tex
main parser: current file verified
document class: article
packages: accents aliascnt aliasctr amsbsy amsfonts amsgen amsmath amsopn amssymb amstext amsthm atbegshi atbegshi-ltx atveryend atveryend-ltx autonum auxhook bigintcalc bitset calc cleveref color csquotes enumitem epstopdf-base etex etextools etoolbox geometry gettitlestring graphics graphicx hycolor hypcap hyperref iftex ifthen ifvtex infwarerr inputenc intcalc keyval kvdefinekeys kvoptions kvsetkeys letltxmacro ltxcmds mathrsfs mathtools mhsetup mleftright nameref parseargs pdfescape pdftexcmds pgf pgfcomp-version-0-65 pgfcomp-version-1-18 pgfcore pgffor pgfkeys pgfmath pgfrcs pgfsys refcount rerunfilecheck rotating textpos tgpagella thm-amsthm thm-autoref thm-kv thm-listof thm-patch thm-restate thmtools tikz tikz-cd todonotes trig uniquecounter url xcolor xkeyval
source files:
m.tex
../texmf/tex/latex/preamble.tex
compiler: latexmk
engine: -pdf
options:
-verbose
-file-line-error
-synctex=1
-interaction=nonstopmode
callback: 1
continuous: 1
executable: latexmk
viewer: Zathura
xwin id: 0
qf method: LaTeX logfile
To test the slow `\(\)` more, I just replaced all `$`-math in my tex file with `\(\)`, and I got some pretty bad times:
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.339383 8439 8439 0.000610 0.000040 texMathZoneLI \%(\\\@<!\)\@<=\\)
0.117477 12558 2847 0.000994 0.000009 texMathZoneLI \%(\\\@<!\)\@<=\\(
I don't understand why finding the closing `\)` is so slow, but it seems like replacing

```vim
execute 'syntax region texMathZoneLI matchgroup=texMathDelimZoneLI'
      \ 'start="\%(\\\@<!\)\@<=\\("'
      \ 'end="\%(\\\@<!\)\@<=\\)"'
      \ 'contains=@texClusterMath'
      \ l:conceal
```

with

```vim
execute 'syntax region texMathZoneLI matchgroup=texMathDelimZoneLI'
      \ 'start="\%(\\\@<!\)\@<=\\("'
      \ 'skip="\\\\"'
      \ 'end="\\)"'
      \ 'contains=@texClusterMath'
      \ l:conceal
```

in `vimtex/autoload/vimtex/syntax/core.vim` makes it much better (and also fixes some wrong highlighting in e.g. `\(x^2\\\)`).
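The `skip=` rule is what avoids the expensive lookbehind on `end=`: while scanning for the closing delimiter, an escaped pair `\\` is consumed as a unit, so its second backslash can never be mistaken for the start of `\)`. A minimal sketch of that scanning logic in Python (the function name is made up for illustration):

```python
def find_close(text, start=0):
    r"""Return the index of the '\)' that closes inline math, honoring a
    skip rule for '\\', or -1 if no closing delimiter is found."""
    i = start
    while i < len(text) - 1:
        if text[i] == "\\":
            if text[i + 1] == "\\":  # skip="\\\\": jump over the pair
                i += 2
                continue
            if text[i + 1] == ")":   # end="\\)": found the delimiter
                return i
        i += 1
    return -1

# In '\(x^2\\\)' the '\\' is skipped, so the final '\)' is found correctly.
assert find_close(r"\(x^2\\\)", start=2) == 7
```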
> vimtex's syntax highlighting is a bit slow at times. It's not terrible, but if I open a large tex file and scroll up and down with my touchpad, it is noticeably not smooth. I've tried to look at the output of `:syntime report` to see if there's anything that can be improved. Here's the output
Thanks for looking into this and for providing some profiling numbers!
> First of all, I think the very slow ``` \v%(``|''|,,) ``` can be replaced by the equivalent `` \([`',]\)\1 ``, which was slightly faster for me, averaging 4us instead of 7us.
Could you check the original pattern without the group, i.e.

```vim
syntax match texLigature "``\|''\|,,"
```

I would think it should be faster still, but it would be nice to see how it compares to your current numbers. (I've pushed an update that does this already, because I can't see how it would not be an improvement. But I'm curious whether your suggested version may be even faster.)
> I was very confused by `\%(\\\@<!\)\@<=\~`.
Not surprising. It's quite complicated; perhaps needlessly so. I have to admit that it does look equivalent to `\\\@<!\~`. I'm updating that now.
> Am I correct in understanding that …
>
> - the point of making it more complicated is that it will match faster (with a naive regex engine)? After finding a `~`, it will only check once whether there's a backslash before the `~`, instead of trying to match every substring ending before the `~` against the regex `\\`.
Did you already check whether the original pattern matches faster than the simplified pattern? That would be surprising to me.
> Another point is that this regex will parse something like `a\\~b` wrongly.
I've pushed a simplification of the pattern now, and it seems to work well on `a\\~b`.
> This is more relevant for parsing something like `\\\(a^2\)` -- this is valid LaTeX, but vimtex's highlighting currently doesn't recognize the math mode (OTOH, I don't know why anyone would ever write that). The regex `\%(\\\@<!\%(\\\\\)*\)\@<=\\(` would fix this by checking that there's an even number of backslashes before the `\(`. The same goes for detecting `~`. The performance of this seems to be slightly worse than `\%(\\\@<!\)\@<=\\(` though; I got 11us vs 9us.
I've tested this a little bit further, and I believe the complexity is not really needed here. `\\` is already matched early as `texTabularChar`. I'm therefore pushing a further simplification that I believe should also work as expected and improve things somewhat.
> To test the slow `\(\)` more, I just replaced all `$`-math in my tex file with `\(\)`, and I got some pretty bad times: …
I've simplified this even further. How do the timings look now?
A quick test seems to indicate that `` \([`',]\)\1 `` (average 3.0us) might be faster than ``` ``\|''\|,, ``` (average 4.7us). But I'm not sure this sample is representative.
> Did you already check whether the original pattern matches faster than the simplified pattern? That would be surprising to me.
Some data for this: I created some files with a couple of lines, each consisting of 999 `i`s followed by a single `~` (in math mode). This should be a worst-case scenario for lookbehinds. These are the results:
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.846513 1964 933 0.002901 0.000431 texSpecialChar \%(\\\@<!\)\@<=\~
0.070227 1944 928 0.000472 0.000036 texSpecialChar \\\@1<!\~
0.052229 1529 727 0.000437 0.000034 texSpecialChar \\\@<!\~
1.125284 2296 1072 0.003568 0.000490 texSpecialChar \%(\\\@<!\%(\\\\\)*\)\@<=\~
0.000506 2146 1004 0.000017 0.000000 texSpecialChar \~
So it seems like the change you already pushed is better than the way it was. However, the pattern `\\\@<!\~` is wrong in situations like `a\\~b`, so maybe it would be preferable to just match `\~` and rely on `texTabularChar` matching double backslashes first. This leads to somewhat weird highlighting of strings like `\~`, but that's not valid TeX in math mode anyway.
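For anyone wanting to reproduce this, the worst-case input described above can be generated with a few lines (the exact counts and the filename are just illustrative choices matching the description):

```python
# One pathological line: many ordinary characters, then a single '~'
# at the end, all inside an inline-math zone.
line = "i" * 999 + "~"
doc = (
    "\\documentclass{article}\n"
    "\\begin{document}\n"
    + ("\\(" + line + "\\)\n") * 50  # repeat for measurable syntime totals
    + "\\end{document}\n"
)
with open("worstcase.tex", "w") as f:  # hypothetical filename
    f.write(doc)
```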
Here's another idea that might improve syntax highlighting performance. Currently there are a lot of syntax definitions that match specific commands. It might be faster to just have a syntax group that matches commands, i.e. `\\[a-zA-Z@]\+`, and then have this syntax group contain all the specific commands, e.g. `texCmdAccent`, `texCmdLigature`. A quick test with those two syntax groups looks quite promising.
> A quick test seems to indicate that `` \([`',]\)\1 `` (average 3.0us) might be faster than ``` ``\|''\|,, ``` (average 4.7us). But I'm not sure this sample is representative.
Interesting. I can't understand why it would be faster, but I'll switch based on your evidence.
> Did you already check whether the original pattern matches faster than the simplified pattern? That would be surprising to me.
> Some data for this: I created some files with a couple of lines, each consisting of 999 `i`s followed by a single `~` (in math mode). This should be a worst-case scenario for lookbehinds. These are the results:
>
>     TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
>     1.125284 2296 1072 0.003568 0.000490 texSpecialChar \%(\\\@<!\%(\\\\\)*\)\@<=\~
>     0.846513 1964 933 0.002901 0.000431 texSpecialChar \%(\\\@<!\)\@<=\~
>     0.070227 1944 928 0.000472 0.000036 texSpecialChar \\\@1<!\~
>     0.052229 1529 727 0.000437 0.000034 texSpecialChar \\\@<!\~
>     0.000506 2146 1004 0.000017 0.000000 texSpecialChar \~
Ok, so the current version is very fast now. That's good. But …
> So it seems like the change you already pushed is better than the way it was. However, the pattern `\\\@<!\~` is wrong in situations like `a\\~b`, so maybe it would be preferable to just match `\~` and rely on `texTabularChar` matching double backslashes first. This leads to somewhat weird highlighting of strings like `\~`, but that's not valid TeX in math mode anyway.
Yes, you are right. I'm sorry for first insisting otherwise. I think using the "trivial" `\~` is really fine here, because `\\` is already properly matched as `texTabularChar` and `\~` is matched as `texCmdAccent`. In math mode the latter command does not exist and will typically be an error anyway, so why worry about it?
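The "match `\\` first, then match `\~` trivially" approach corresponds to ordered alternation in a single left-to-right scan. A small Python illustration of why the ordering makes the lookbehind unnecessary (a rough analogy; Vim's syntax-rule priority works differently in detail):

```python
import re

# Because '\\' is consumed as a unit before '~' is ever considered, the
# second backslash of a line break can no longer "escape" the tilde.
assert re.findall(r"\\\\|~", r"a\\~b") == ["\\\\", "~"]

# The accepted tradeoff: in '\~' (the text-mode accent command) the bare
# '~' still gets reported, which is the "weird highlighting" noted above.
assert re.findall(r"\\\\|~", r"a\~b") == ["~"]
```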
> Here's another idea that might improve syntax highlighting performance. Currently there are a lot of syntax definitions that match specific commands. It might be faster to just have a syntax group that matches commands, i.e. `\\[a-zA-Z@]\+`, and then have this syntax group contain all the specific commands, e.g. `texCmdAccent`, `texCmdLigature`. A quick test with those two syntax groups looks quite promising.
Yes, you may be right. But it does seem like a large amount of work to do this. And in my experience, syntax performance is not really a big issue?
I'll close this, but feel free to continue the discussion.
From your original list of slow patterns, it seems we should consider the `texMathDelim` pattern. Do you have any ideas on this one?
Also: if you care to share a good example file with which you are now testing the syntax speed, that would be much appreciated. I'm thinking of adding an example to the test files so that I have an easy way to reproduce timings.
I've added a very tiny example here: https://github.com/lervag/vimtex/commit/2477b879251fa8ec61dd017702a099c6048ea0ef#diff-b6fcc94b4e1e1c06afd70f6fa03d63100d069eed363f356fe23eb30bbe2af033
I modified your script slightly:

```vim
set nolazyredraw
let LINES = line('$')
syntime on
for s:x in range(2*LINES/winheight(0))
  norm! 
  redraw!
endfor
```
and ran it on the `thesis.tex` example file included with vimtex. The top syntimes are:
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.358374 58463 7456 0.000086 0.000006 texLength \<\d\+\([.,]\d\+\)\?\s*\(true\)\?\s*\(bp\|cc\|cm\|dd\|em\|ex\|in\|mm\|pc\|pt\|sp\)\>
0.152289 135299 83807 0.000049 0.000001 texCmd \\[a-zA-Z@]\+
0.066881 53552 0 0.000042 0.000001 texComment ^\s*\\iffalse\>
0.044951 53552 0 0.000043 0.000001 texComment ^\s*%\s*!.*
0.043604 82835 35604 0.000042 0.000001 texOptSep ,\s*
0.038223 180644 31556 0.000038 0.000000 texOpt \]
0.038166 58993 3851 0.000034 0.000001 texArg \\\\\|\\}
This is interesting, because for one of my files I get
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.017515 11303 10559 0.000023 0.000002 texMathDelim [()[\]]
0.010766 11929 100 0.000008 0.000001 texMathZoneTI \\\\\|\\\$
0.010017 10471 9433 0.000030 0.000001 texMathSuperSub [_^]
0.006673 9823 9159 0.000013 0.000001 texMathCmd \\\a\+
0.006661 5076 4084 0.000018 0.000001 texMathOper [-+=/<>|]
0.003620 3829 516 0.000006 0.000001 texMathGroup \\\\\|\\}
0.002590 2689 0 0.000019 0.000001 texCmdConditionalINC \\\w*@ifnextchar\>
0.002480 4763 2925 0.000009 0.000001 texCmd \\[a-zA-Z@]\+
0.002314 2457 1411 0.000012 0.000001 texMathDelimMod \\\(left\|right\)\>
So which particular rules take long might vary from case to case. Anyway, I also plotted the results (from `thesis.tex`), and I think looking at the top syntimes might be barking up the wrong tree. It seems like the large number of (fast) syntax rules is a bigger issue than some individually slow ones.
> I modified your script slightly:
>
>     set nolazyredraw
>     let LINES = line('$')
>     syntime on
>     for s:x in range(2*LINES/winheight(0))
>       norm! �
>       redraw!
>     endfor
So the idea here is to scroll through a file, right? So it's `norm! <c-f>` or something?
> and ran it on the `thesis.tex` example file included with vimtex. The top syntimes are … This is interesting, because for one of my files I get …
>
> So which particular rules take long might vary from case to case. Anyway, I also plotted the results … and I think looking at the top syntimes might be barking up the wrong tree. It seems like the large number of (fast) syntax rules is a bigger issue than some individually slow ones.
The `thesis.tex` file is not really a very good example of a common LaTeX project. First, it does not contain very much math. Second, the content is repeated several times to increase the length of the file, so it becomes much bigger than most projects. Thus, it is not so strange that there are big differences in which rules take long.

Further, the main thing we want is for a single screen render to be quick. For this, we want low average (and slowest) times for all rules. We don't want slow rules, or at least we want them to be very rare.
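One way to act on this is to sort the `:syntime report` output by the AVERAGE (or SLOWEST) column instead of TOTAL. A small parser sketch in Python, assuming the column layout shown in the reports above:

```python
def parse_syntime(report):
    """Parse ':syntime report' text into dicts sorted by average time,
    assuming columns: TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN."""
    rows = []
    for line in report.strip().splitlines():
        parts = line.split(None, 6)  # the PATTERN column may contain spaces
        if len(parts) < 7 or parts[0] == "TOTAL":
            continue  # skip the header line
        total, _count, _match, slowest, average, name, pattern = parts
        rows.append({
            "name": name,
            "pattern": pattern,
            "total": float(total),
            "slowest": float(slowest),
            "average": float(average),
        })
    return sorted(rows, key=lambda r: r["average"], reverse=True)

report = """TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.222296 81139 74810 0.000459 0.000003 texMathDelim [()[\\]]\\|\\\\[{}]
0.146481 21673 49 0.000507 0.000007 texLigature \\v%(``|''|,,)"""

# texLigature is worse on average even though texMathDelim tops the total.
assert parse_syntime(report)[0]["name"] == "texLigature"
```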
I found some time to spend on this today. I ended up using the source of this paper https://arxiv.org/abs/1512.07213 to time things with that scrolling script (the non-printable character is `^D`). It seems I was able to get a 20% speedup by reducing the number of syntax rules created. I've put my changes in the faster-syntax branch on my fork.
Most of it is just "merging" regular expressions, although I also tried changing the `vimtex#syntax#core#new_env` function so that it only creates one (big) syntax rule for `texMathEnvBgnEnd`, `texMathZoneEng` and `texMathError` (so every time the function is called, I delete the old syntax rule and replace it). For this I had to limit what you can do with `vimtex#syntax#core#new_env` when `{'math': v:true}`; in particular, you can't pass the `__predicate` argument. I'm not exactly sure what the use case of that is. The new function just throws errors whenever the combination of arguments could cause trouble. I think this doesn't limit functionality too much.

Looking at the code probably makes things clearer than I can explain here.
Interesting. With your branch I do get a very noticeable speedup on my example:
Now, I notice you do a lot of different things, e.g. changing to the old regex engine. It's a little hard to tell which of your changes are the most significant. But I'm beginning to think that one of the most significant factors is the number of rules. Thus, as you say, reducing the number of rules by using more complex regexes seems to be a useful trick.
Could you explain the timings you've added in your commits? E.g.
Are the numbers the current runtime? If so, they seem to be increasing with the commits, and the latest one is the slowest. Clearly that's not the correct understanding, but perhaps you see my confusion?
Now, it looks like you've done a very good and thorough job here. I believe it may be a good idea to add a comment to the top of the `core.vim` file that summarizes some of the key reflections from this thread.
Also, I am wondering if you are proposing that I merge this or if you want to open a PR with your work more cleaned up?
> Could you explain the timings you've added in your commits? E.g. …
They are just the runtimes of `test.vim` (using the arXiv paper I linked as `main.tex`) on my computer, with the CPU frequency fixed. They are not very meaningful by themselves; I was just adding them to keep track of how much mileage I was getting out of each commit.
Ok, thanks for clarifying. How about my other questions?