Make semantic tokens more performant
We should add the following logic to semantic tokenization and see if it improves performance:
When a document is updated, check whether it was an added/deleted alphanumeric character and whether the previous character is also alphanumeric. In that case the resulting word will be treated the same semantically. Example: `const fo` -> `const foo` -> we know that `fo`/`foo` are the same semantic type. Therefore skip the computation and mapping entirely, reuse the previous tokens, and only adjust/update that result.
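Roughly, the check could look something like this (a minimal sketch; `TokenData`, the offset-based token shape and `tryReuseTokens` are hypothetical names for illustration, not the actual implementation in the language server):

```ts
interface TokenData {
    start: number;          // offset into the document
    length: number;
    classification: number; // encoded token type + modifiers
}

const isWordChar = (c: string) => /[A-Za-z0-9_$]/.test(c);

/**
 * If the edit only extends or shrinks an existing word, return the previous
 * tokens with lengths/offsets adjusted; otherwise return undefined to signal
 * that a full re-computation is needed.
 */
function tryReuseTokens(
    previousTokens: TokenData[],
    textBeforeEdit: string,
    offset: number,
    insertedChar?: string   // undefined means a single character was deleted
): TokenData[] | undefined {
    const editedChar = insertedChar ?? textBeforeEdit[offset];
    const charBefore = textBeforeEdit[offset - 1];
    if (!editedChar || !charBefore || !isWordChar(editedChar) || !isWordChar(charBefore)) {
        return undefined;
    }
    const delta = insertedChar !== undefined ? 1 : -1;
    return previousTokens.map((token) => {
        if (offset > token.start && offset <= token.start + token.length) {
            // Edit happened inside this word: keep the classification, adjust the length
            return { ...token, length: token.length + delta };
        }
        if (token.start >= offset) {
            // Tokens after the edit only shift by one character
            return { ...token, start: token.start + delta };
        }
        return token;
    });
}
```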
I tested this out; turns out this doesn't help us at all, unfortunately. VS Code's highlighting works on word boundaries already, so if we type out something it will keep the same color. The proposed algorithm also has a fundamental flaw: when a variable name changes, its semantic meaning could change as well. `fo` could be a const but `foo` could be a function, which should result in different coloring - and that would no longer be the case when applying the outlined algorithm.
So what could be other ways to make this faster?
- somehow only let TS encode semantic tokens on the text change range. This would be faster since it would mean encoding only one variable most of the time. The danger is that we need to find out whether the changed variable in question was at the declaration site, not a usage site, because then its semantic meaning could have changed (someone did `const foo` but now decided to do `let foo`). How to find out? We probably need to check the TS AST for that ("is it modifying a declaration site? -> play it safe and don't do optimizations") - see the sketch after this list.
- turn the logic around and adjust mappings after changes where we know that they can't have changed a semantic token - basically non-word characters.
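A rough sketch of the first idea, assuming we get the change range from the incremental document update (the helper names here are made up, and merging the partial result back into the previously cached tokens is left out):

```ts
import ts from 'typescript';

function classifyChangedRangeOnly(
    lang: ts.LanguageService,
    fileName: string,
    sourceFile: ts.SourceFile,
    changeStart: number,
    changeLength: number
) {
    if (touchesDeclarationName(sourceFile, changeStart)) {
        // Renaming at a declaration site can change the classification of every
        // usage, so play it safe and re-classify the whole file.
        return lang.getEncodedSemanticClassifications(
            fileName,
            { start: 0, length: sourceFile.text.length },
            ts.SemanticClassificationFormat.TwentyTwenty
        );
    }
    // Otherwise only ask TS about the edited span and merge the result
    // into the previously cached tokens (merging not shown here).
    return lang.getEncodedSemanticClassifications(
        fileName,
        { start: changeStart, length: changeLength },
        ts.SemanticClassificationFormat.TwentyTwenty
    );
}

// Returns true if the position falls on the name of a declaration
// (const/let/function/class/...), i.e. the case where the semantic
// meaning of all usages could change.
function touchesDeclarationName(sourceFile: ts.SourceFile, position: number): boolean {
    let found = false;
    const visit = (node: ts.Node) => {
        if (position < node.getStart(sourceFile) || position > node.getEnd()) {
            return;
        }
        if (
            (ts.isVariableDeclaration(node) ||
                ts.isFunctionDeclaration(node) ||
                ts.isClassDeclaration(node)) &&
            node.name &&
            position >= node.name.getStart(sourceFile) &&
            position <= node.name.getEnd()
        ) {
            found = true;
            return;
        }
        node.forEachChild(visit);
    };
    visit(sourceFile);
    return found;
}
```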
~~What makes TypeScript fast at semantic tokens and this extension slow? Is it due to svelte2tsx transformations or something like that?~~
~~Presuming it is due to svelte2tsx, my naive first idea is to simply extract `<script>` content and parse for semantic tokens separately there. Everywhere else, the current approach is used. This could probably be sped up using a worker, if that's possible. My instincts tell me the issue with this is that certain transformations need to be done in order for the semantic highlighting to be correct in `<script>` tags, but what are those transformations? I would imagine semantic highlighting wouldn't need "valid" code.~~
~~Although, if it was workerized, I would actually put the more complicated svelte2tsx semantic highlighting in the worker.~~
EDIT: Scratch all of that, I actually started to play with the semantic tokens and realized that in:
https://github.com/sveltejs/language-tools/blob/50c13f6ee6910beb75d19628e726dc464c1e9057/packages/language-server/src/plugins/PluginHost.ts#L406-L419
the execution priority is "low". This makes the semantic tokens insanely unresponsive and I thought this was what the performance issue was. The semantic tokens seem equivalent (or close enough) to TypeScript's performance when I set the priority to "high", and honestly I don't see why it wouldn't be on "high". Do the semantic token requests saturate the queue otherwise?
I mean, even if they did, I would think that semantic tokens would be the highest priority because they're related to syntax highlighting, and so the user expects that first over anything else.
EDIT 2: Yeah, and I did some basic profiling of the getSemanticTokens function:
```
getSemanticTokens (getLSAndTSDoc) 5.553936004638672
getSemanticTokens (convertToTextSpan) 0.013846993446350098
getSemanticTokens (getEncodedSemanticClassifications) 48.69432598352432
getSemanticTokens (mapToOrigin) 1.1914759874343872
getSemanticTokens (sort) 0.036069005727767944
getSemanticTokens (build) 0.03681999444961548
getSemanticTokens (getLSAndTSDoc) 4.4516920149326324
getSemanticTokens (convertToTextSpan) 0.005770981311798096
getSemanticTokens (getEncodedSemanticClassifications) 42.874532014131546
getSemanticTokens (mapToOrigin) 1.0215599834918976
getSemanticTokens (sort) 0.04444500803947449
getSemanticTokens (build) 0.048002004623413086
getSemanticTokens (getLSAndTSDoc) 7.603240996599197
getSemanticTokens (convertToTextSpan) 0.011411994695663452
getSemanticTokens (getEncodedSemanticClassifications) 59.634951025247574
getSemanticTokens (mapToOrigin) 1.1592940092086792
getSemanticTokens (sort) 0.03454598784446716
getSemanticTokens (build) 0.03486701846122742
```
The stuff the language server does is reasonably fast, at least in comparison to TypeScript's semantic token parsing, which dominates the time taken. If that's good enough for TypeScript, I think it would be good enough for Svelte.
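For reference, these numbers come from nothing fancier than wrapping each step, roughly like this (illustrative only, not committed code):

```ts
import { performance } from 'perf_hooks';

// Wraps one step of getSemanticTokens and logs how long it took in milliseconds.
function timed<T>(label: string, fn: () => T): T {
    const start = performance.now();
    const result = fn();
    console.log(`getSemanticTokens (${label})`, performance.now() - start);
    return result;
}

// e.g. inside the provider:
// const classifications = timed('getEncodedSemanticClassifications', () =>
//     lang.getEncodedSemanticClassifications(fileName, span, format)
// );
```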
The priority queue was added in #1144 as a way to give more important computations like getCompletions priority. At that time I had the impression that semantic tokens were holding up the rest of the computations. But I did some more profiling recently, too, and found that most of the time semantic tokens are not as much of a bottleneck as I thought. I still would need to test this out on some bigger projects first - for example, in https://github.com/sveltejs/language-tools/issues/1139#issuecomment-904526552 it looks like semantic tokens are the bottleneck (but maybe that log is flawed since it doesn't take into account the other stuff that runs before results are returned; or TS got more performant in the meantime). Maybe we should also add a third option, "mid", which only waits 200ms.
If a delay is needed, I would make it so that the semantic tokens don't get indefinitely delayed if you keep typing. As in, throttle semantic token requests rather than debounce them. That should make them feel a lot more responsive despite having a delay.
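Something along these lines - a generic throttle helper, not tied to the existing queue code:

```ts
// A debounce keeps postponing the call while the user types; a throttle
// still guarantees the call runs at a fixed interval during typing.
function throttle<T extends unknown[]>(fn: (...args: T) => void, intervalMs: number) {
    let lastRun = 0;
    let pending: ReturnType<typeof setTimeout> | undefined;
    return (...args: T) => {
        const now = Date.now();
        const wait = Math.max(0, lastRun + intervalMs - now);
        clearTimeout(pending);
        pending = setTimeout(() => {
            lastRun = Date.now();
            fn(...args);
        }, wait);
    };
}

// e.g. recompute semantic tokens at most every 200ms while typing,
// instead of only after the user stops typing:
// const requestSemanticTokens = throttle(sendSemanticTokensRequest, 200);
```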
typescript-ls now starts a separate server for semantic requests: https://github.com/typescript-language-server/typescript-language-server/pull/688
vue-ls also appears to use separate semantic and syntactic servers.
While it is a general performance enhancement tactic, I don't think this is related. The main reason we delay the semantic tokens is that we want to prioritize auto-completion, which is also a semantic request.
The "semantic" in the "semantic server" does not mean dedicated to semantic tokens but all the semantic language features. I.e., Something that needs the context of the types, variable references, scoping etc. While features only related to syntax can be calculated in a separate process.