LSP too slow
When I'm trying to work on this project, deno lsp's memory usage grows to 1.5 GB in a short amount of time, and IntelliSense, auto-complete, and pretty much everything becomes very slow.
Version: 1.39.1
In case it's useful: there are two somewhat large generated TypeScript files in the source code (tl/2_types.ts, tl/3_functions.ts).
I did a bit of analysis with this project loaded. Here's what I did and some findings:
- I loaded the project linked in the description above
- I used `top` to look at the memory usage. I'm not sure if this is how @roj1512 determined that the LSP was using 1.5 GB or not?
- I built and ran HeapTrack to get a memory dump of the LSP
- I looked at the results in the HeapTrack UI
I noticed a few things:
- The heap usage as reported by HeapTrack is actually not very large. The total memory is large, but not the heap. I'm not sure exactly what to read into this, but I guess non-heap allocations (threads and thread stacks?) are the main memory users:
```
calls to allocation functions: 56610504 (189695/s)
temporary memory allocations: 18278443 (61249/s)
peak heap memory consumption: 310.37M
peak RSS (including heaptrack overhead): 1.76G
total memory leaked: 234.12M
```
- There are lots of calls to the Deno linter in the LSP. I commented out the linter call `linter.lint_with_ast(parsed_source);` and the memory usage dropped by about 500 MB. Not a silver bullet (and we obviously need the linter), but that seems like a lot of memory for linting even large files.
- Another place I looked was the embedded V8 in the LSP. I tapped into V8's HeapStatistics:
```rust
let mut heap_stats: v8::HeapStatistics = Default::default();
runtime.v8_isolate().get_heap_statistics(&mut heap_stats);
let size = format!(
  "Malloced memory: {}, external memory: {}, used heap size: {}",
  heap_stats.malloced_memory(),
  heap_stats.external_memory(),
  heap_stats.used_heap_size()
);
lsp_log!("Memory usage: {}", size);
```
I used this to print the memory usage whenever a request was sent to the V8 runtime. After a bit of usage, the `used_heap_size` grew to almost 1 GB:
```
Memory usage: Malloced memory: 540776, external memory: 3325, used heap size: 970004840
```
This might explain why the RSS is so large while the heap isn't. It's very possible that the V8 memory isn't counted as part of the Rust heap, since V8 does its own very low-level memory management.
I'm not sure if any of this is news to the development team, but at least for me it gave an overview of where the memory in the LSP is going. Back-of-the-napkin math shows that just under two thirds of the memory in this case was going to V8, and almost one third to the linter. The rest likely goes to scaffolding in the LSP and other miscellaneous things.
I'm not sure I can take this any further now, but I'm in the discord channel if anyone has questions or wants to look at this further.
@irbull I used `top`, too. Perhaps you would see that too if you used the LSP for a longer amount of time?
I wonder if we've tried to use `--max-old-space-size=` with the embedded V8. One theory is that V8 doesn't really need that much memory, but it will happily use it. Once it acquires memory from the OS, it won't hand it back. Think of someone acquiring a really large moving box to store their stuff in, but only using a small portion of it. From the outside, the box appears giant, but really it's nice and tidy inside.
If we force V8 to use a smaller memory footprint, then maybe it will. Maybe this has been tried and it's a dead-end. It may also mean that if V8 really does require a large memory footprint for some task, then the LSP will crash and burn.
Has anybody done any work to try and configure how V8 is run within the LSP?
I've spent some more time on this. I created a patched LSP that invokes the V8 garbage collector after each tsc request. On a Mac M1, on a fairly small workspace (2K lines of TS code and a small Fresh UI), garbage collection took between 20 ms and 150 ms. I ran this after results were returned from the LSP to VSCode, with the idea that the GC should finish before the next request comes in.
With this change-set, memory grows much more slowly. After a full day of work on my Deno project, memory grew from 750 MB to 1.6 GB. This still isn't great, but on a normal day I'll usually see 3+ GB, which is probably hitting the max-old-space setting.
This doesn't appear to be a silver bullet, but if we can limit the time the garbage collector runs, and run it between requests to the LSP, we may be able to slow the memory growth.
I've also added some V8 Heap Statistics to my patched LSP. In particular, I've been monitoring:
```rust
heap_stats.malloced_memory(),
heap_stats.external_memory(),
heap_stats.used_heap_size(),
```
These seem to stay fairly stable and show nothing near 1.5 GB. They fluctuate a bit, but always stabilize.
I went in a slightly different direction today and focused back on the Linter. The following snippet shows unbounded memory growth:
```rust
use deno_ast::parse_module;
use deno_ast::MediaType;
use deno_ast::ParseParams;
use deno_ast::SourceTextInfo;

fn main() {
  let source_text = "class MyClass {}";
  let text_info = SourceTextInfo::new(source_text.into());
  let parsed_source = parse_module(ParseParams {
    specifier: deno_ast::ModuleSpecifier::parse("file:///my_file.ts").unwrap(),
    media_type: MediaType::TypeScript,
    text_info,
    capture_tokens: true,
    maybe_syntax: None,
    scope_analysis: false,
  })
  .expect("should parse");

  let mut counter = 0;
  loop {
    let _v: Vec<String> = parsed_source.with_view(|_| vec![]);
    counter = (counter + 1) % 1_000_000;
    if counter == 0 {
      println!("with_view called 1000000 times");
    }
  }
}
```
`parsed_source.with_view` is what the linter calls. I don't think this should show unbounded memory growth.
There was in fact a leak in dprint-swc-ext, which @dsherret addressed today [1]. This should land in 1.41.0. I've done some testing with this change-set applied and things seem much better. I still have a patched LSP that also does more aggressive GC after each tsc call, so that might help too. But let's test this fix once 1.41 lands and see how the performance is.
[1] https://github.com/dprint/dprint-swc-ext/commit/64e8f436b8f27a659fe06ca3b4fe90dd3aaa0554
@roj1512 How has the performance been since 1.41.0? Should this issue stay open?
Yes, it should. It hasn’t improved.
Hi.
I've encountered the same problem when trying to use the library.
Here's a minimal reproduction repo, and below is a screenshot of btop showing the memory usage.
If I remove the second parameter from the function at line 10, the slowdown is much less dramatic, but still noticeable. However, if I add a second parameter, it becomes really slow, to the point where using it is unfeasible. Auto-complete takes a long time, and diagnostics take even longer.
OS information:
```
OS: EndeavourOS Linux x86_64
Host: ASUSTeK COMPUTER INC. TUF GAMING B550M-PLUS WIFI II
Kernel: 6.8.7-arch1-1
Packages: 1365 (pacman), 8 (flatpak)
DE: GNOME 46.0
WM: Mutter
CPU: AMD Ryzen 5 3600X (12) @ 3.800GHz
GPU: NVIDIA GeForce RTX 4070
Memory: 6715MiB / 31980MiB
```
@roziscoding
Thanks for the reproduction, that was really helpful – I also see the behavior you describe. I investigated this, and it looks like the source of the slow completion and diagnostics is the TypeScript compiler. For context, our LSP uses the TypeScript compiler to provide features like completion. Unfortunately there's not much we can do on our end to speed that up.
In particular, it looks like tsc is struggling to figure out which overload of the `on` function applies.
As a workaround, you can erase the type so that tsc doesn't try to infer, like:
```ts
// by asserting that `client` is `any`, tsc won't try to infer types on the arguments
(client as any).on("message:text", (ctx, next) => {});
```
The downside of that, of course, is that you lose completion and types for `ctx` and `next`. You can regain some of that by explicitly stating the types of those arguments, like:
```ts
// I got these argument types from hovering on each argument in the original code
(client as any).on("message:text", (ctx: WithFilter<Context, "message:text">, next: NextFunction<void>) => {});
```
From what I see, that greatly improves the completion performance.