kotlin-language-server icon indicating copy to clipboard operation
kotlin-language-server copied to clipboard

Cache dependencies

Open daplf opened this issue 2 years ago ā€¢ 4 comments

Related to https://github.com/fwcd/kotlin-language-server/issues/328

Introduces a mechanism to cache the project's dependencies.

This improves startup performance a bit, since we no longer need to run maven/gradle tasks every time we open a project.

Every time we open a project, we now check to see if the build file has changed since the last time it was opened (i.e., the last time we updated the cache). If no changes were done, we use the cache. Otherwise, we just fetch the dependencies again. We can obviously make this more robust, but that can come later.

The caching mechanism can later be used for other things in the server as well (such as caching the IR and the symbol index).

The caching is optional at the moment. It requires sending a storagePath in the initializationOptions on the initialize request, like so:

{
   ...
   "initializationOptions": {
      "storagePath": "/some/path"
   }
   ...
}

I created a PR to update the vscode extension so it sends this as well: https://github.com/fwcd/vscode-kotlin/pull/86

I used kotlinx.serialization to serialize the data. I used JSON since it's the only stable format currently supported (all the others are experimental). JSON isn't ideal, so we should change this at some point, but for dependencies it's certainly fine (it's not much data).

This is the first step towards improving startup performance. In the future, we should also cache other things as mentioned above. The symbol index is especially relevant, since it takes quite a while to build.

daplf avatar Mar 17 '22 00:03 daplf

@fwcd conflicts fixed. For some reason, one of the CI builds failed, but I can't reproduce it locally (tests are green).

Writing https://github.com/fwcd/kotlin-language-server/issues/352 got me thinking, perhaps we could use a single unified on-disk (per-project) database (e.g. an SQLite DB) to hold both cached dependencies and the entire symbol index, instead of serialized JSON documents? This would both reduce the memory footprint of the language server and startup time in existing projects (we would just need to make sure to reindex whenever the index isn't up to date).

This is actually what I had in mind for the symbol index. I was thinking of keeping H2 since it can be stored on disk as well, but any DB will do (and I don't have any strong feelings against any of them tbh). Regarding the other things, I didn't consider using a DB for them, but I suppose it would work as well. I think I went with this approach because it seemed simple enough, but we can consider using a DB. I also thought about caching the symbols of certain dependencies globally (like the kotlin stdlib), since they are used in all projects.

Another aspect that we should probably look into is caching the source files' IR. Right now, when we start the server, the entire project is compiled, which can take a while since invoking the compiler on all files is not that fast. We could store the IR of the files (i.e. the SourcePath.files) to avoid this.

daplf avatar Apr 23 '22 21:04 daplf

@fwcd Btw, we should also assume clients won't always give us a location to cache the data (i.e., not all clients will send the storagePath). In these cases, we need things to be in memory still.

daplf avatar Apr 23 '22 21:04 daplf

I didn't know that H2 supported on-disk databases, but it does, which is great, so that seems like the easiest path forward. Although I think the current JSON implementation is fine, the reason I am a bit hesitant with merging this PR is that if we plan to move to a database anyway it would probably be easier to 'get it right' the first time, especially since we are dealing with persisted data in the user's workspace, which is a bit tricky to evolve once the language server ships. What do you think?

Another thought I had was that we might want to have some kind of 'cleanup' mechanism. If the on-disk indices get large (hundreds of megabytes or even gigabytes) we should probably have a request or command to clear these caches (perhaps even do so automatically once they exceed a certain size/age threshold, though that might be something the VSCode extension could deal with, since it manages the extension's storagePath)

fwcd avatar Apr 24 '22 21:04 fwcd

I didn't know that H2 supported on-disk databases, but it does, which is great, so that seems like the easiest path forward. Although I think the current JSON implementation is fine, the reason I am a bit hesitant with merging this PR is that if we plan to move to a database anyway it would probably be easier to 'get it right' the first time, especially since we are dealing with persisted data in the user's workspace, which is a bit tricky to evolve once the language server ships. What do you think?

I think it makes sense to keep this open for now and I can think about a DB based implementation when I have some time. It's probably not that hard to refactor this PR to use a DB and, as you said, it's better to just get it right the first time instead of changing things later.

Another thought I had was that we might want to have some kind of 'cleanup' mechanism. If the on-disk indices get large (hundreds of megabytes or even gigabytes) we should probably have a request or command to clear these caches (perhaps even do so automatically once they exceed a certain size/age threshold, though that might be something the VSCode extension could deal with, since it manages the extension's storagePath)

Makes sense. I'm not sure whether this should be on the client or server side though. Technically, the client owns the storagePath but it has no way of knowing what is actually stored there (and how). It might be tricky to partially delete things, for example. The server has more knowledge over what things can be deleted, which makes it easier to do automatic cleanups for example (assuming we want to do smart cleanups instead of deleting everything). For a full cleanup, a command would also be cool (and it's what vscode-java offers as well).

daplf avatar Apr 24 '22 23:04 daplf

Hey @fwcd It's been a while šŸ˜…

I finally got back into this and refactored the PR to cache dependencies using a DB.

When a storage path is provided, we now setup a global SQLite DB for the entire server. The DB is passed to the cached class path resolver and the symbol index. Both make use of it to store their stuff. For now, we are only caching dependencies, but the ground work is done for the symbol index to be cached as well (in a future PR).

When a storage path is not provided, the symbol index just creates its own in-memory H2 DB as it did before.

Let me know what you think. I'm hoping this improves things.

daplf avatar Apr 09 '23 22:04 daplf

I'm also very keen for this as my startup time for the server is 20+ seconds :)

RenFraser avatar May 01 '23 20:05 RenFraser

One minor suggestion: If we move closer to merging this, could we document the usage somewhere? Just that we can take in a storagePath option in the initializationOptions, and that it expects to be a directory. (I had a brainfart where I tried to use a path to a sqllite file directly). Just think having it mentioned somewhere (readme, other doc etc.) would make it easier for people implementing the support for this new feature into various clients šŸ™‚

Thanks for the suggestion. Updated the README with a reference to the initialization options, similar to what we have for the protocol extensions.

Did not get the memory usage improvement I was expecting though, but that might be some issues local to my machine or setup. (just simple monitoring with VisualVM). Experienced less memory usage with the ondisk cache, just not the extreme improvement my mind expected.... Maybe the win is bigger for even bigger projects?

Tbh, I didn't really look into the impact on memory usage of this, since my focus was caching (especially preparing a foundation for the symbol index to be cached as well, which could be pretty helpful IMO). Memory usage is something we can (and should) definitely look at, but I personally find it out of scope for this PR. But thanks for the feedback!

daplf avatar May 03 '23 02:05 daplf

Sorry, I just want to express my hopes for this PR to get merged. Looking forward to see the improvements. The Kotlin language server is the by far most slow one among all Iā€™m using. Would be great to have this resolved. šŸ˜Š

weilbith avatar Jun 03 '23 12:06 weilbith

Hey @fwcd Any chance you can have another look at this soon? Thanks šŸ™‚

daplf avatar Jun 07 '23 13:06 daplf

bump @fwcd šŸ™‚ Any chance to have a second look here? Think it is best that you do a second look before merging. Think trying this out can prove quite useful. Might be our way to make the language server more effective, less memory usage, faster startup times etc. šŸ™‚

themkat avatar Jun 21 '23 15:06 themkat

Hey again @fwcd

Any chance we can move this forward? :)

daplf avatar Aug 09 '23 19:08 daplf

Sorry for the long delay here. Given that there's a lot of interest in this PR, I would say that we merge it, so everyone can play with it, and then iron out any smaller bugs to the next release.

fwcd avatar Aug 13 '23 14:08 fwcd