Possible to cache the index/state for big projects?
Let me start by thanking you for the great work you've been doing 🌟
From my tests it has been working super well for "regular-sized" projects.
At work we have a large monolith with roughly 3M lines of Elixir code, and all the language servers we tried more or less worked, but:
- Indexing everything took a long time, which meant developers had to wait before they could use the LSP tools.
- This also used a lot of CPU and RAM, presumably because of the full build the language server needs to do.
- The LSP tools stopped working after a while, usually after pulls from `master` that brought in a lot of changes (we have >100 developers working on the same code base).
To improve the developer experience for these big projects, would it be possible to cache the state/index of the language server, so that when pulling a given version of the project we would also download the corresponding Expert index and all the LSP tools would work immediately?
For context, with a 32-core desktop + 64 GB RAM it took ~45 minutes for Expert to finish indexing (I monitored btop and saw all cores being used during indexing, until they became mostly idle); folks with Mac laptops mention letting the index run for 3-4 hours before it was usable.
Happy to provide more detailed info/timings; is there a way to collect this kind of telemetry from Expert?
If you already have an idea of how this caching mechanism could fit into the Expert architecture, I can try to take a stab at it.
Once more, thank you so much for this 🙏
The indexes are stored in the `.expert/indexes` folder; are you able to verify whether these folders/files are being created?
Also, can you check your logs / your editor's logs to see if it's indexing that is running?
Sorry for the late reply.
Answering your questions:
> The indexes are stored in the `.expert/indexes` folder; are you able to verify whether these folders/files are being created?

Yup, they are.

> Also, can you check your logs / your editor's logs to see if it's indexing that is running?

Yup, it seems so.
More details
Yesterday I:
- Nuked the `.expert` folder;
- Made sure `mix compile && MIX_ENV=test mix compile` were executed beforehand;
- Opened `neovim` (configured to use Expert) and opened the `mix.exs` file (around 11:03 pm).
I also tailed `expert.log` and `project.log`.
After the Expert initialization, I start to see `sent notification server -> $/progress` messages in the `expert.log` file.
I made sure nothing else was happening on this machine (32-core 9950X CPU, 64 GB RAM) afterwards.
During the "indexing", for ~24 minutes I see the $/progress logs but CPU usage remains low until it appears that Compiled tiger in 1264.6 seconds on the expert.log message:
After that, I see CPU usage increase across all cores, along with some `Could not expand alias` errors in `project.log`.
The `sent notification server -> client $/progress` messages in `expert.log` continue to appear until 11:54 pm. When they stop, most of the CPU cores go idle; it's at this stage that I consider Expert to be done.
Given how long Expert takes to finish, I was wondering whether we could share the `.expert` folder for a given commit, so that folks would immediately have Expert ready without "paying" the CPU/time cost. Currently the index is 3.8G:
```
andre@andre-jupiter ~/projs/remote/tiger/.expert/indexes/ets/27.3.3/1.18.3/3 *
❯ l
total 3.8G
drwxrwxr-x 2 andre andre 4.0K Sep 2 23:58 .
drwxrwxr-x 3 andre andre 4.0K Sep 2 23:24 ..
-rw-rw-r-- 1 andre andre 3.8G Sep 3 00:00 07368779273278058496.checkpoint
-rw-rw-r-- 1 andre andre  43K Sep 3 00:02 updates.wal
```
ℹ️ A .zip of the entire `.expert` folder is 4.3G without compression and 554M with max compression.
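To make the idea concrete, here's a rough sketch (my own, not something Expert supports today) of a script that CI could run after Expert finishes indexing: it packs the `.expert` folder into an archive named after the current commit, which developers could later download. The script name and archive naming scheme are made up for illustration; it only relies on the Elixir standard library and `git`.

```elixir
# pack_expert_cache.exs -- hypothetical helper, not part of Expert.
# Run on CI after Expert has finished indexing: packs the .expert folder into
# a compressed archive named after the current commit, so developers can later
# download an index that matches the code they checked out.

project_root = File.cwd!()
index_dir = Path.join(project_root, ".expert")

{commit, 0} = System.cmd("git", ["rev-parse", "HEAD"], cd: project_root)
commit = String.trim(commit)

archive = "expert-cache-#{commit}.tar.gz"

# Collect every regular file under .expert, storing paths relative to the
# project root so the archive can be extracted in place on another machine.
files =
  Path.wildcard(Path.join(index_dir, "**/*"), match_dot: true)
  |> Enum.filter(&File.regular?/1)
  |> Enum.map(fn path ->
    {String.to_charlist(Path.relative_to(path, project_root)), String.to_charlist(path)}
  end)

:ok = :erl_tar.create(String.to_charlist(archive), files, [:compressed])

IO.puts("Wrote #{archive} (#{File.stat!(archive).size} bytes)")
```

CI would then upload the archive somewhere developers can reach (artifact store, object storage, an internal file server, etc.).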
Ask
Having a way to "execute" Expert via CLI so that it indexes a snapshot of the code and then stops would be awesome, since we could use it to build the Expert "cache" on a CI pipeline and then create some tooling to fetch it for a given commit.
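On the developer side, a matching restore step could be as simple as downloading the archive for the checked-out commit and extracting it before opening the editor. Again just a sketch; the `CACHE_BASE_URL` environment variable is a stand-in for wherever CI uploads the archives:

```elixir
# fetch_expert_cache.exs -- hypothetical companion to the packing script above.
# Downloads the pre-built .expert index for the current commit (if one exists)
# so Expert can start from a warm index instead of re-indexing from scratch.

project_root = File.cwd!()

{commit, 0} = System.cmd("git", ["rev-parse", "HEAD"], cd: project_root)
commit = String.trim(commit)

# CACHE_BASE_URL is an assumption: wherever CI uploads the archives.
url = "#{System.fetch_env!("CACHE_BASE_URL")}/expert-cache-#{commit}.tar.gz"
archive = Path.join(System.tmp_dir!(), "expert-cache-#{commit}.tar.gz")

case System.cmd("curl", ["--fail", "--silent", "--output", archive, url]) do
  {_, 0} ->
    :ok =
      :erl_tar.extract(
        String.to_charlist(archive),
        [:compressed, {:cwd, String.to_charlist(project_root)}]
      )

    IO.puts("Restored .expert index for #{commit}")

  {_, _} ->
    IO.puts("No cached index for #{commit}; Expert will index from scratch.")
end
```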
Thank you!