Explore caching the index for gems
Caching the index for gems is easier than caching the entire index, because we know a gem's files only change when the gem's version changes.
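Here is a minimal sketch of why this makes invalidation simple. It assumes one cache file per gem, named after `Gem::Specification#full_name` (for example `rails-7.1.0`) and stored under a hypothetical `.ruby-lsp/cache` directory. A cached index is reused only when a file for that exact name and version exists, so upgrading a gem misses the cache automatically. The helper names `cache_path_for` and `cached?` are illustrative and are not part of Ruby LSP.

```ruby
require "rubygems"

# Hypothetical cache root; the real location is still to be decided.
DEFAULT_CACHE_DIR = File.join(".ruby-lsp", "cache")

# Gem::Specification#full_name is "name-version" (plus a platform suffix for
# platform-specific gems), so a new version maps to a new, empty cache slot.
def cache_path_for(spec, cache_dir = DEFAULT_CACHE_DIR)
  File.join(cache_dir, spec.full_name)
end

# A hit means the gem's entries can be loaded instead of re-indexing its files.
def cached?(spec, cache_dir = DEFAULT_CACHE_DIR)
  File.exist?(cache_path_for(spec, cache_dir))
end
```

Because the key already encodes the version, no timestamps or file hashes need to be compared.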
I think we can get a considerable improvement in indexing performance if we cache the entries for gems, using a layout something like this:
- `.ruby-lsp/`
  - `Gemfile`
  - `Gemfile.lock`
  - `cache/`
    - `rails-7.1.0`
    - `yarp-0.12.0`
Each gem's cache file would essentially be a Marshal dump of that gem's part of the index. This may require defining a way to merge different indices.
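The sketch below illustrates the Marshal-per-gem idea together with the merge step. `CachedEntry` and `MarshalGemIndex` are hypothetical, simplified stand-ins for Ruby LSP's index classes. The merge concatenates entries under the same name instead of overwriting them, on the assumption that the same constant can be defined in several gems.

```ruby
# Hypothetical, simplified entry; Marshal needs the Struct to be a named constant.
CachedEntry = Struct.new(:name, :file_path, :line)

class MarshalGemIndex
  attr_reader :entries

  # Fully qualified name => array of entries (a name may be defined in several files).
  def initialize(entries = {})
    @entries = entries
  end

  def add(entry)
    (@entries[entry.name] ||= []) << entry
  end

  # Merge another gem's index into this one, keeping every definition of a name.
  def merge!(other)
    other.entries.each do |name, list|
      (@entries[name] ||= []).concat(list)
    end
    self
  end

  # One Marshal dump per gem, e.g. .ruby-lsp/cache/rails-7.1.0
  def dump_to(path)
    File.binwrite(path, Marshal.dump(@entries))
  end

  def self.load_from(path)
    new(Marshal.load(File.binread(path)))
  end
end
```

`Marshal.load` should only ever read files the LSP wrote itself, since loading untrusted Marshal data can instantiate arbitrary objects.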
In the long term, it would be amazing if RubyGems could generate the index cache during packaging, so that every gem ships with an index by default. Doing this work ahead of time would guarantee cached indices for all gems and significantly speed up indexing.
### Tasks
- [ ] https://github.com/Shopify/ruby-lsp/issues/1919
- [ ] Cache indexed gems
Could a Bundler plugin be an alternative? There's an after-install hook which runs after each gem is installed.
https://bundler.io/guides/bundler_plugins.html
We would depend on every gem maintainer to use the Bundler plugin, which wouldn't really scale.
It could be distributed as a gem which Ruby LSP installs in the .ruby-lsp bundle.
Would Bundler pick up the plugin automatically? Even from a different Gemfile?
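For reference, a plugin along those lines might look roughly like the sketch below. It assumes Bundler's documented `after-install` hook, whose block receives the per-gem installation object (which exposes the gem's specification via `spec`). The module `RubyLspIndexCache`, its `CACHE_DIR`, and `cache_target` are hypothetical, and the actual indexing step is omitted. The decision logic sits outside the hook so it can be exercised without running `bundle install`; the sketch does not answer whether Bundler would load the plugin from a different Gemfile.

```ruby
require "rubygems"

# Hypothetical plugin module (e.g. the plugins.rb of a gem installed in the .ruby-lsp bundle).
module RubyLspIndexCache
  # Hypothetical location for per-gem caches.
  CACHE_DIR = File.join(".ruby-lsp", "cache")

  # Returns the cache path to write for a freshly installed gem,
  # or nil when that exact name-version is already cached.
  def self.cache_target(spec, cache_dir = CACHE_DIR)
    path = File.join(cache_dir, spec.full_name)
    File.exist?(path) ? nil : path
  end
end

# Only register the hook when running inside Bundler's plugin system.
if defined?(Bundler::Plugin)
  Bundler::Plugin.add_hook("after-install") do |spec_install|
    target = RubyLspIndexCache.cache_target(spec_install.spec)
    next if target.nil?
    # Indexing the gem and writing its entries to `target` would happen here (not shown).
  end
end
```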
### Overview of Progress
I'm wrapping up my work on this issue, so I'm leaving some context here for whoever decides to take it on.
There were two main components to this task:
- Serializing entries (#1919)
- Implementing caching for gems based on this serialization.
Progress on both is available on the `serialize-entries` branch. See below for more context.
#### Serializing Entries
We opted to serialize entries with custom JSON instead of something like Marshal, for performance reasons: in our measurements, JSON was 15x faster than Marshal for both serialization and deserialization.
`entries.rb` and `location.rb` contain the serialization logic for `RubyIndexer::Entry` and `RubyIndexer::Location` objects, respectively. The corresponding test files cover serialization and deserialization for every kind of entry. Note that these tests rely on the `==` methods defined in `entries.rb` and `location.rb`.
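To illustrate the general shape of hand-written JSON serialization with value equality for round-trip tests, here is a sketch. `SketchLocation`, `SketchEntry`, `serialize_entries`, and `deserialize_entries` are hypothetical simplifications, not the code on the `serialize-entries` branch; the real classes carry more fields and `Entry` has several subclasses. The sketch shows the structure only and makes no claim about the speedup measured above.

```ruby
require "json"

# Hypothetical, simplified stand-in for RubyIndexer::Location.
class SketchLocation
  attr_reader :start_line, :end_line, :start_column, :end_column

  def initialize(start_line, end_line, start_column, end_column)
    @start_line = start_line
    @end_line = end_line
    @start_column = start_column
    @end_column = end_column
  end

  # Plain hash with string keys, matching what JSON.parse returns.
  def to_h
    { "start_line" => @start_line, "end_line" => @end_line,
      "start_column" => @start_column, "end_column" => @end_column }
  end

  def self.from_h(hash)
    new(hash["start_line"], hash["end_line"], hash["start_column"], hash["end_column"])
  end

  # Value equality, so tests can compare a deserialized object with the original.
  def ==(other)
    other.is_a?(SketchLocation) && to_h == other.to_h
  end
end

# Hypothetical, simplified stand-in for RubyIndexer::Entry.
class SketchEntry
  attr_reader :name, :file_path, :location, :comments

  def initialize(name, file_path, location, comments)
    @name = name
    @file_path = file_path
    @location = location
    @comments = comments
  end

  def to_h
    # The nested location is serialized through its own to_h.
    { "name" => @name, "file_path" => @file_path,
      "location" => @location.to_h, "comments" => @comments }
  end

  def self.from_h(hash)
    new(hash["name"], hash["file_path"], SketchLocation.from_h(hash["location"]), hash["comments"])
  end

  def ==(other)
    other.is_a?(SketchEntry) && to_h == other.to_h
  end
end

def serialize_entries(entries)
  JSON.generate(entries.map(&:to_h))
end

def deserialize_entries(json)
  JSON.parse(json).map { |hash| SketchEntry.from_h(hash) }
end
```

A round-trip test then only needs `assert_equal(entries, deserialize_entries(serialize_entries(entries)))`, which is why the `==` methods matter.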
##### Next Steps
The serialization logic is ready to ship.
#### Implementing Caching
The relevant code is in `index.rb`. The `export_to_cache` method serializes the appropriate entries, and the `import_from_cache` method deserializes them. An important thing to verify is that `export_to_cache` extracts the correct set of entries (see the embedded TODO).
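A sketch of what an export/import pair could look like, assuming a flat list of entries and that "the correct set of entries" for a gem means every entry whose file lives under the gem's install directory. The class `SketchCachingIndex`, the `IndexedEntry` struct, and the method signatures below are hypothetical; the real `export_to_cache` and `import_from_cache` in `index.rb` may differ.

```ruby
require "json"

# Hypothetical minimal entry; the real RubyIndexer::Entry has more fields.
IndexedEntry = Struct.new(:name, :file_path, :line)

class SketchCachingIndex
  attr_reader :entries

  def initialize
    @entries = []
  end

  def add(entry)
    @entries << entry
  end

  # Select the entries that belong to one gem and write them to its cache file.
  # The trailing separator keeps "/gems/foo-1.0" from matching "/gems/foo-1.0-bar".
  def export_to_cache(gem_name, gem_version, gem_dir, cache_dir)
    prefix = File.join(gem_dir, "")
    selected = @entries.select { |entry| entry.file_path.start_with?(prefix) }
    path = File.join(cache_dir, "#{gem_name}-#{gem_version}.json")
    File.write(path, JSON.generate(selected.map(&:to_h)))
    path
  end

  # Returns true on a cache hit (entries added to the index), false on a miss.
  def import_from_cache(gem_name, gem_version, cache_dir)
    path = File.join(cache_dir, "#{gem_name}-#{gem_version}.json")
    return false unless File.exist?(path)

    JSON.parse(File.read(path), symbolize_names: true).each do |hash|
      @entries << IndexedEntry.new(*hash.values_at(:name, :file_path, :line))
    end
    true
  end
end
```

Filtering by path prefix is only one possible selection rule; gems installed from git sources, for example, live elsewhere and would need their own handling.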
In `test_caching.rb`, we compare manually indexing non-default gems against importing them from the cache. This test is not intended to be shipped; it only illustrates the benefit of caching. In that comparison, importing from the cache reduced indexing time by 43%.
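The comparison boils down to timing two code paths and reporting the relative reduction, (manual - cached) / manual × 100. The sketch below assumes the two paths are passed in as callables; `compare_indexing` and its helpers are hypothetical and not taken from `test_caching.rb`.

```ruby
# Percentage by which `candidate_seconds` is lower than `baseline_seconds`.
def percent_reduction(baseline_seconds, candidate_seconds)
  raise ArgumentError, "baseline must be positive" unless baseline_seconds.positive?

  (baseline_seconds - candidate_seconds) / baseline_seconds.to_f * 100.0
end

# Elapsed time of a block, using a monotonic clock (unaffected by system clock changes).
def measure_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# `index_manually` and `import_from_cache` are hypothetical callables for the two paths.
def compare_indexing(index_manually, import_from_cache)
  manual = measure_seconds { index_manually.call }
  cached = measure_seconds { import_from_cache.call }
  { manual: manual, cached: cached, reduction_percent: percent_reduction(manual, cached) }
end
```

With illustrative numbers, a manual run of 10.0 s against a cached run of 5.7 s gives `percent_reduction(10.0, 5.7)`, a 43% reduction.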
##### Next Steps
- Ensure the caching methods cover edge cases, such as Gemfiles that point to remote repositories (see the TODO in `index.rb`).
- Integrate the caching methods into our indexing process: when indexing a non-default gem, check the cache before indexing it, and write to the cache when the LSP shuts down.
- Add tests for the caching.
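The integration step could follow a read-through pattern: look a gem up in the cache, index it only on a miss, and remember which gems were newly indexed so that only those are written out at shutdown. The class `GemIndexingWorkflow`, its in-memory cache hash, and the placeholder `index_gem` are hypothetical. Whether to flush on the LSP `shutdown` request or in an `at_exit` hook is left open.

```ruby
# Hypothetical integration sketch; `index_gem` stands in for real per-gem indexing.
class GemIndexingWorkflow
  attr_reader :indexed

  def initialize(cache)
    @cache = cache   # full_name => entries, e.g. loaded from .ruby-lsp/cache at startup
    @dirty = {}      # gems indexed during this session and not yet persisted
    @indexed = []    # gems that actually had to be indexed (cache misses)
  end

  # Check the cache first and only index the gem on a miss.
  def entries_for(full_name)
    @cache.fetch(full_name) do
      entries = index_gem(full_name)
      @indexed << full_name
      @dirty[full_name] = entries
      @cache[full_name] = entries
    end
  end

  # Called at shutdown: persist only the newly indexed gems.
  def flush(writer)
    @dirty.each { |full_name, entries| writer.call(full_name, entries) }
    @dirty.clear
  end

  private

  def index_gem(full_name)
    ["entries for #{full_name}"] # placeholder for real indexing
  end
end
```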
#### TL;DR
Progress has been made on serializing entries and implementing caching for gems, with the latter showing a 43% reduction in indexing time. The serialization logic is ready to ship, and the remaining tasks involve addressing potential edge cases when caching, integrating caching into the indexing workflow, and adding tests for the caching. Code is available on the serialize-entries branch.