
Explore caching index for gems

Open vinistock opened this issue 2 years ago • 5 comments

Caching the index for gems is easier than caching the entire index because we know the files will only change if the version of the gem has changed.

I think we can considerably speed up indexing if we cache the entries for gems, with a layout something like:

.ruby-lsp/
  Gemfile
  Gemfile.lock
  cache/
    rails-7.1.0
    yarp-0.12.0

Each gem cache would basically be a Marshal dump of that gem's part of the index. This may require defining a way to merge different indices.
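To make the idea concrete, here is a minimal sketch of one cache file per gem plus a merge step. It assumes the index can be modelled as a plain Hash from entry name to a list of file paths, which is a simplification of the real index. The helper names (`cache_path`, `write_gem_cache`, `read_gem_cache`, `merge_indices`) are hypothetical and not part of Ruby LSP.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helpers; the index is modelled as { entry_name => [paths] }.

# Cache files are keyed by "<gem>-<version>", so a version bump is a cache miss.
def cache_path(cache_dir, gem_name, version)
  File.join(cache_dir, "#{gem_name}-#{version}")
end

def write_gem_cache(cache_dir, gem_name, version, entries)
  FileUtils.mkdir_p(cache_dir)
  File.binwrite(cache_path(cache_dir, gem_name, version), Marshal.dump(entries))
end

# Returns nil on a cache miss. Marshal.load should only be used on files
# we wrote ourselves, since loading untrusted data is unsafe.
def read_gem_cache(cache_dir, gem_name, version)
  path = cache_path(cache_dir, gem_name, version)
  return nil unless File.exist?(path)

  Marshal.load(File.binread(path))
end

# Merges several per-gem indices into a new Hash without mutating the inputs.
def merge_indices(*indices)
  indices.each_with_object({}) do |index, merged|
    index.each { |name, paths| (merged[name] ||= []).concat(paths) }
  end
end
```

With something like this, startup would call `read_gem_cache` for each gem in the lockfile, index only the misses, and combine everything with `merge_indices`.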

For the long term, it would be amazing if rubygems could generate the index cache during packaging, so that all gems are exported with an index by default. By doing the work ahead of time, we would be guaranteed to always have cached indices for all gems, significantly speeding up indexing.

### Tasks
- [ ] https://github.com/Shopify/ruby-lsp/issues/1919
- [ ] Cache indexed gems

vinistock avatar Sep 14 '23 15:09 vinistock

Could a Bundler plugin be an alternative? There's an after-install hook which runs after each gem is installed.

https://bundler.io/guides/bundler_plugins.html

andyw8 avatar Sep 14 '23 18:09 andyw8

That would depend on every gem maintainer using the Bundler plugin, which wouldn't really scale.

vinistock avatar Sep 14 '23 18:09 vinistock

It could be distributed as a gem which Ruby LSP installs in the .ruby-lsp bundle.

andyw8 avatar Sep 14 '23 18:09 andyw8

Would Bundler pick up the plugin automatically? Even from a different Gemfile?

vinistock avatar Sep 14 '23 19:09 vinistock

Overview of Progress

I'm wrapping up my work on this issue, so I'm leaving some context here for whoever decides to take it on.

There were two main components to this task:

  1. Serializing entries (#1919)
  2. Implementing caching for gems based on this serialization.

The progress for both tasks is available on the serialize-entries branch. See below for more context.

Serializing Entries

We opted for a custom JSON format to serialize entries rather than something like Marshal, for performance reasons: compared with Marshal, JSON gave us a 15x improvement in serialization and deserialization time.

entries.rb and location.rb contain the serialization logic for RubyIndexer::Entry and RubyIndexer::Location objects respectively. The test files cover serialization and deserialization for every kind of entry. Note that the tests rely on the == methods defined in entries.rb and location.rb.
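For readers who haven't looked at the branch, the general shape of this kind of serialization can be sketched as below. This is not the code from entries.rb or location.rb: `SketchLocation` and `SketchEntry` are hypothetical, simplified stand-ins, and the field names are assumptions chosen for illustration. It only shows the pattern of a custom JSON payload plus an `==` method that lets tests compare a round-tripped object with the original.

```ruby
require "json"

# Hypothetical stand-in for RubyIndexer::Location (fields are assumed).
class SketchLocation
  attr_reader :start_line, :end_line, :start_column, :end_column

  def initialize(start_line, end_line, start_column, end_column)
    @start_line = start_line
    @end_line = end_line
    @start_column = start_column
    @end_column = end_column
  end

  # A positional array keeps the serialized payload compact.
  def to_a
    [start_line, end_line, start_column, end_column]
  end

  def self.from_a(values)
    new(*values)
  end

  def ==(other)
    other.is_a?(SketchLocation) && to_a == other.to_a
  end
end

# Hypothetical stand-in for RubyIndexer::Entry (fields are assumed).
class SketchEntry
  attr_reader :name, :file_path, :location

  def initialize(name, file_path, location)
    @name = name
    @file_path = file_path
    @location = location
  end

  def to_json(*args)
    { "name" => name, "file_path" => file_path, "location" => location.to_a }.to_json(*args)
  end

  def self.from_json(json)
    data = JSON.parse(json)
    new(data["name"], data["file_path"], SketchLocation.from_a(data["location"]))
  end

  # Value equality, so a deserialized entry can be compared with the original.
  def ==(other)
    other.is_a?(SketchEntry) &&
      name == other.name &&
      file_path == other.file_path &&
      location == other.location
  end
end
```

A round-trip test then reduces to checking that `SketchEntry.from_json(entry.to_json) == entry`.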

Next Steps

The serialization logic is ready to ship.

Implementing Caching

The relevant code here is in index.rb. The export_to_cache method serializes the appropriate entries. An important step to verify here is that we are extracting the correct set of entries (see the embedded TODO). The import_from_cache method deserializes the entries.

In test_caching.rb, we compare the work of manually indexing non-default gems vs. importing them from the cache. This is not intended to be shipped, but merely to illustrate the benefit of caching. Here, we see a 43% reduction in indexing time.

Next Steps

  • Ensure the caching methods cover any edge cases (such as Gemfiles that point to remote repos, as specified in the TODO in index.rb).
  • Integrate the caching methods into our indexing process. When indexing a non-default gem, we should check the cache before indexing it. When the LSP shuts down, we should write to the cache.
  • Add tests for the caching.
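
As a rough illustration of the integration step above (check the cache before indexing, write on shutdown), something along these lines could work. `SketchGemCache` and its methods are hypothetical and unrelated to the `export_to_cache` and `import_from_cache` methods in index.rb. Entries are modelled as plain arrays of hashes, and the sketch ignores the edge cases mentioned above, such as gems that point to remote repos.

```ruby
require "json"
require "fileutils"
require "tmpdir"

# Hypothetical sketch of a cache-first lookup with a write at shutdown.
class SketchGemCache
  def initialize(cache_dir)
    @cache_dir = cache_dir
    @pending = {} # "name-version" => entries indexed during this session
  end

  # Returns cached entries when present. Otherwise runs the block (the
  # expensive indexing step) and remembers the result until flush.
  def fetch(gem_name, version)
    key = "#{gem_name}-#{version}"
    return @pending[key] if @pending.key?(key)

    path = path_for(key)
    return JSON.parse(File.read(path)) if File.exist?(path)

    @pending[key] = yield
  end

  # Intended to be called once, when the server shuts down.
  def flush
    FileUtils.mkdir_p(@cache_dir)
    @pending.each do |key, entries|
      File.write(path_for(key), JSON.generate(entries))
    end
    @pending.clear
  end

  private

  def path_for(key)
    File.join(@cache_dir, "#{key}.json")
  end
end
```

Because the key includes the version, upgrading a gem falls through to the indexing block and produces a fresh cache file on the next flush.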

TL;DR

Progress has been made on serializing entries and implementing caching for gems, with the latter showing a 43% reduction in indexing time. The serialization logic is ready to ship, and the remaining tasks involve addressing potential edge cases when caching, integrating caching into the indexing workflow, and adding tests for the caching. Code is available on the serialize-entries branch.

aryan-soni avatar Apr 26 '24 16:04 aryan-soni