[Feature Request]: Persistent caches

Open montekki opened this issue 5 years ago • 28 comments

One of the latest posts states:

One of the current peculiarities of rust-analyzer is that it doesn’t persist caches to disk. Opening project in rust-analyzer means waiting a dozen seconds while we process standard library and dependencies.

And while that may be acceptable when working with IDEs like VS Code that are launched once and used for a long period of time, other workflows, such as using editors like vim and frequently closing and opening windows that don't share rust-analyzer's state with each other, are actually quite painful. Each time a new window is opened, one has to sit and wait, and for large repos that is quite a long time.

P.S. Sorry if it's a duplicate

montekki avatar Jun 02 '20 19:06 montekki

As far as I understand @matklad doesn't want to do this yet as it would reduce the necessity of optimizing the initial analysis, thus reducing the likelihood that people will work on reducing the initial analysis time. There has been some discussion about the specific use case of closing and re-opening vim often, but so far nothing has changed.

bjorn3 avatar Jun 02 '20 19:06 bjorn3

It could be almost as good if rust-analyzer could be left running and shared between editing sessions.

tjkirch avatar Oct 13 '20 21:10 tjkirch

I think that would be something the client needs to do. There have also already been enough complaints about rust-analyzer keeping running after the client exits because of bugs.

bjorn3 avatar Oct 14 '20 05:10 bjorn3

I think that would be something the client needs to do.

Perhaps not necessarily. The current lifetime of the rust-analyzer process is tied to an editing session. Instead, I could imagine the analysis being split off and done in a longer-lived process that the session process communicates with. The longer-lived process would need to handle concurrent access and perhaps purge data after a time, but it would remove the startup cost for later editing sessions and reduce the memory usage for multiple editors.

tjkirch avatar Oct 14 '20 14:10 tjkirch

Who would manage that longer-lived process if the client doesn't? If nobody does, it will keep running forever, which is bad. If the language server started by the client does, it would exit as soon as all language servers would exit, which makes it useless for closing and re-opening vim.

bjorn3 avatar Oct 14 '20 16:10 bjorn3

One could imagine a scheme where rust-analyzer checks for a running server and forks and daemonizes if one isn't running yet, maybe shutting itself down automatically if there aren't any clients after a while. flow does something similar, for example. But I feel that's far too much complexity to fix a problem that basically only exists for vim users used to a certain workflow, to be honest.
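The "shut itself down if there aren't any clients after a while" part of such a scheme could be sketched roughly as below. This is only an illustration of the bookkeeping, not anything rust-analyzer or flow actually implements; `IdleTracker` and its methods are hypothetical names, and the forking/daemonizing and socket handling are left out entirely.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: tracks connected clients and decides when an
/// idle shared server is allowed to exit.
struct IdleTracker {
    clients: usize,
    /// Set while no client is connected; `None` while at least one is.
    idle_since: Option<Instant>,
    timeout: Duration,
}

impl IdleTracker {
    fn new(timeout: Duration, now: Instant) -> Self {
        // A freshly started server has no clients, so it is idle from the start.
        Self { clients: 0, idle_since: Some(now), timeout }
    }

    fn client_connected(&mut self) {
        self.clients += 1;
        self.idle_since = None;
    }

    fn client_disconnected(&mut self, now: Instant) {
        self.clients = self.clients.saturating_sub(1);
        if self.clients == 0 {
            // The last client left: start the idle clock.
            self.idle_since = Some(now);
        }
    }

    /// True once the server has been without clients for at least `timeout`.
    fn should_shut_down(&self, now: Instant) -> bool {
        match self.idle_since {
            Some(since) => now.duration_since(since) >= self.timeout,
            None => false,
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut tracker = IdleTracker::new(Duration::from_secs(300), start);
    tracker.client_connected();
    tracker.client_disconnected(start + Duration::from_secs(10));
    // 60 s after the last client left: prints "false".
    println!("{}", tracker.should_shut_down(start + Duration::from_secs(70)));
    // 310 s after the last client left: prints "true".
    println!("{}", tracker.should_shut_down(start + Duration::from_secs(320)));
}
```

Passing `now` in explicitly keeps the policy testable without sleeping; a real daemon would call `should_shut_down` from a periodic timer.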

flodiebold avatar Oct 14 '20 16:10 flodiebold

We definitely won't implement a persistent process within rust-analyzer itself -- it is indeed a job for the editor. However, I think for editors like vim someone could write a separate Rust binary, rust-analyzer-supervisor, which would cache the connection to the ra.

matklad avatar Oct 14 '20 16:10 matklad

Instead of a daemon process, wouldn't it be simpler to cache index results on disk? I know clangd does this.

rwols avatar Jun 26 '21 18:06 rwols

Persistent caches will require changes to salsa to be able to serialize its cache. In addition, it has the disadvantage that it makes optimizing the initial analysis less important, which may over time result in regressions not just of the initial analysis time, but also of the time taken when performing a change.

Salsa's issue for serializable caches: https://github.com/salsa-rs/salsa/issues/10

Also an observation from https://github.com/rust-lang/rfcs/pull/1317#issuecomment-150965895:

My tips:

  • [...]
  • Don't store anything to disk. It's likely the oracle can be fast enough without doing this; and unnecessary complexity creates bugs. "Have you tried deleting the .ncb file?" (I remember having to do this a couple times per day when using VS, ca. 2005)
  • [...]

Note: I am not saying that persistent caches shouldn't ever be implemented. I just think that they shouldn't be implemented yet.

bjorn3 avatar Jun 26 '21 18:06 bjorn3

I remember having to do this a couple times per day when using VS, ca. 2005

To be fair, I think they got their stuff together when they moved to a database in... some previous decade.

lnicola avatar Jun 29 '21 07:06 lnicola

Thinking about this, there seem to be the following options here:

  • implement on-disk persistence as a memory usage optimization: spill large data to disk
  • implement on-disk persistence as a startup-time optimization: save the salsa query graph to disk when exiting, and reuse it upon restart (ideally, validation would be equivalent to calling set and figuring out that nothing has changed. That is, we could use the old crate graph even if we haven't completely validated it)
  • implement on-disk persistence as a build-system-aware way to use precompiled libraries. That is, in the distributed build scenario, rust-analyzer would ask the build-system to provide precompiled artifacts for dependencies.

The last case is the hardest, and the most interesting one.

It is hard because it makes persistence a public API: on disk data is no-longer a private impl detail, but a shared state between rust-analyzer and the build-system. It is another input, like file text or procedural macros.

It is the most interesting, because it makes rust-analyzer scale: it becomes possible to distribute the computation of such pre-analyzed libraries across several machines and to put the results into a distributed cache, re-used by many instances of rust-analyzer.

It seems we want the following litmus test for implementing persistence: the on-disk cache can be computed by a different machine (which runs a different OS) and be used locally.

Implementation wise, it's pretty clear that the cache should be computed on per-crate granularity. Some less-obvious questions:

  • should rust-analyzer use the cache as is (mmap it, basically), or should it parse it into salsa's internal data structures?
  • should the cache be a separate flavor of input, or a way to cache existing inputs? Would we have code paths like if has_cache { from_cache } else { from_source }, or would that be a unified code path?
  • can we just store everything in the cache? We can store, e.g., the original source files, which makes the same code-path logic work.
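To make the "separate flavor of input" question a bit more concrete, here is a minimal sketch of what a single entry point over both flavors might look like. Everything in it is hypothetical: `CrateIndex`, `CrateInput`, and `load_index` are illustration-only names, and the "analysis" is a deliberately naive stand-in (it only collects names from lines starting with `fn `), not how rust-analyzer or salsa represent anything.

```rust
/// Hypothetical per-crate summary; a stand-in for whatever would be persisted.
#[derive(Debug, Clone, PartialEq)]
struct CrateIndex {
    item_names: Vec<String>,
}

/// The two flavors of input from the question above.
enum CrateInput {
    /// Raw source text that still has to be analyzed.
    Source(String),
    /// A previously computed index, e.g. loaded from disk.
    Cached(CrateIndex),
}

/// Toy "analysis": collect the names of items on lines that start with `fn `.
fn index_from_source(text: &str) -> CrateIndex {
    let item_names = text
        .lines()
        .filter_map(|line| line.trim_start().strip_prefix("fn "))
        .filter_map(|rest| rest.split(|c: char| !(c.is_alphanumeric() || c == '_')).next())
        .filter(|name| !name.is_empty())
        .map(str::to_owned)
        .collect();
    CrateIndex { item_names }
}

/// Single entry point: callers do not care which flavor they were given.
fn load_index(input: CrateInput) -> CrateIndex {
    match input {
        CrateInput::Cached(index) => index,
        CrateInput::Source(text) => index_from_source(&text),
    }
}

fn main() {
    let source = "fn parse() {}\nfn lower() {}".to_owned();
    let from_source = load_index(CrateInput::Source(source));
    let from_cache = load_index(CrateInput::Cached(from_source.clone()));
    // Both paths must agree, otherwise the cache is not a faithful substitute.
    assert_eq!(from_source, from_cache);
    println!("{:?}", from_cache.item_names);
}
```

The branch still exists, but it is confined to one function; whether that counts as a "unified code path" is exactly the design question being asked.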

matklad avatar Aug 16 '21 10:08 matklad

I'd note that the file format for case 3 doesn't need to be the same as our internal cache format for cases 1 and 2 -- we could have e.g. rlibs as a possible input while caching the salsa database in a different format.

flodiebold avatar Aug 16 '21 10:08 flodiebold

should rust-analyzer use the cache as is (mmap it, basically), or should it parse it into salsa's internal data structures?

I think it'd be super interesting to use rkyv and mmap it, but maybe it's overengineering :sweat_smile:

flodiebold avatar Aug 16 '21 11:08 flodiebold

Some additional questions:

  • how nested is the data we're now storing in salsa? Table/relational data is easier to work with, but e.g. syntax trees will pose a problem.
  • how many salsa queries are we doing during a request, as an order of magnitude? Tens of thousands might require some smart caching.
  • would it be feasible (long-term) to hook into the salsa storage mechanism?
  • can we replace it completely, or do we store the same data both in salsa and on disk?

lnicola avatar Aug 16 '21 11:08 lnicola

It is hard because it makes persistence a public API: on disk data is no-longer a private impl detail, but a shared state between rust-analyzer and the build-system. It is another input, like file text or procedural macros.

There are lots of different compatibility contracts, such as "cache inputs are best effort. If rust-analyzer can't use it, it will recompute everything from scratch". That would also heavily imply not just doing mmap and trusting it to be correct, but validating the cache and falling back to the cold-cache path if it isn't compatible.

So concretely, in the build-system scenario, the inputs to each crate would be like:

rust-analyzer cache crate_b/src --existing-cache caches/crate_b_cache --is-lib=true > caches/crate_b_cache
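A "best effort" contract of that kind could look roughly like the sketch below: the cache carries a format version and a fingerprint of the source it was computed from, and anything that does not match is ignored in favour of the cold path. The header fields, `CacheEntry`, `count_items`, and the toy `analyze` function are all assumptions made for illustration; the `rust-analyzer cache` command above is itself hypothetical, and nothing here reflects an existing on-disk format.

```rust
/// Hypothetical format version; bumped whenever the cache layout changes.
const CACHE_FORMAT_VERSION: u32 = 1;

/// FNV-1a, used here only as a simple, deterministic fingerprint.
fn fingerprint(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical cache record for one crate.
struct CacheEntry {
    format_version: u32,
    source_fingerprint: u64,
    item_count: usize,
}

/// Toy cold path: count lines that start with `fn `.
fn analyze(source: &str) -> usize {
    source
        .lines()
        .filter(|line| line.trim_start().starts_with("fn "))
        .count()
}

/// Returns the item count and whether the cache was actually used.
/// Any mismatch is treated as "no cache" rather than as an error.
fn count_items(source: &str, cache: Option<&CacheEntry>) -> (usize, bool) {
    match cache {
        Some(entry)
            if entry.format_version == CACHE_FORMAT_VERSION
                && entry.source_fingerprint == fingerprint(source.as_bytes()) =>
        {
            (entry.item_count, true)
        }
        _ => (analyze(source), false),
    }
}

fn main() {
    let source = "fn a() {}\nfn b() {}";
    let entry = CacheEntry {
        format_version: CACHE_FORMAT_VERSION,
        source_fingerprint: fingerprint(source.as_bytes()),
        item_count: 2,
    };
    // Matching cache: the stored result is reused.
    println!("{:?}", count_items(source, Some(&entry)));
    // Edited source: the fingerprint no longer matches, so we recompute.
    println!("{:?}", count_items("fn a() {}", Some(&entry)));
}
```

Because a rejected cache only costs a recomputation, a cache produced by a different machine or an older build can be offered freely and simply gets ignored when it does not validate.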

deontologician avatar Aug 25 '21 16:08 deontologician

Can we have an option to toggle (ON/OFF) sync, to turn off "fetching & caching" when an editor is opened, so that the cached data can be used from memory or disk once it has been cached for the first time? If the user wishes to enable the toggle, let them set how many .rs files they can open (three, or any other number) before it syncs automatically. (Setting a limit for the number of open files to trigger the sync! Default: "nolimit")

There might be no use of daemon to run all the time!

Or something similar to "Android Project Treble"! (For stability + consistency)

ram19890 avatar Sep 06 '21 13:09 ram19890

@ram19890 there's no persistent caching at all right now. If and when it is implemented, it's going to be possible to delete the persistent cache, but it's too early to tell if the cache is going to be optional or not.

There's also no daemon at all. That was a suggestion for Vim users who keep closing their editor and don't want to change their workflow. A daemon like that could probably be implemented outside of RA, but it's not the real solution.

lnicola avatar Sep 06 '21 13:09 lnicola

Couple of thoughts here:

  • one unusual use-case here is that some people use .rlib as a way to distribute proprietary code. Such use-cases currently can't benefit from rust-analyzer (no source code available), but they could in theory use our own index format (if we actually erase method bodies)
  • there's a certain charm in just using rlibs -- that makes plugging rust-analyzer into an existing build system easier. It's also true that rlibs are an end-game here -- it would be silly if compiler and IDE needed two separate "compiled library" formats. But using rlibs makes a deliberately unstable part of rust somewhat more stable, and there will be extra uncertainty as to who should be the emitter of rlibs -- compiler or rustc. We'll also want to put extra things in rlibs (parameter names), so :shrug:

matklad avatar Sep 14 '21 09:09 matklad

one unusual use-case here is that some people use .rlib as a way to distribute proprietary code.

rlibs leak filenames, doc comments for private items, the position of every item in the source file, the name of every function and type even if private and much more. I wouldn't be surprised if you could decompile them to something reasonably resembling the original source without a terribly huge amount of effort.

bjorn3 avatar Sep 14 '21 09:09 bjorn3

parameter names

-Zalways-encode-mir and the MIR local debuginfo got you covered.

bjorn3 avatar Sep 14 '21 09:09 bjorn3

I think that people who want to distribute closed-source libraries would be better served by going through a C API. It's more work and it's boring, but you get interop with every other language under the Sun.

lnicola avatar Sep 14 '21 09:09 lnicola

We definitely won't implement a persistent process within rust-analyzer itself -- it is indeed a job for the editor. However, I think for editors like vim someone could write a separate Rust binary, rust-analyzer-supervisor, which would cache the connection to the ra.

I've written something like this. It's a binary that replaces rust-analyzer in your editor and pipes the input/output through a local TCP socket to a server, which persists one rust-analyzer instance per workspace and works around LSP limitations to keep the important functionality while supporting multiple clients (vim editor instances) on a single rust-analyzer instance. It also keeps the rust-analyzer process alive for a while after all clients are closed, until a timeout runs out.

Repo is here: https://github.com/pr2502/ra-multiplex. It's still a work in progress, but it is usable for me with neovim and coc-rust-analyzer.
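For readers who want a mental model of the "one instance per workspace, many clients" bookkeeping described above, a rough sketch follows. It is not taken from ra-multiplex; `Registry`, `attach`, and `detach` are hypothetical names, and process spawning, the TCP proxying, and the timeout itself are all omitted.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical registry: workspace root -> number of attached editor clients.
/// An entry with zero clients stands for an instance that is still alive,
/// waiting for its timeout.
#[derive(Default)]
struct Registry {
    instances: HashMap<PathBuf, usize>,
}

impl Registry {
    /// Attaches a client. Returns true when no instance existed for this
    /// workspace yet, i.e. when a new server process would have to be spawned.
    fn attach(&mut self, workspace_root: &Path) -> bool {
        let is_new = !self.instances.contains_key(workspace_root);
        *self.instances.entry(workspace_root.to_path_buf()).or_insert(0) += 1;
        is_new
    }

    /// Detaches a client and returns how many clients remain attached.
    fn detach(&mut self, workspace_root: &Path) -> usize {
        match self.instances.get_mut(workspace_root) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count
            }
            None => 0,
        }
    }
}

fn main() {
    let mut registry = Registry::default();
    let root = Path::new("/home/user/project");
    // The first editor window spawns an instance; the second one reuses it.
    println!("{}", registry.attach(root));
    println!("{}", registry.attach(root));
    // Closing both windows leaves the instance registered with zero clients.
    registry.detach(root);
    println!("{}", registry.detach(root));
    // Reopening before the timeout reuses the warm instance.
    println!("{}", registry.attach(root));
}
```

Keeping the zero-client entry around, rather than removing it on the last detach, is what lets a quickly reopened editor skip the initial indexing.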

Pinging users who asked for a feature like this, sorry for spam if you don't need it anymore @montekki @tjkirch @flodiebold

pr2502 avatar Jan 29 '22 18:01 pr2502

@pr2502 Can't tell you how much I appreciate https://github.com/pr2502/ra-multiplex, it works great, and having a dedicated terminal with rust-analyzer debug messages is an added bonus.

Just to confirm for anyone using helix who stumbles on this thread: you can put this in your ~/.config/helix/languages.toml:

[[language]]
name = "rust"
...
language-server = { command = "ra-multiplex" }

This is a really straightforward and fantastic feature, worthy of consideration as a subcommand of rust-analyzer imo.

jackos avatar Jun 02 '22 08:06 jackos

I've written something like this, it's a binary that replaces rust-analyzer in your editor and pipes the input/output through a local tcp socket

Thanks a lot, it seems to also work well with the VSCode extension:

{
  "rust-analyzer.server.path": "/Users/user/.cargo/bin/ra-multiplex"
}

melMass avatar Mar 07 '23 18:03 melMass

Hi, are we still looking into this, or are we using ra-multiplex to wrap RA now?

akurniawan avatar Jan 18 '24 14:01 akurniawan

I think persistent caches for rust-analyzer are still a nice-to-have that require a lot of design work before they're implemented. In the meantime, I recommend using ra-multiplex.

davidbarsky avatar Jan 19 '24 21:01 davidbarsky

@pr2502, do you think ra-multiplex would help me with keeping the indexing cache of large repositories that hold thousands of crates (i.e. projects that use polkadot-sdk) on VSCode?

I know it's a weird question, but I've been looking for a solution for weeks as the number of deps in the project just keeps increasing, and it's hard enough not having a good solution to keep the cache running longer, especially when sometimes I'm opening multiple editor windows at the same time and multiple files in the same editor window.

pandres95 avatar Feb 01 '24 16:02 pandres95

it does work with vscode but it'll also make your life harder in other ways, the file watch events from clients (editors) are not propagated (yet), which means you have to manually reload the workspace (ra-multiplex reload) when adding/removing files or when Cargo.lock changes. you might end up with even more load if you change your project structure often.

pr2502 avatar Feb 02 '24 12:02 pr2502

Hi there! Dioxus is starting to become a very large project (74918 lines of rust) and I'm hitting egregiously long indexing times - 2-3 minutes on startup.

What's worse is that when we edit things like our macros and build scripts the entire workspace gets re-indexed which is a terrible experience.

We would be happy to help support (manpower or financially) any effort to get persistent caching and lazy cargo check by evaluating the crate graph and cross-referencing it against open files.

Also, the fact that RA re-uses the same build profile as a regular cargo check is quite bad, as your editor and your terminal are constantly fighting for the lock on the target directory, which makes developing rust feel much slower than it should.

jkelleyrtp avatar Mar 13 '25 17:03 jkelleyrtp

@jkelleyrtp until it's implemented, try disabling cache priming and see how it feels. There's also an option for a separate target directory, but in my opinion it's counterproductive. And I think we already have an option to check only the modified crates and not the entire workspace.

But of course, your ideal r-a setup might not be the same as mine.

lnicola avatar Mar 13 '25 17:03 lnicola