ruff
ruff copied to clipboard
[red-knot] Add "cheap" `program.snapshot`
Summary
This PR implements a snapshotting mechanism for Database and makes query cancellation a Database concept. The motivation for this change is the LSP where the main loop in the LSP uses spawn to schedule some work on its thread pool. sapwn requires that all arguments have a 'static lifetime. That means passing a &Program as we've done in the CLI main loop won't be possible.
To solve this, this PR introduces a program.snapshot method that returns an owned but read-only database instance. The LSP can safely pass the snapshot to its worker thread because it satisfies the 'static lifetime requirement.
The main challenge of the snapshot function is that we don't want to create a deep-clone of the database including the jars state because:
- a deep-clone of the caches is very expensive
- it would mean that caching is restricted to a single thread. However, we want to share a single cache between all threads.
Getting cheap "clones" is achieved by wrapping the jars state in an Arc. However, this introduces a new problem. Now, mutating is expensive because Arcs are read-only by default. Solving this, requires introducing cancellation.
The implementation makes use of the fact that mutating an Arc in place is possible if the Arc has exactly one reference. The Database now stores its own cancellation token and calling a mutation method on the storage (or database) automatically requests cancellation of all queries and waits until all other references to jars are dropped. It is then possible to safely take the mutable reference from the Arc.
Waiting is implemented using a WaitGroup. The WaitGroup is stored in the Jars storage, and its counter is incremented whenever a Snapshot is created and automatically decremented when a Snapshot drops.
The last missing piece is to ensure that queries stop "soonish" when cancellation is requested. This is achieved by changing the db.jar() method to return a QueryResult. It returns Err(QueryError::Cancelled) in case cancellation has been requested. This requires that we change the return type of each query to QueryResult because they won't have a result when they're cancelled.
Checking cancellation in the jars method has the advantage that it is automatically tested by each query that uses caching because the method is needed to retrieve the query storage. Queries have the possibility to manually test for cancellation if needed by calling db.cancelled()?, but that should only be necessary for long operations that never issue a new query.
Catch unwind
An alternative to QueryResult is to panic with a specific error and catch that error in a catch unwind boundary. This is what salsa does. I decided to use a Result instead to make cancellation more explicit. I think that it is important to be aware that a request can be cancelled because it means that we need to ensure to never write "partial" results into the cache, and if we do, have a way to undo the change in case the query gets cancelled to leave the cache in a clean state.
Test plan
I ran the linter and used touch to trigger a re-run. This PR adds a new RED_KNOT_SLOW_LINT that adds an artificial slowdown to lint_syntax for testing cancellation.
Attribution
The ideas here are heavily inspired by Salsa and sometimes applied 1:1.
ruff-ecosystem results
Linter (stable)
✅ ecosystem check detected no linter changes.
Linter (preview)
✅ ecosystem check detected no linter changes.
Formatter (stable)
✅ ecosystem check detected no format changes.
Formatter (preview)
✅ ecosystem check detected no format changes.
To be clear, the "immutable snapshot" means we won't be applying file changes or invalidating anything concurrently (which is great, because it will simplify invalidation a lot). But caching within e.g. TypeStore uses interior mutability and can still be updated in workers, because it implements its own internal locking and doesn't rely on having an exclusive reference to the db.
Yes, that's correct. The immutable snapshot still allows for caches to cache new values, but no "inputs" should change (which would change the result of the analysis). Another way to think about this.