zed RFC: Enforce `Point` and `Offset` invariants at the type level

Check for existing issues

[X] Completed

Describe the feature

Offsets and points are used extensively throughout Zed. Each instance of these implicitly has some "space" they are associated with. For example, any given offset may be a MultiBuffer offset, a Buffer offset, a Display offset, and potentially others.

This has a few drawbacks:

Not self-documenting When first diving into the codebase, I'd see many functions asking for offset: usize or point: Point, and at first it was quite hard understanding which text space they were asking for.
Invariants aren't enforced by the compiler, leading to subtle bugs. I ran into this working on my (WIP) Vim Arguments Text Object pull request. I'm using the syntax tree to select the correct argument node, and converted the syntax offsets to DisplayPoints. The syntax offset is relative to the buffer, so this was a bug. These bugs can be subtle, as the normal use case is a singleton buffer. New contributors may not think to test in a MultiBuffer, or to test with inlay hints, folds, and other DisplayBuffer features.
Leads to code duplication We have two versions of ToOffset and ToPoint and ToPointUtf16. We also have DisplayPoint, InlayPoint, InlayOffset, FoldPoint, WrapPoint, and offset variants of some of these. Many of these have duplicated implementations of traits such as sum_tree::Dimension.
No single way to convert between spaces There's inconsistency in converting between different spaces, with methods scattered across types and spaces. Discovering the correct conversion method is non-trivial.

Proposed Solution

Introduce a new Offset type, a newtype wrapper around the current usize we are using
Make Point, Offset, and OffsetUtf16 generic over the space they are associated with, allowing us to enforce space invariants at compile time. Instead of usize and Point, we'd have Offset<Buffer>, Point<Buffer>, Offset<MultiBuffer>, Point<MultiBuffer>, etc. Instead of DisplayPoint and InlayPoint, we'd use Point<DisplayMap> and Point<InlayMap>.
Make ToOffset and ToPoint generic over the space they are converting to. This gives us a single mechanism for converting between spaces.

// Replaces the `ToDisplayPoint` trait, this time enforcing the space invariants at compile time, and self-documenting
impl ToPoint<DisplayMap> for Point<MultiBuffer> {
    type Context = DisplaySnapshot;
    
    fn to_point(&self, context: &Self::Context) -> Point<DisplayMap> {
        // ...
    }
}

// Replaces `MultiBufferSnapshot::point_to_buffer_offset`
impl ToOffset<Buffer> for Point<MultiBuffer> {
    type Context = MultiBufferSnapshot;
    
    fn to_offset(&self, context: &Self::Context) -> Offset<Buffer> {
        // ...
    }
}

Drawbacks

This is a relatively large refactor, although some of it should be possible to do incrementally. Luckily, this is where Rust shines!
Slightly more complex. Seeing offset: Offset<MultiBuffer> instead of offset: usize is a bit more to take in

If applicable, add mockups / screenshots to help present your vision of the feature

No response

Feb 20 '24 18:02 vultix

I think cleaning up this code is a good idea in general, and I really like the idea of making Point and DisplayPoint less confusing by having the type-name be self-descriptive. (I would also really like to add a Row<Buffer> vs Row<DisplayMap> distinction as that has bitten me several times, and unify on having a row/column field vs method...etc.).

For a significant refactor like this, we'd like you to spend some time pairing with someone at Zed to get started (so we can quickly answer the questions that come up as we get into implementation, and keep knowledge sharing high). I'd be happy to work with you on this for a bit: https://calendly.com/conradirwin/pairing, but you can also poke around in the Zed channels https://zed.dev/channel/zed-283 and find people to talk it through with.

Feb 20 '24 20:02 ConradIrwin

I took a dive into this after #8870.

A few learnings:

It is really convenient that the same units are used for the singleton buffer and the rope; so we probably want to use something like Point<Rope> or some other default Point once we get that far down the stack.
There's a lot of code that is unclear about whether its Point/usize refers to the MultiBuffer or the Buffer.

So probably the place to start is to introduce a new abstraction at the MultiBuffer layer and use it for the maps that stack on top (leaving the existing rope::Point and usize for the Buffer for now).

If we use something like Point<MultiBuffer>, Row<MultiBuffer> and Col<MultiBuffer> (and Offset<MultiBuffer>) it all seems to work, though we can't rely on #[derive] because of the type parameter, so the implementations are a bit verbose.
We probably need more concrete helpers than just the ToOffset traits (for example to go from a Point<MultiBuffer> to a rope::Point in an excerpt we subtract two Row<MultiBuffer>s and add them to a rope::Point).

Mar 05 '24 05:03 ConradIrwin

I haven’t had much time to look at this, but played around with it a bit last weekend. I came to the same conclusion - rope and buffer need to have the same coordinate space.

I successfully got the rope crate to compile with a new Offset<Rooe> type, but had to stop there. Hopefully I’ll have more time to revisit in the next weekend or so.

Mar 05 '24 05:03 vultix