text: add `SyntaxText::rfind_char`
In use:
```rust
fn span_last_line(self) -> Span {
    match self {
        SegmentRawRef::Content(content) => {
            match content.text().rfind('\n') {
                Some(len) => Span::at_end(
                    content.span().end,
                    content.text().len() - len - '\n'.len_utf8(),
                ),
                None => content.span(),
            }
        },
        SegmentRawRef::Interpolation(interpolation) => {
            match interpolation.text().rfind_char('\n') { // !!
                Some(len) => Span::at_end(
                    interpolation.span().end,
                    interpolation.text().size() - len - '\n'.len_utf8(),
                ),
                None => interpolation.span(),
            }
        },
    }
}
```
I think a new minor release is warranted? It's been a while and quite a few things have been improved
> I think a new minor release is warranted? It's been a while and quite a few things have been improved
Also yes, I should have done that after the last round of PRs already. But I wanted to do a dependency bump with the release as well, and upgrade to the new edition, which has been due for a while, and I then didn't get to it. If I still haven't by the time this is in, I'll push a release without those changes so you get the new APIs at least.
So, I think this would be a nice API for `cstree` to offer, but we might want to do it a little differently maybe... The way you've currently written the new function (the straightforward way, which is usually a good thing), this will end up calling `fold_chunks`, which calls `tokens_with_ranges` (through the `try_variant`), which then materializes all `descendants_with_tokens` of the corresponding `SyntaxNode`, because the method will iterate through all chunks to find the last occurrence. You might have looked at the entire tree regardless, but if you didn't, then we'll still materialize and persist it.

I think the "proper" way to implement this would be to actually have an iterator that yields descendants in reverse order, either as a separate type or as a `DoubleEndedIterator` (which the current implementation doesn't support, because it uses `std::iter::successors`). It's more effort to implement, but it optimizes for the common case where the character that you're looking to `rfind` actually exists close to the end of the text, in which case you might only need to materialize one linear branch of nodes. We should have the necessary prerequisites on `SyntaxNode` already, like `last_child` instead of `first_child` and `prev_sibling_or_token` (I probably wouldn't expose the iterator as public API, mostly because it would be really confusing what it does compared to a postorder traversal. On the plus side, this means there might be an easier implementation which doesn't go through a step where someone might care about descendant nodes, since for the text we only care about tokens).
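For illustration, the difference can be sketched at the chunk level, with plain `&str` slices standing in for the tree's token texts (a toy sketch, not cstree's internal representation): searching the chunks from the back lets the search stop at the first chunk that contains the needle, which is exactly the case a reverse iterator would optimize.

```rust
// Toy sketch: find the last occurrence of a character across text chunks.
// The chunks stand in for the token texts of a syntax tree; walking them in
// reverse means we can stop at the first (i.e., last) chunk containing the
// needle, instead of visiting (and materializing) every chunk from the front.
fn rfind_char(chunks: &[&str], needle: char) -> Option<usize> {
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    // Offset where the current chunk ends, counted from the start of the text.
    let mut end = total;
    for chunk in chunks.iter().rev() {
        let start = end - chunk.len();
        if let Some(pos) = chunk.rfind(needle) {
            // Absolute offset in the concatenated text.
            return Some(start + pos);
        }
        end = start;
    }
    None
}

fn main() {
    // Full text is "foo\nbar\nbaz"; the last '\n' sits at offset 7.
    let chunks = ["foo\n", "bar", "\nbaz"];
    assert_eq!(rfind_char(&chunks, '\n'), Some(7));
    assert_eq!(rfind_char(&chunks, 'f'), Some(0));
    assert_eq!(rfind_char(&chunks, 'x'), None);
    println!("ok");
}
```

With chunks arriving from a reverse iterator, only the suffix of the tree up to the match ever needs to be looked at.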
That's fair, I also want to implement a `chunks()` that returns `impl DoubleEndedIterator`, because those methods that take a closure are quite limiting in general. I'll also base this API off of that
> I also want to implement a `chunks()` that returns `impl DoubleEndedIterator`
If you can make that work, that's gonna be so nice! If we had that, all of the `_chunk[s]` methods would go away because you can use the matching methods on `Iterator` that already exist 👀
It might be a bit tricky to implement, though, because you still have to go both up and down the nodes as well as forward or back between siblings, while making sure you don't count any element twice. So I don't think this is possible without making `descendants_with_tokens` double-ended as well first?
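The "don't count any element twice" concern can be side-stepped by one particular iterator shape: keep a single shared frontier of unexplored elements and expand nodes lazily at whichever end is being consumed. Here is a toy sketch over a hand-rolled tree (illustrative types only, not cstree's; it also doesn't model cstree's lazy materialization, just the double-counting question):

```rust
use std::collections::VecDeque;

// Toy tree: nodes hold children, tokens hold text.
enum Element {
    Node(Vec<Element>),
    Token(&'static str),
}

// Double-ended iterator over the leaf tokens. Every unexplored element lives
// in the deque exactly once, so the front and back ends can never yield the
// same token twice -- the deque itself guarantees the two ends stay disjoint.
struct Tokens<'a> {
    frontier: VecDeque<&'a Element>,
}

impl<'a> Tokens<'a> {
    fn new(root: &'a Element) -> Self {
        Tokens { frontier: VecDeque::from([root]) }
    }
}

impl<'a> Iterator for Tokens<'a> {
    type Item = &'static str;

    fn next(&mut self) -> Option<&'static str> {
        while let Some(element) = self.frontier.pop_front() {
            match element {
                Element::Token(text) => return Some(text),
                // Expand the node in place: its children take its slot at the
                // front of the frontier, preserving document order.
                Element::Node(children) => {
                    for child in children.iter().rev() {
                        self.frontier.push_front(child);
                    }
                }
            }
        }
        None
    }
}

impl<'a> DoubleEndedIterator for Tokens<'a> {
    fn next_back(&mut self) -> Option<&'static str> {
        while let Some(element) = self.frontier.pop_back() {
            match element {
                Element::Token(text) => return Some(text),
                Element::Node(children) => {
                    for child in children.iter() {
                        self.frontier.push_back(child);
                    }
                }
            }
        }
        None
    }
}

fn main() {
    use Element::*;
    let tree = Node(vec![Token("a"), Node(vec![Token("b"), Token("c")]), Token("d")]);
    assert_eq!(Tokens::new(&tree).collect::<Vec<_>>(), ["a", "b", "c", "d"]);
    assert_eq!(Tokens::new(&tree).rev().collect::<Vec<_>>(), ["d", "c", "b", "a"]);

    // Mixing both ends never repeats or skips a token.
    let mut it = Tokens::new(&tree);
    assert_eq!(it.next(), Some("a"));
    assert_eq!(it.next_back(), Some("d"));
    assert_eq!(it.next(), Some("b"));
    assert_eq!(it.next_back(), Some("c"));
    assert_eq!(it.next(), None);
    println!("ok");
}
```

The real implementation would of course traverse via sibling/child links instead of owned `Vec`s, but the invariant (each element in the frontier at most once) is what makes the double-ended version safe.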
@RGBCube went ahead and did all the chores in #78, you might want to rebase / start your other changes from there. There are some formatting changes that happened (and that I introduced) with the edition update, so might get some conflicts around imports, but hopefully nothing too bad.
Hey, I was going through the `SyntaxNode` impl and noticed something:
```rust
impl<S: Syntax, D> Drop for SyntaxNode<S, D> {
    fn drop(&mut self) {
        // safety: the ref count is only dropped when there are no more external references (see below)
        // and all nodes but the root have been dropped.
        // if we are the last external reference, we have not yet dropped the ref count
        // if we aren't, we won't enter the `if` below
        let ref_count = unsafe { &*self.data().ref_count };
        let refs = ref_count.fetch_sub(1, Ordering::AcqRel);
        if refs == 1 {
            // drop from parent
            // NOTE regarding drop orders: since `SyntaxNode<L>::drop` looks at the `ref_count`, we
            // need to first drop the `root` and only then its `root_data` and the contained
            // `ref_count`
            let root = self.root();
            let mut root = root.clone_uncounted();
            let ref_count = root.data().ref_count;
            root.drop_recursive();
            let root_data = root.data;
            drop(root);
            unsafe { drop(Box::from_raw(root_data.as_ptr())) };
            unsafe { drop(Box::from_raw(ref_count)) };
        }
    }
}
```
This is all fine, and the data only gets dropped when we're the only reference left.
But there's also this:
```rust
impl<S: Syntax, D> SyntaxToken<S, D> {
    pub(super) fn new(parent: &SyntaxNode<S, D>, index: u32, offset: TextSize) -> SyntaxToken<S, D> {
        Self {
            parent: parent.clone_uncounted(),
            index,
            offset,
        }
    }
}
```
And when that `SyntaxToken` gets dropped, it will also trigger `self.parent`'s drop, which may read freed memory.
Maybe I'm misunderstanding how this works, could you explain?
Also something that struck me was the separation of children and their locks, why is that?
Let's see, this is only called from `SyntaxElement::new`, which alternatively calls `SyntaxNode::new_child` when creating nodes, and that also uses `clone_uncounted`. The place that actually uses this is `SyntaxNode::get_or_add_element`, which is responsible for materializing syntax tree nodes. When it does, it then calls `SyntaxNode::read` to return a `SyntaxElementRef`, i.e., a reference to the node or token. If you want to keep that token for longer than the node is alive, you have to clone it, which then does increment the refcount (hopefully).
Basically, the ref count only counts external references to the node (that is, `SyntaxNode` or token copies held by the user of the library), but not internal ones, since otherwise children would always keep their parents alive because they count as using the node.
So, the token you see returned from the `fn new` you quoted is not actually given to the user, where it could be dropped and cause a problem. It is only used to store the token inside its parent node, where all children are persisted once they are first materialized (this is one of the differences between `cstree` and `rowan`, which always re-materializes on demand), so it will never be dropped.
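The "count only external references" scheme can be shown with a small toy model (illustrative only, nothing here is cstree's actual code): user-facing handles bump a shared count, while internal links, like a cached token's pointer to its parent, skip the increment and also skip the decrement on drop.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Toy model of external-only reference counting (not cstree's real types).
struct Shared {
    ref_count: Cell<usize>,
}

struct Node {
    shared: Rc<Shared>,
    // Whether this handle participates in the external ref count.
    counted: bool,
}

impl Node {
    fn new_root() -> Node {
        Node { shared: Rc::new(Shared { ref_count: Cell::new(1) }), counted: true }
    }

    /// Internal copy, e.g. the `parent` field of a cached token.
    fn clone_uncounted(&self) -> Node {
        Node { shared: Rc::clone(&self.shared), counted: false }
    }

    /// What a user-facing `Clone` impl would do.
    fn clone_counted(&self) -> Node {
        self.shared.ref_count.set(self.shared.ref_count.get() + 1);
        Node { shared: Rc::clone(&self.shared), counted: true }
    }

    fn external_refs(&self) -> usize {
        self.shared.ref_count.get()
    }
}

impl Drop for Node {
    fn drop(&mut self) {
        // Uncounted (internal) handles never touch the count.
        if self.counted {
            let remaining = self.shared.ref_count.get() - 1;
            self.shared.ref_count.set(remaining);
            if remaining == 0 {
                // Last *external* reference: this is where the real tree
                // tears down the whole allocation, cached children included.
            }
        }
    }
}

fn main() {
    let root = Node::new_root();
    assert_eq!(root.external_refs(), 1);

    // A cached child's parent link does not raise the count...
    let internal = root.clone_uncounted();
    assert_eq!(root.external_refs(), 1);
    drop(internal); // ...and dropping it does not lower it either.
    assert_eq!(root.external_refs(), 1);

    // A user clone does count.
    let user_copy = root.clone_counted();
    assert_eq!(root.external_refs(), 2);
    drop(user_copy);
    assert_eq!(root.external_refs(), 1);
    println!("ok");
}
```

The crucial point for the question above: because cached tokens are uncounted *and* never dropped while their parent lives, their `Drop` can never run against freed memory.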
There was a reason for separating the locks as well, but I don't 100% remember off the top of my head. Probably some borrow checking limitations around mapping between nodes / tokens and elements and references vs. _Ref types. I know I tried to get it to work inline, but I'd have to probably try and run into the same errors again to say for sure.
If it's not that, it might also be an artefact of my attempts at solving #5. But I think none of that ever made it into the mainline code.
> Probably some borrow checking limitations around mapping between nodes / tokens and elements and references vs. `_Ref` types.
@RGBCube I had another look with a bit more sleep and that is indeed the issue: if you put the children inside the lock, you have to return the `RwLockReadGuard` or something tied to it, but the guard is local to the function. You could get around this by mapping the guard to an element ref, but

- that's nightly
- and would force us to return `RwLockReadGuard` as part of the API
- and is unnecessary, since we know that the child will never be modified (the lock exists only to synchronize materializing it, after which it is immutable (since `cstree` intentionally doesn't do in-place updates)).
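That access pattern, lock only to materialize once, then hand out plain references forever after, is exactly what `std::sync::OnceLock` captures. A minimal sketch (an analogy for the pattern, not cstree's code):

```rust
use std::sync::OnceLock;

// Sketch: a lazily materialized, then immutable, child slot. The lock-like
// synchronization happens only on first access; afterwards reads are free
// and return plain `&` references, so no guard type escapes into the API.
struct LazyChild {
    slot: OnceLock<String>,
}

impl LazyChild {
    fn new() -> Self {
        LazyChild { slot: OnceLock::new() }
    }

    fn get_or_materialize(&self) -> &str {
        // `get_or_init` synchronizes the single write; later calls return
        // the cached value without blocking.
        self.slot.get_or_init(|| {
            // Imagine building the child node from the green tree here.
            String::from("materialized child")
        })
    }
}

fn main() {
    let child = LazyChild::new();
    let first = child.get_or_materialize();
    let second = child.get_or_materialize();
    assert_eq!(first, "materialized child");
    // Both calls see the same immutable value; no guard is ever returned.
    assert_eq!(first.as_ptr(), second.as_ptr());
    println!("ok");
}
```

Keeping the children outside the lock achieves the same effect by hand: the lock covers only the materialization step, and the stored child is immutable from then on.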
> I think a new minor release is warranted? It's been a while and quite a few things have been improved
Hey @RGBCube, since it's been another while, I've published a new version of `cstree` with the previous round of changes plus the edition and dependency updates. I don't think that should impact this PR, since we can do a point release for adding the new function once it's ready. So just letting you know, since you've been requesting a release.