text: add `SyntaxText::rfind_char`
In use:
```rust
fn span_last_line(self) -> Span {
    match self {
        SegmentRawRef::Content(content) => {
            match content.text().rfind('\n') {
                Some(len) => Span::at_end(
                    content.span().end,
                    content.text().len() - len - '\n'.len_utf8(),
                ),
                None => content.span(),
            }
        },
        SegmentRawRef::Interpolation(interpolation) => {
            match interpolation.text().rfind_char('\n') { // !!
                Some(len) => Span::at_end(
                    interpolation.span().end,
                    interpolation.text().size() - len - '\n'.len_utf8(),
                ),
                None => interpolation.span(),
            }
        },
    }
}
```
I think a new minor release is warranted? It's been a while and quite a few things have been improved
> I think a new minor release is warranted? It's been a while and quite a few things have been improved
Also yes, I should have done that after the last round of PRs already. But I wanted to do a dependency bump with the release as well, and upgrade to the new edition, which has been due for a while, and I then didn't get to it. If I still haven't by the time this is in, I'll push a release without those changes so you get the new APIs at least.
So, I think this would be a nice API for `cstree` to offer, but we might want to do it a little differently maybe... The way you've currently written the new function (the straightforward way, which is usually a good thing), this will end up calling `fold_chunks`, which calls `tokens_with_ranges` (through the `try_variant`), which then materializes all `descendants_with_tokens` of the corresponding `SyntaxNode`, because the method will iterate through all chunks to find the last occurrence. You might have looked at the entire tree regardless, but if you didn't, then we'll still materialize and persist it.

I think the "proper" way to implement this would be to actually have an iterator that yields descendants in reverse order, either as a separate type or as a `DoubleEndedIterator` (which the current implementation doesn't support, because it uses `std::iter::successors`). It's more effort to implement, but it optimizes for the common case where the character that you're looking to `rfind` actually exists close to the end of the text, in which case you might only need to materialize one linear branch of nodes. We should have the necessary prerequisites on `SyntaxNode` already, like `last_child` instead of `first_child` and `prev_sibling_or_token` (I probably wouldn't expose the iterator as public API, mostly because it would be really confusing what it does compared to a postorder traversal. On the plus side, this means there might be an easier implementation which doesn't go through a step where someone might care about descendant nodes, since for the text we only care about tokens).
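For illustration, the difference can be sketched at the chunk level, with plain `&str` slices standing in for the tree's token texts (a toy sketch, not cstree's internal representation): searching the chunks from the back lets the search stop at the first chunk that contains the needle, which is exactly the case a reverse iterator would optimize.

```rust
// Toy sketch: find the last occurrence of a character across text chunks.
// The chunks stand in for the token texts of a syntax tree; walking them in
// reverse means we can stop at the first (i.e., last) chunk containing the
// needle, instead of visiting (and materializing) every chunk from the front.
fn rfind_char(chunks: &[&str], needle: char) -> Option<usize> {
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    // Offset where the current chunk ends, counted from the start of the text.
    let mut end = total;
    for chunk in chunks.iter().rev() {
        let start = end - chunk.len();
        if let Some(pos) = chunk.rfind(needle) {
            // Absolute offset in the concatenated text.
            return Some(start + pos);
        }
        end = start;
    }
    None
}

fn main() {
    // Full text is "foo\nbar\nbaz"; the last '\n' sits at offset 7.
    let chunks = ["foo\n", "bar", "\nbaz"];
    assert_eq!(rfind_char(&chunks, '\n'), Some(7));
    assert_eq!(rfind_char(&chunks, 'f'), Some(0));
    assert_eq!(rfind_char(&chunks, 'x'), None);
    println!("ok");
}
```

With chunks arriving from a reverse iterator, only the suffix of the tree up to the match ever needs to be looked at.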
That's fair, I also want to implement a `chunks()` that returns `impl DoubleEndedIterator`, because those methods that take a closure are quite limiting in general. I'll also base this API off of that
> I also want to implement a `chunks()` that returns `impl DoubleEndedIterator`
If you can make that work, that's gonna be so nice! If we had that, all of the `_chunk[s]` methods would go away because you can use the matching methods on `Iterator` that already exist 👀
It might be a bit tricky to implement, though, because you still have to go both up and down the nodes as well as forward or back between siblings, while making sure you don't count any element twice. So I don't think this is possible without making `descendants_with_tokens` double-ended as well first?
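The "don't count any element twice" concern can be side-stepped by one particular iterator shape: keep a single shared frontier of unexplored elements and expand nodes lazily at whichever end is being consumed. Here is a toy sketch over a hand-rolled tree (illustrative types only, not cstree's; it also doesn't model cstree's lazy materialization, just the double-counting question):

```rust
use std::collections::VecDeque;

// Toy tree: nodes hold children, tokens hold text.
enum Element {
    Node(Vec<Element>),
    Token(&'static str),
}

// Double-ended iterator over the leaf tokens. Every unexplored element lives
// in the deque exactly once, so the front and back ends can never yield the
// same token twice -- the deque itself guarantees the two ends stay disjoint.
struct Tokens<'a> {
    frontier: VecDeque<&'a Element>,
}

impl<'a> Tokens<'a> {
    fn new(root: &'a Element) -> Self {
        Tokens { frontier: VecDeque::from([root]) }
    }
}

impl<'a> Iterator for Tokens<'a> {
    type Item = &'static str;

    fn next(&mut self) -> Option<&'static str> {
        while let Some(element) = self.frontier.pop_front() {
            match element {
                Element::Token(text) => return Some(text),
                // Expand the node in place: its children take its slot at the
                // front of the frontier, preserving document order.
                Element::Node(children) => {
                    for child in children.iter().rev() {
                        self.frontier.push_front(child);
                    }
                }
            }
        }
        None
    }
}

impl<'a> DoubleEndedIterator for Tokens<'a> {
    fn next_back(&mut self) -> Option<&'static str> {
        while let Some(element) = self.frontier.pop_back() {
            match element {
                Element::Token(text) => return Some(text),
                Element::Node(children) => {
                    for child in children.iter() {
                        self.frontier.push_back(child);
                    }
                }
            }
        }
        None
    }
}

fn main() {
    use Element::*;
    let tree = Node(vec![Token("a"), Node(vec![Token("b"), Token("c")]), Token("d")]);
    assert_eq!(Tokens::new(&tree).collect::<Vec<_>>(), ["a", "b", "c", "d"]);
    assert_eq!(Tokens::new(&tree).rev().collect::<Vec<_>>(), ["d", "c", "b", "a"]);

    // Mixing both ends never repeats or skips a token.
    let mut it = Tokens::new(&tree);
    assert_eq!(it.next(), Some("a"));
    assert_eq!(it.next_back(), Some("d"));
    assert_eq!(it.next(), Some("b"));
    assert_eq!(it.next_back(), Some("c"));
    assert_eq!(it.next(), None);
    println!("ok");
}
```

The real implementation would of course traverse via sibling/child links instead of owned `Vec`s, but the invariant (each element in the frontier at most once) is what makes the double-ended version safe.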
@RGBCube went ahead and did all the chores in #78, you might want to rebase / start your other changes from there. There are some formatting changes that happened (and that I introduced) with the edition update, so might get some conflicts around imports, but hopefully nothing too bad.
Hey, I was going through the `SyntaxNode` impl and noticed something:
```rust
impl<S: Syntax, D> Drop for SyntaxNode<S, D> {
    fn drop(&mut self) {
        // safety: the ref count is only dropped when there are no more external references (see below)
        // and all nodes but the root have been dropped.
        // if we are the last external reference, we have not yet dropped the ref count
        // if we aren't, we won't enter the `if` below
        let ref_count = unsafe { &*self.data().ref_count };
        let refs = ref_count.fetch_sub(1, Ordering::AcqRel);
        if refs == 1 {
            // drop from parent
            // NOTE regarding drop orders: since `SyntaxNode<L>::drop` looks at the `ref_count`, we
            // need to first drop the `root` and only then its `root_data` and the contained
            // `ref_count`
            let root = self.root();
            let mut root = root.clone_uncounted();
            let ref_count = root.data().ref_count;
            root.drop_recursive();
            let root_data = root.data;
            drop(root);
            unsafe { drop(Box::from_raw(root_data.as_ptr())) };
            unsafe { drop(Box::from_raw(ref_count)) };
        }
    }
}
```
This is all fine, and the data only gets dropped when we're the only reference left.
But there's also this:
```rust
impl<S: Syntax, D> SyntaxToken<S, D> {
    pub(super) fn new(parent: &SyntaxNode<S, D>, index: u32, offset: TextSize) -> SyntaxToken<S, D> {
        Self {
            parent: parent.clone_uncounted(),
            index,
            offset,
        }
    }
}
```
And when that `SyntaxToken` gets dropped, it will also trigger `self.parent`'s drop, which may read freed memory.
Maybe I'm misunderstanding how this works, could you explain?
Also something that struck me was the separation of children and their locks, why is that?
Let's see, this is only called from `SyntaxElement::new`, which alternatively calls `SyntaxNode::new_child` when creating nodes, and that also uses `clone_uncounted`. The place that actually uses this is `SyntaxNode::get_or_add_element`, which is responsible for materializing syntax tree nodes. When it does, it then calls `SyntaxNode::read` to return a `SyntaxElementRef`, i.e., a reference to the node or token. If you want to keep that token for longer than the node is alive, you have to clone it, which then does increment the refcount (hopefully).
Basically, the ref count only counts external references to the node (that is, `SyntaxNode` or token copies held by the user of the library), but not internal ones, since otherwise children would always keep their parents alive because they count as using the node.
So, the token you see returned from the `fn new` you quoted is not actually given to the user, where it could be dropped and cause a problem. It is only used to store the token inside its parent node, where all children are persisted once they are first materialized (this is one of the differences between `cstree` and `rowan`, which always re-materializes on demand), so it will never be dropped.
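The "count only external references" scheme can be shown with a small toy model (illustrative only, nothing here is cstree's actual code): user-facing handles bump a shared count, while internal links, like a cached token's pointer to its parent, skip the increment and also skip the decrement on drop.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Toy model of external-only reference counting (not cstree's real types).
struct Shared {
    ref_count: Cell<usize>,
}

struct Node {
    shared: Rc<Shared>,
    // Whether this handle participates in the external ref count.
    counted: bool,
}

impl Node {
    fn new_root() -> Node {
        Node { shared: Rc::new(Shared { ref_count: Cell::new(1) }), counted: true }
    }

    /// Internal copy, e.g. the `parent` field of a cached token.
    fn clone_uncounted(&self) -> Node {
        Node { shared: Rc::clone(&self.shared), counted: false }
    }

    /// What a user-facing `Clone` impl would do.
    fn clone_counted(&self) -> Node {
        self.shared.ref_count.set(self.shared.ref_count.get() + 1);
        Node { shared: Rc::clone(&self.shared), counted: true }
    }

    fn external_refs(&self) -> usize {
        self.shared.ref_count.get()
    }
}

impl Drop for Node {
    fn drop(&mut self) {
        // Uncounted (internal) handles never touch the count.
        if self.counted {
            let remaining = self.shared.ref_count.get() - 1;
            self.shared.ref_count.set(remaining);
            if remaining == 0 {
                // Last *external* reference: this is where the real tree
                // tears down the whole allocation, cached children included.
            }
        }
    }
}

fn main() {
    let root = Node::new_root();
    assert_eq!(root.external_refs(), 1);

    // A cached child's parent link does not raise the count...
    let internal = root.clone_uncounted();
    assert_eq!(root.external_refs(), 1);
    drop(internal); // ...and dropping it does not lower it either.
    assert_eq!(root.external_refs(), 1);

    // A user clone does count.
    let user_copy = root.clone_counted();
    assert_eq!(root.external_refs(), 2);
    drop(user_copy);
    assert_eq!(root.external_refs(), 1);
    println!("ok");
}
```

The crucial point for the question above: because cached tokens are uncounted *and* never dropped while their parent lives, their `Drop` can never run against freed memory.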
There was a reason for separating the locks as well, but I don't 100% remember off the top of my head. Probably some borrow checking limitations around mapping between nodes / tokens and elements and references vs. _Ref types. I know I tried to get it to work inline, but I'd have to probably try and run into the same errors again to say for sure.
If it's not that, it might also be an artefact of my attempts at solving #5. But I think none of that ever made it into the mainline code.
> Probably some borrow checking limitations around mapping between nodes / tokens and elements and references vs. `_Ref` types.
@RGBCube I had another look with a bit more sleep and that is indeed the issue: if you put the children inside the lock, you have to return the `RwLockReadGuard` or something tied to it, but the guard is local to the function. You could get around this by mapping the guard to an element ref, but

- that's nightly
- and would force us to return `RwLockReadGuard` as part of the API
- and is unnecessary, since we know that the child will never be modified (the lock exists only to synchronize materializing it, after which it is immutable (since `cstree` intentionally doesn't do in-place updates)).
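That access pattern, lock only to materialize once, then hand out plain references forever after, is exactly what `std::sync::OnceLock` captures. A minimal sketch (an analogy for the pattern, not cstree's code):

```rust
use std::sync::OnceLock;

// Sketch: a lazily materialized, then immutable, child slot. The lock-like
// synchronization happens only on first access; afterwards reads are free
// and return plain `&` references, so no guard type escapes into the API.
struct LazyChild {
    slot: OnceLock<String>,
}

impl LazyChild {
    fn new() -> Self {
        LazyChild { slot: OnceLock::new() }
    }

    fn get_or_materialize(&self) -> &str {
        // `get_or_init` synchronizes the single write; later calls return
        // the cached value without blocking.
        self.slot.get_or_init(|| {
            // Imagine building the child node from the green tree here.
            String::from("materialized child")
        })
    }
}

fn main() {
    let child = LazyChild::new();
    let first = child.get_or_materialize();
    let second = child.get_or_materialize();
    assert_eq!(first, "materialized child");
    // Both calls see the same immutable value; no guard is ever returned.
    assert_eq!(first.as_ptr(), second.as_ptr());
    println!("ok");
}
```

Keeping the children outside the lock achieves the same effect by hand: the lock covers only the materialization step, and the stored child is immutable from then on.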
> I think a new minor release is warranted? It's been a while and quite a few things have been improved
Hey @RGBCube, since it's been another while, I've published a new version of `cstree` with the previous round of changes plus the edition and dependency updates. I don't think that should impact this PR, since we can do a point release for adding the new function once it's ready. So just letting you know, since you've been requesting a release.