fluent-rs Add feature to add spans tracking source positions in AST nodes

I needed a parser for .ftl files. I found tree-sitter-fluent, but for some reason it couldn't parse a valid file and threw errors on replaceable expressions. I decided to use fluent-syntax instead, but found that the JavaScript version has node spans while the Rust version does not. This PR adds them, but since spans usually aren't needed, I hid them behind the spans feature.
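As a rough illustration of what "hidden behind the spans feature" means, here is a minimal sketch. The `Span` and `Identifier` types and the `identifier` helper are hypothetical stand-ins, not the actual fluent-syntax definitions; only the `spans` feature name comes from this PR.

```rust
use std::ops::Range;

/// Byte range of a node in the source text (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Span(pub Range<usize>);

/// Hypothetical AST node, not the real fluent-syntax type.
#[derive(Debug, Clone, PartialEq)]
pub struct Identifier {
    pub name: String,
    // This field only exists when the crate is built with `--features spans`.
    #[cfg(feature = "spans")]
    pub span: Span,
}

/// Builds a node; the offsets are ignored when the feature is off.
pub fn identifier(name: &str, start: usize, end: usize) -> Identifier {
    // Silence "unused variable" warnings in builds without the feature.
    #[cfg(not(feature = "spans"))]
    let _ = (start, end);

    Identifier {
        name: name.to_string(),
        #[cfg(feature = "spans")]
        span: Span(start..end),
    }
}
```

Without the feature the struct has exactly the same shape as before, so existing users pay nothing for the extra field.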

Also, to avoid conflicts in the tests, where the tree is reformatted and the spans therefore change, I split the PartialEq implementation for AST nodes into a derived implementation and a manual one.

It would be a good idea to write tests that check the spans, but I'm not sure how best to do that, so I need help with this.

Nov 16 '24 16:11 Ertanic

This would address #270, no?

Have you run any benchmarks with/without this feature enabled?

Nov 16 '24 19:11 alerque

This would address https://github.com/projectfluent/fluent-rs/issues/270, no?

I had only been looking at open PRs.

Have you run any benchmarks with/without this feature enabled?

| bench | default | `spans` feature |
| --- | --- | --- |
| construct/preferences | 26.619 µs | 24.673 µs |
| resolve/preferences | 15.104 µs | 14.990 µs |
| resolve_to_str/preferences | 22.803 µs | 22.902 µs |
| parse_ctx_runtime/preferences | 300.13 µs | 305.19 µs |
| parse_ctx_runtime/browser | 120.29 µs | 145.79 µs |

Nov 16 '24 19:11 Ertanic

I'd rather not use a manual implementation of PartialEq, but I found it to be the best solution available. I found several alternatives online, including the derivative crate, which can ignore a particular field via #[derive(Derivative)] together with #[derivative(PartialEq)], but I didn't want to pull in an additional dependency just for that. It would be much more convenient, though, because when the structure's fields change you wouldn't have to keep a manual PartialEq implementation in sync.
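For reference, a minimal sketch of the manual approach described here, assuming a hypothetical `TextElement` node and `Span` type (neither is the real fluent-syntax definition):

```rust
/// Hypothetical span type: byte offsets into the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical node standing in for an AST node type.
#[derive(Debug, Clone)]
pub struct TextElement {
    pub value: String,
    pub span: Span,
}

// Manual impl: compares content only and deliberately skips `span`.
// Any field added to the struct later must also be added here by hand,
// which is the maintenance cost mentioned above.
impl PartialEq for TextElement {
    fn eq(&self, other: &Self) -> bool {
        self.value == other.value
    }
}
```

With this impl, two nodes holding the same text at different offsets compare equal, while their `span` fields can still be compared directly.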

Span itself implements PartialEq, so spans can be compared with each other. For reference, I looked at tree-sitter, where a node's range is exposed through a dedicated function.

And I don't quite understand your point. Are you proposing to introduce additional methods for fields to compare structures and their fields? Or to compare all fields of node structures separately from PartialEq and Eq in separate methods?

Again, this is all for the sake of passing some tests that take an FTL file as input, serialize it into formatted FTL, and parse it again, which yields different spans for the nodes. The options are to change the tests' input data, which I think is wrong; to extend the serializer so that it lays out the FTL content according to the spans; or to simply separate the comparison implementation. I chose the easiest option because I needed this urgently for my LSP server, and I haven't had any problems with it so far.
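To make the round-trip problem concrete, here is a toy sketch. `value_span` and `serialize` are hypothetical helpers for a single `key = value` line and have nothing to do with the real fluent-syntax parser or serializer; they only show how normalizing whitespace shifts byte offsets.

```rust
/// Returns the byte range of the value in a `key = value` line.
/// Toy stand-in for a parser; not the fluent-syntax API.
pub fn value_span(line: &str) -> Option<(usize, usize)> {
    let eq = line.find('=')?;
    let rest = &line[eq + 1..];
    let trimmed = rest.trim_start();
    // Skip the whitespace between `=` and the value.
    let start = eq + 1 + (rest.len() - trimmed.len());
    let end = start + trimmed.trim_end().len();
    Some((start, end))
}

/// Toy serializer: normalizes the spacing around `=`.
pub fn serialize(line: &str) -> Option<String> {
    let (key, value) = line.split_once('=')?;
    Some(format!("{} = {}", key.trim(), value.trim()))
}
```

For the input `hello   =   world` the value sits at bytes 12..17, but after serializing to `hello = world` it sits at 8..13, so a comparison that includes spans fails even though the content is identical.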

Nov 17 '24 17:11 Ertanic

And I don't quite understand your point. Are you proposing to introduce additional methods for fields to compare structures and their fields? Or to compare all fields of node structures separately from PartialEq and Eq in separate methods?

I'm raising a concern that semantically the following code should pass:

let node1 = Node {
  value: "foo",
  span: span!(0, 4),
};

let node2 = Node {
  value: "foo",
  span: span!(5, 11),
};
assert_ne!(node1, node2);

because those two nodes are not equal. Their content is different.

Now, what is true is that in most cases we care about the actual content of the node, not its meta information. We can explicitly achieve that by doing:

let node1 = Node {
  value: "foo",
  span: span!(0, 4),
};

let node2 = Node {
  value: "foo",
  span: span!(5, 11),
};
assert_ne!(node1, node2);

// Option 1:
assert_eq!(node1.value, node2.value);

// Option 2:
assert!(node1.cmp_content(&node2));

Or we can do what is proposed in the PR and add:

let node1 = Node {
  value: "foo",
  span: span!(0, 4),
};

let node2 = Node {
  value: "foo",
  span: span!(5, 11),
};
assert_eq!(node1, node2);

assert_ne!(node1.span, node2.span);
assert!(!node1.cmp_span(&node2));

I'm not sure what the most common approach to AST comparisons with spans is. I'd suggest checking prior art in other parser/AST/serializer models.
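A compilable sketch of the first alternative (derived equality that includes the span, plus an explicit content-only comparison) could look like the following. The `Node` and `Span` shapes and the body of `cmp_content` are assumptions made for illustration; only the method name comes from the snippets above.

```rust
/// Hypothetical span type: byte offsets into the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical node; the derived PartialEq compares every field,
/// including the span.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub value: String,
    pub span: Span,
}

impl Node {
    /// Content-only comparison that ignores where the node came from.
    pub fn cmp_content(&self, other: &Node) -> bool {
        self.value == other.value
    }
}
```

Here `==` stays strict, and callers that only care about content have to say so explicitly.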

Nov 18 '24 06:11 zbraniecki

Sorry for disappearing; I've been a bit busy. You're right that spans are part of the object's metadata, so I removed the weird manual comparison implementation and instead pre-formatted the input so that the spans compared in the test are correct.

Jan 25 '25 13:01 Ertanic

Okay...

Mar 08 '25 12:03 Ertanic

I'm aware of the feature. I fixed it a long time ago, but the commit got stuck.

Mar 11 '25 18:03 Ertanic

I don't know why GitHub can't show my commit, but the fork has it: https://github.com/projectfluent/fluent-rs/commit/460c36514d5948172766f396f8a0214be2c6dd1e

Mar 12 '25 07:03 Ertanic

I had to play around with reverting to get GitHub to show it. @zbraniecki

Mar 17 '25 10:03 Ertanic

Is there a Git attribute we need to set to keep the new fixtures that explicitly test carriage returns from getting smudged on checkout?

May 20 '25 23:05 alerque

I couldn't think of anything better than just disabling the documentation tests. Otherwise it's too much work to modify the documentation, and I think the changes would only make it harder to read. One suggestion: create a separate task for testing the documentation that skips the spans feature.

May 21 '25 07:05 Ertanic

I've tried fixing the error in the parse_fixtures_compare test, even adding spans to the JSON files, but it still fails with an error about a missing span field in the JSON file.

May 21 '25 10:05 Ertanic

Please don't do any merging into this branch. I've force pushed several times to clean up the history and you keep pushing the dirty commits back into the timeline. Do feel free to work on it if there is something to do, but before committing, first run git pull --rebase to work from where this branch is (or git reset --hard <COMMIT> to the latest commit you see here), then just commit the changes on top of that. I can selectively squash fixes into the appropriate previous commits much more easily than I can unravel the cruft that comes in when you merge from your own outdated history. Thanks.

May 21 '25 13:05 alerque

Somehow the new fixture layout for un-normalized tests is running afoul of this heuristic:

https://github.com/projectfluent/fluent-rs/blob/main/fluent-syntax/src/serializer.rs#L425-L429

I can actually pass tests by not adding the extra carriage return there, but I also don't see a test for the "rare edge case" that it is supposed to fix in the first place, so just bypassing it doesn't seem like a good plan.

May 21 '25 15:05 alerque

I was pretty excited to get this feature landed before cutting a release, but I think I'm going to have to delay it. Hopefully we don't have to delay it long—I have no objection to doing a new release on a relatively short timeline as soon as this is actually ready.

My concerns are mostly related to testing, but I was playing around with fixing the tests and it feels like something is wrong with the struct itself. Enabling the feature (or not) should not immediately break all existing usage of the serialization. That would make it very hard for existing systems with serialized data to migrate. If possible I would really like existing apps not to break the moment somebody tries enabling the feature. That may not be quite possible, but we should at least think through what the ramifications are and document how/why a change needs to be made to keep using existing code easily on the new version.

Until that gets thought through and tests pass both with and without the feature enabled I can't reasonably merge this, and since we have problems with old dependencies starting to hinder usage I kind of need to get an update out to address that.

Again, I do want this feature to land and am willing to facilitate a special release cycle for it as soon as we're actually comfortable with the upgrade path (if there are any breaking changes) and the tests work well enough to rely on.

May 22 '25 11:05 alerque

I do not know how to fix the roundtrip_unnormalized_fixtures test, so how about hiding it for now behind the cfg(not(feature = "spans")) attribute?
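Roughly, that gating would look like the sketch below. The `normalize` and `roundtrip_is_stable` helpers and the test body are hypothetical placeholders; only the test name and the cfg attribute come from this thread.

```rust
/// Toy stand-in for "parse, then serialize"; hypothetical.
pub fn normalize(source: &str) -> String {
    source.replace("\r\n", "\n")
}

// Compiled only when the `spans` feature is disabled, so the check is
// skipped (not failed) in builds that enable the feature.
#[cfg(not(feature = "spans"))]
pub fn roundtrip_is_stable(source: &str) -> bool {
    let once = normalize(source);
    normalize(&once) == once
}

// The same attribute applied to the test itself.
#[cfg(not(feature = "spans"))]
#[test]
fn roundtrip_unnormalized_fixtures() {
    assert!(roundtrip_is_stable("key = value\r\n"));
}
```

The downside is that the round-trip path would then be untested whenever the feature is enabled, so this would only be a stopgap.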

May 23 '25 06:05 Ertanic
