Change AST to allow for zero-copy parsing

Open zbraniecki opened this issue 7 years ago • 3 comments

Zero-copy parsing can be extremely fast and brings significant memory savings. What's more, the ability to zero-copy parse does not prevent the AST from taking ownership of the data, which allows the original string to be discarded and the AST to be transferred when needed.

In order to allow for zero-copy parsing, we'll need to introduce two changes:

  • Comments will become a vector of string slices (Rust: Vec<&str>), which will omit the comment sigil and store empty lines as "".
  • Pattern::TextElement will store raw strings with escape sequences left unprocessed. The only escape the parser itself needs to recognize is \{, which must not terminate the TextElement.
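As a rough illustration of the first point, here is a minimal sketch in Rust. The `Comment` struct and `parse_comment` function are hypothetical names invented for this example, not the actual fluent-rs AST, and the sketch assumes only single-`#` comments with one optional space after the sigil.

```rust
/// Hypothetical zero-copy comment node: every line borrows from the source.
#[derive(Debug, PartialEq)]
struct Comment<'s> {
    content: Vec<&'s str>,
}

/// Slices leading `#`-prefixed lines out of `source` without allocating any
/// new strings. The sigil and one following space (if present) are skipped,
/// so a bare `#` line is stored as "".
fn parse_comment<'s>(source: &'s str) -> Comment<'s> {
    let mut content = Vec::new();
    for line in source.lines() {
        match line.strip_prefix('#') {
            // "# text" -> "text"; "#" -> ""
            Some(rest) => content.push(rest.strip_prefix(' ').unwrap_or(rest)),
            // First non-comment line ends the comment block.
            None => break,
        }
    }
    Comment { content }
}
```

Calling `parse_comment` on a source that starts with the lines `# First line`, `#`, `# Third line` would yield three slices into that source, with the middle one empty. The only allocation is the vector itself.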

This would mean that we'd need a separate step to process the text element in pattern and comment when necessary. @Manishearth suggested using COW [0] to lazily resolve/process those two data structures into owned, unescaped, and processed structures when needed.

[0] https://doc.rust-lang.org/std/borrow/enum.Cow.html
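To make the lazy step concrete, here is a minimal sketch of what such a resolver could look like. The `unescape` helper is hypothetical, and the set of escapes it handles (`\{` and `\\`) is an assumption chosen for illustration, not the full Fluent grammar.

```rust
use std::borrow::Cow;

/// Hypothetical helper: resolves `\{` and `\\` in a raw text slice.
/// Returns `Cow::Borrowed` (no allocation) when there is nothing to unescape,
/// and `Cow::Owned` only when the text actually has to change.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // Known escapes: drop the backslash, keep the escaped character.
            Some(next) if next == '{' || next == '\\' => out.push(next),
            // Unknown escape: keep it verbatim (an assumption of this sketch).
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Trailing backslash: keep it as-is.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

With this shape, text without escapes costs only a scan, while text containing `\{` is copied once at the point where a consumer asks for the processed value.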

zbraniecki, Jul 26 '18 06:07

This would mean that we'd need a separate step to process the text element in pattern

Can you explain what you mean by this? In 0.6, escape sequences are already stored in their raw form. Unescaping is left up to the resolver.

Furthermore, depending on the discussion in https://github.com/projectfluent/fluent/issues/115 and https://github.com/projectfluent/fluent/issues/123, we might be able to remove some escape sequences from the grammar.

stasm, Jul 26 '18 06:07

At a high level, I don't think zero-copy parsing is a design goal for Fluent. Some implementations might benefit from it, but it does come with downsides. I'm not so happy about having designed compare-locales with a zero-copy parser, for example.

We shouldn't change the textual representation of a Fluent message or term in support of zero-copy parsing, I think.

I also have some more questions about the bigger picture:

  • How abstract is the abstract syntax tree?
  • Do Fluent parsers need to return the AST used in the reference parser? -- if so, do they need to do so directly, or are intermediate representations OK?
  • Are AST nodes data containers or classes? -- i.e., is Comment.content data or an @property (py), .content() (rs)
  • Follow-up question to that, is the AST read-only or mutable?

These questions are all about what it actually means to implement Fluent: what's normative, what's informative, and what's "we just had to write something here". That's going to be more relevant as we see people-not-us implementing Fluent.

A detailed implementation note: it might be beneficial to document text-specials, since syntax highlighters probably want those. Comments might benefit from an internal refactor if we include semantic comments in a formal way. It's probably not totally random that these overlap with zero-copying.

Pike, Jul 26 '18 09:07

We shouldn't change the textual representation of a Fluent message or term in support of zero-copy parsing, I think.

I'm not sure I agree. In my view, parsing is one of the most "popular" steps across the widest variety of operations on FTL. Some of those use cases go on to "use" the patterns, like the runtime, while others may just list IDs and peek into the pattern/value only occasionally.

Having a very robust and cheap parser benefits all of the use cases, so pushing everything that is not necessary in the parser out of the parser seems like a good design decision. Whether we should prioritize it is a different question from whether we should aim for it.

I don't have ready answers to the questions you listed, and I agree that they're the right ones to ask and discuss, but I do have an answer to one of them in the context of my proposal:

Follow-up question to that, is the AST read-only or mutable?

A zero-copy parser makes this transparent: it allows us to start with a cheap, read-only semantic slicing of the source string that can be extended and mutated lazily when needed. That's where Manish suggested COW as a common mechanism for that kind of behavior.
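A minimal sketch of that behavior, assuming a hypothetical `TextElement` node that wraps `std::borrow::Cow` (the struct and its methods are made up for this example and are not an existing Fluent API):

```rust
use std::borrow::Cow;

/// Hypothetical AST node whose value starts out as a slice of the source.
#[derive(Debug)]
struct TextElement<'s> {
    value: Cow<'s, str>,
}

impl<'s> TextElement<'s> {
    /// Mutation copies the borrowed slice on the first write only;
    /// later writes reuse the owned String.
    fn append(&mut self, suffix: &str) {
        self.value.to_mut().push_str(suffix);
    }

    /// Detaches the node from the source string so the source can be
    /// dropped and the node moved elsewhere.
    fn into_owned(self) -> TextElement<'static> {
        TextElement {
            value: Cow::Owned(self.value.into_owned()),
        }
    }
}

/// True while the node still points into the original source.
fn is_borrowed(elem: &TextElement<'_>) -> bool {
    matches!(elem.value, Cow::Borrowed(_))
}
```

Read-only consumers never pay for a copy. A tool that edits the node triggers one allocation through `to_mut`, and `into_owned` covers the case where the AST has to outlive the source string.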

zbraniecki, Jul 26 '18 16:07