Feat/json lexer
Summary
- Add new crate rome_json_parser
- Implement json_lexer with logos
Test Plan
- Copy lexing test case from https://github.com/rome/tools/blob/archived-js/internal/codec-config/json/parser-test262.test.ts
Why did I introduce the logos crate?
- Writing a lexer is tedious and not easy, especially a fast one.
- See https://github.com/maciejhirsz/logos#logos
- JSON lexing has no ambiguity and, unlike JavaScript, does not need re-lexing in different contexts.
- Using regex to describe lexing is relatively easy.
- In such a scenario, we very likely can't write a lexer faster than logos.
- It saves time that can go into pushing the JSON parser forward.
What's the total size of these new dependencies and the impact on build size?
How does the new lexer raise unicode and escape sequence errors? How flexible is logos when it comes to error recovery?
How can I inspect the bundle size, given that rome_bin does not depend on rome_json_parser yet?
This may be a temporary solution. logos provides a basic error recovery strategy: it marks the first invalid char as an Error token and tries to re-lex from the next char, until EOF. We will eventually replace it with a handwritten lexer. What do you think?
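To make the recovery strategy described above concrete, here is a minimal std-only sketch. It does not use logos and is not the code in this PR; the Kind enum and the lex function are hypothetical names, and only punctuation and whitespace are handled. It only illustrates the behaviour: an unrecognized char becomes a one-char Error token and lexing resumes at the next char until EOF.

```rust
use std::ops::Range;

// Hypothetical token kinds for illustration; not the names used in rome_json_parser.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind {
    LBrace,
    RBrace,
    LBracket,
    RBracket,
    Colon,
    Comma,
    Whitespace,
    Error,
}

/// Lex `src` into (kind, byte range) pairs.
/// An unrecognized char becomes a single Error token, and lexing
/// resumes at the next char, so the whole input is always covered.
fn lex(src: &str) -> Vec<(Kind, Range<usize>)> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        let kind = match c {
            '{' => Kind::LBrace,
            '}' => Kind::RBrace,
            '[' => Kind::LBracket,
            ']' => Kind::RBracket,
            ':' => Kind::Colon,
            ',' => Kind::Comma,
            ' ' | '\t' | '\n' | '\r' => {
                // Merge a run of whitespace into one token.
                while let Some(&(_, next)) = chars.peek() {
                    if matches!(next, ' ' | '\t' | '\n' | '\r') {
                        chars.next();
                    } else {
                        break;
                    }
                }
                Kind::Whitespace
            }
            // Recovery: mark this char as an error and keep going.
            _ => Kind::Error,
        };
        // The token ends where the next unread char starts, or at EOF.
        let end = chars.peek().map_or(src.len(), |&(i, _)| i);
        tokens.push((kind, start..end));
    }
    tokens
}
```

With this sketch, lexing `{ @# }` yields two separate Error tokens for `@` and `#`, and the closing brace is still recognized, which is the property the parser relies on for recovery.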
What's the size of the new crate on its own? What's the size of its dependencies?
My main question is whether the lexer in this PR implements the full JSON spec or whether there are things missing, e.g. unicode escape sequences, and if so, does logos support adding the missing functionality?
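For reference, the escape rules in question are small enough to sketch by hand. The following is an illustration of what the JSON grammar (RFC 8259) allows after a backslash, not the code in this PR; check_escapes is a hypothetical helper that takes the string body without the surrounding quotes. It does not reject unescaped control characters or validate surrogate pairs.

```rust
/// Check the escape sequences in the body of a JSON string (quotes excluded).
/// Returns Err with the byte offset of the backslash that starts the first
/// invalid escape, so a lexer could attach a diagnostic to that position.
fn check_escapes(body: &str) -> Result<(), usize> {
    let mut chars = body.char_indices();
    while let Some((start, c)) = chars.next() {
        if c != '\\' {
            continue;
        }
        match chars.next() {
            // Single-char escapes allowed by the JSON grammar.
            Some((_, '"' | '\\' | '/' | 'b' | 'f' | 'n' | 'r' | 't')) => {}
            // \u must be followed by exactly four hex digits.
            Some((_, 'u')) => {
                for _ in 0..4 {
                    match chars.next() {
                        Some((_, h)) if h.is_ascii_hexdigit() => {}
                        _ => return Err(start),
                    }
                }
            }
            // Any other char, or end of input right after the backslash.
            _ => return Err(start),
        }
    }
    Ok(())
}
```

Whether a check like this can be hooked into a logos-generated lexer is exactly the open question here; the sketch only shows the checks the lexer needs to perform one way or another.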
If a handwritten lexer is the ultimate goal, then I would recommend copying our existing JavaScript lexer and stripping out everything that isn't needed instead. This should be straightforward and already gives us the manual lexer. We can remove duplication in later PRs (e.g. extract a Source type) if we think this is valuable.
My main concern is that contributors would now need to understand two lexers that are fundamentally different in design, which makes it harder to contribute, review PRs, etc.
Makes sense.