Feat/json lexer

Open IWANABETHATGUY opened this issue 2 years ago • 1 comments

Summary

  1. Add a new crate, rome_json_parser
  2. Implement json_lexer with logos

Test Plan

  1. Copy the lexing test cases from https://github.com/rome/tools/blob/archived-js/internal/codec-config/json/parser-test262.test.ts

IWANABETHATGUY, Jun 27 '22 16:06

Why did I introduce the logos crate?

  1. Writing a lexer is tedious and not easy, especially a fast one.
  2. See https://github.com/maciejhirsz/logos#logos
    • JSON lexing has no ambiguity, so there is no need to re-lex in different contexts as there is for JavaScript.
    • Describing tokens with regexes is relatively easy.
    • In such a scenario, we very likely can't write a lexer faster than logos.
    • It saves time that can go toward pushing the JSON parser forward.

IWANABETHATGUY, Jun 27 '22 16:06

What's the total size of these new dependencies and the impact on build size?

How does the new lexer raise unicode and escape sequence errors? How flexible is logos when it comes to error recovery?

MichaReiser, Aug 22 '22 15:08

What's the total size of these new dependencies and the impact on build size?

How does the new lexer raise unicode and escape sequence errors? How flexible is logos when it comes to error recovery?

How can I inspect the bundle size, given that rome_bin does not depend on rome_json_parser yet?

IWANABETHATGUY, Aug 22 '22 15:08

What's the total size of these new dependencies and the impact on build size?

How does the new lexer raise unicode and escape sequence errors? How flexible is logos when it comes to error recovery?

This may be a temporary solution. logos provides a basic error recovery strategy: it marks the first invalid character as an Error token and tries to re-lex from the next character, until EOF. We will eventually replace it with a handwritten lexer. What do you think?
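
To make that recovery strategy concrete, here is a minimal hand-written sketch in Rust. It does not use logos and is not the lexer from this PR; the `Kind` enum and the `lex` function are hypothetical names, and numbers are simplified to runs of ASCII digits. It only illustrates the behaviour described: an unrecognized character becomes a single Error token and lexing resumes at the next character until the end of input.

```rust
/// Token kinds for a tiny JSON-like subset (hypothetical names, not Rome's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    LBracket,
    RBracket,
    LCurly,
    RCurly,
    Colon,
    Comma,
    Whitespace,
    Number,
    Error,
}

/// Lexes `src` into (kind, text) pairs.
/// Any character that starts no known token is emitted as a one-character
/// `Error` token, and lexing continues with the following character.
fn lex(src: &str) -> Vec<(Kind, &str)> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    // `pos` always sits on a char boundary, so slicing here cannot panic.
    while let Some(c) = src[pos..].chars().next() {
        let start = pos;
        let kind = match c {
            '[' => Kind::LBracket,
            ']' => Kind::RBracket,
            '{' => Kind::LCurly,
            '}' => Kind::RCurly,
            ':' => Kind::Colon,
            ',' => Kind::Comma,
            ' ' | '\t' | '\n' | '\r' => Kind::Whitespace,
            '0'..='9' => Kind::Number,
            _ => Kind::Error,
        };
        pos += c.len_utf8();
        // Simplification: a number is just a run of ASCII digits.
        if kind == Kind::Number {
            while let Some(d) = src[pos..].chars().next() {
                if d.is_ascii_digit() {
                    pos += 1;
                } else {
                    break;
                }
            }
        }
        tokens.push((kind, &src[start..pos]));
    }
    tokens
}
```

With this sketch, `lex("[1, @#2]")` produces two consecutive Error tokens for `@` and `#` and then continues with the Number token `2`, so one bad character does not stop the rest of the input from being lexed.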

IWANABETHATGUY, Aug 22 '22 15:08

How can I inspect the bundle size, given that rome_bin does not depend on rome_json_parser yet?

What's the size of the new crate on its own? What's the size of its dependencies?

logos provides a basic error recovery strategy: it marks the first invalid character as an Error token and tries to re-lex from the next character, until EOF. We will eventually replace it with a handwritten lexer. What do you think?

My main question is whether the lexer in this PR implements the full JSON spec or whether things are missing, e.g. unicode escape sequences, and if so, whether logos supports adding the missing functionality.
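
For reference, the escape handling in question is small. The JSON grammar allows a backslash followed by one of `"`, `\`, `/`, `b`, `f`, `n`, `r`, `t`, or by `u` and exactly four hexadecimal digits. Below is a hedged Rust sketch of that check as a hand-written lexer might do it; the `Escape` enum and `check_escape` function are hypothetical and are not taken from this PR, from logos, or from the existing JavaScript lexer.

```rust
/// Result of checking one escape sequence (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Escape {
    /// Valid escape; holds its length in bytes, including the backslash.
    Valid(usize),
    /// Invalid escape; holds the length in bytes of the offending span,
    /// which a lexer could use as the range of its diagnostic.
    Invalid(usize),
}

/// Checks the escape sequence at the start of `rest`.
/// `rest` is expected to begin with a backslash.
fn check_escape(rest: &str) -> Escape {
    let mut chars = rest.chars();
    debug_assert_eq!(chars.next(), Some('\\'));
    match chars.next() {
        // Two-character escapes allowed by the JSON grammar.
        Some('"' | '\\' | '/' | 'b' | 'f' | 'n' | 'r' | 't') => Escape::Valid(2),
        // `\u` must be followed by exactly four hex digits.
        Some('u') => {
            let hex = chars
                .take(4)
                .take_while(|c| c.is_ascii_hexdigit())
                .count();
            if hex == 4 {
                Escape::Valid(6)
            } else {
                // Cover `\u` plus the hex digits that were accepted.
                Escape::Invalid(2 + hex)
            }
        }
        // Any other character after the backslash is not a JSON escape.
        Some(other) => Escape::Invalid(1 + other.len_utf8()),
        // Input ended right after the backslash.
        None => Escape::Invalid(1),
    }
}
```

Returning the length of the offending span, rather than a bare error, is one possible design that lets the caller report a precise range and then continue lexing the rest of the string.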

We will eventually replace it with a handwritten lexer. What do you think?

If that's the ultimate goal, I would recommend copying our existing JavaScript lexer and stripping out everything that isn't needed instead. This should be straightforward and already gives us the manual lexer. We can remove duplication in later PRs (e.g. extract a Source type) if we think this is valuable.

My main concern is that contributors now need to understand two lexers that are fundamentally different in design, which makes it harder to contribute, review PRs, etc.

MichaReiser, Aug 23 '22 06:08

Makes sense.

IWANABETHATGUY, Aug 23 '22 06:08