Custom Lexer Adapters

Open • bd82 opened this issue 7 years ago • 7 comments

  • [x] Interface.
  • [x] Implementation of the default TokenVector-based adapter.
  • [ ] JSDoc.
  • [x] Performance check --> regressions.
  • [ ] Examples:
    • Just-in-time lexing (without first creating a token vector, to save memory).
    • Async lexing (using Atomics.wait?).
  • [ ] Investigate an alternative using only method overrides.
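To make the checklist above concrete, here is a minimal TypeScript sketch of what such an adapter interface could look like, with a token-vector implementation and a just-in-time one that lexes lazily from a generator. All names (`LexerAdapter`, `TokenVectorAdapter`, `JitAdapter`, `lexWords`) are hypothetical and are not Chevrotain's actual internals; `Token` is a simplified stand-in for `IToken`.

```typescript
// Simplified token shape for this sketch (a stand-in for Chevrotain's IToken).
interface Token {
  image: string;
  type: string;
}

const EOF: Token = { image: "", type: "EOF" };

// Hypothetical adapter interface: the parser reads tokens only through it.
interface LexerAdapter {
  // Peek at the k-th upcoming token (k >= 1) without consuming it.
  LA(k: number): Token;
  // Consume and return the current token (EOF once input is exhausted).
  consume(): Token;
}

// Default-style adapter: the whole input was lexed up front into a vector.
class TokenVectorAdapter implements LexerAdapter {
  private readonly tokens: Token[];
  private pos = 0;

  constructor(tokens: Token[]) {
    this.tokens = tokens;
  }

  LA(k: number): Token {
    return this.tokens[this.pos + k - 1] ?? EOF;
  }

  consume(): Token {
    const tok = this.LA(1);
    if (tok !== EOF) this.pos++;
    return tok;
  }
}

// Just-in-time adapter: pulls tokens lazily, buffering only the lookahead window.
class JitAdapter implements LexerAdapter {
  private readonly source: Iterator<Token>;
  private readonly buffer: Token[] = [];

  constructor(source: Iterator<Token>) {
    this.source = source;
  }

  LA(k: number): Token {
    while (this.buffer.length < k) {
      const next = this.source.next();
      if (next.done) return EOF;
      this.buffer.push(next.value);
    }
    return this.buffer[k - 1];
  }

  consume(): Token {
    const tok = this.LA(1);
    if (tok !== EOF) this.buffer.shift();
    return tok;
  }
}

// Toy lexer used as a lazy token source: splits on whitespace.
function* lexWords(text: string): Generator<Token> {
  for (const word of text.split(/\s+/)) {
    if (word === "") continue;
    yield { image: word, type: /^\d+$/.test(word) ? "NUM" : "WORD" };
  }
}
```

A parser written against `LexerAdapter` would only ever call `LA` and `consume`, so either implementation can be plugged in. In this sketch the just-in-time variant keeps only the current lookahead window in memory rather than the whole token vector.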

bd82 • Jul 04 '17

Performance seems virtually the same (±1%). So all good. 😀

bd82 • Jul 04 '17

Further testing has shown a small performance regression of 3-4% for the full-flow scenario (lexing + parsing).

Need to update the benchmark to support a parser-only scenario, to measure the actual impact on parsing speed.
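A minimal sketch of a parser-only measurement next to the full flow: lex once outside the timed loop, then time parsing on its own. `lex` and `parse` are placeholder parameters standing in for a real grammar's lexer and parser entry points (not Chevrotain APIs), and the warm-up and iteration counts are arbitrary.

```typescript
// Placeholders for a real grammar's lexer and parser entry points.
type LexFn<T> = (text: string) => T[];
type ParseFn<T> = (tokens: T[]) => unknown;

// Average milliseconds per call of `fn`, after a short warm-up so the
// JIT has a chance to optimize the measured code first.
function averageMs(fn: () => void, iterations: number, warmup = 10): number {
  for (let i = 0; i < warmup; i++) fn();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

function benchmark<T>(text: string, lex: LexFn<T>, parse: ParseFn<T>, iterations = 100) {
  // Full flow: every iteration lexes and parses.
  const fullFlowMs = averageMs(() => {
    parse(lex(text));
  }, iterations);

  // Parser only: lex once, then time parsing alone.
  // Assumes `parse` does not mutate the token array, so it can be reused.
  const tokens = lex(text);
  const parseOnlyMs = averageMs(() => {
    parse(tokens);
  }, iterations);

  return { fullFlowMs, parseOnlyMs };
}
```

Comparing `parseOnlyMs` between the baseline and the adapter-based build isolates the parser's share of any regression, which the full-flow number blends with lexing time.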

bd82 • Jul 08 '17

Benchmarking only the parsing phase (on Chrome 59) showed a 12-13% regression for the JSON grammar, although the CSS grammar showed only a 3% regression.

Being extra generic may not be worth the performance regression. There may be a way to mitigate the performance issue while retaining the customizability, but it may lead to too much complexity.

Currently there is only one relevant production scenario that requires this adapter: the ECMAScript grammar example #521, in which the parser's performance is very important. Perhaps the custom behavior for the ECMAScript grammar can be achieved using plain method overrides, without official APIs and dependency injection mechanisms.
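A rough sketch of the method-override alternative, assuming a parser that reads tokens through its own `LA`/`consume` methods. `BaseParser`, `OnDemandParser`, and the toy `list` rule are hypothetical and are not Chevrotain's API. The subclass lexes on demand from raw text simply by overriding `LA`, with no adapter object or injection mechanism involved.

```typescript
// Simplified token shape for this sketch.
interface Token {
  image: string;
  type: string;
}

const EOF: Token = { image: "", type: "EOF" };

// Base parser: reads from a token vector through its own LA/consume methods.
class BaseParser {
  protected tokens: Token[];
  protected pos = 0;

  constructor(tokens: Token[] = []) {
    this.tokens = tokens;
  }

  protected LA(k: number): Token {
    return this.tokens[this.pos + k - 1] ?? EOF;
  }

  protected consume(): Token {
    const tok = this.LA(1);
    if (tok !== EOF) this.pos++;
    return tok;
  }

  // Toy rule: a comma-separated list of words, e.g. "a, b, c".
  list(): string[] {
    const items = [this.consume().image];
    while (this.LA(1).image === ",") {
      this.consume(); // the comma
      items.push(this.consume().image);
    }
    return items;
  }
}

// Subclass: overrides LA to lex lazily from raw text instead of receiving a
// pre-built token vector. The inherited consume() works unchanged.
class OnDemandParser extends BaseParser {
  private readonly text: string;
  private readonly pattern = /\s*(\w+|,)/y; // sticky: resumes at lastIndex
  private exhausted = false;

  constructor(text: string) {
    super();
    this.text = text;
  }

  protected override LA(k: number): Token {
    // Lex just enough tokens to satisfy this lookahead.
    while (this.tokens.length < this.pos + k) {
      if (this.exhausted) return EOF;
      const m = this.pattern.exec(this.text);
      if (m === null) {
        // A failed sticky match resets lastIndex to 0, so remember we are done.
        this.exhausted = true;
        return EOF;
      }
      this.tokens.push({ image: m[1], type: m[1] === "," ? "COMMA" : "WORD" });
    }
    return this.tokens[this.pos + k - 1];
  }
}
```

The grammar rule calls `this.LA` directly, so the custom behavior lives in an ordinary override rather than in an extra object consulted on every lookahead.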

bd82 • Jul 08 '17

Closing this.

See the comments above for details. Commit https://github.com/SAP/chevrotain/commit/194c782def5142052dfc8c5189e54ca266b30120 contains a small POC for a limited lexerless mode, which should suffice for ECMAScript parsing (even though it will not be an "official" API).

bd82 • Jul 24 '17

Need to investigate this again, as such an adapter may also be used to conserve memory and to implement token channels.
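One way an adapter could implement token channels, sketched under the assumption that each token carries a numeric `channel` field: the parser's lookahead only sees default-channel tokens, while tokens on other channels (comments, for example) remain available to tooling. `ChannelFilteringSource`, `DEFAULT_CHANNEL`, and the `channel` field are hypothetical names for illustration.

```typescript
// Token shape for this sketch; `channel` is an assumed field.
interface Token {
  image: string;
  type: string;
  channel: number;
}

const DEFAULT_CHANNEL = 0;
const EOF: Token = { image: "", type: "EOF", channel: DEFAULT_CHANNEL };

// Hypothetical channel-aware token source. Lookahead skips every token that is
// not on DEFAULT_CHANNEL, but all tokens stay available for tooling.
class ChannelFilteringSource {
  private readonly all: Token[];
  // Positions (into `all`) of the tokens the parser is allowed to see.
  private readonly visible: number[] = [];
  private pos = 0;

  constructor(all: Token[]) {
    this.all = all;
    all.forEach((tok, i) => {
      if (tok.channel === DEFAULT_CHANNEL) this.visible.push(i);
    });
  }

  LA(k: number): Token {
    const v = this.pos + k - 1;
    return v < this.visible.length ? this.all[this.visible[v]] : EOF;
  }

  consume(): Token {
    const tok = this.LA(1);
    if (tok !== EOF) this.pos++;
    return tok;
  }

  // All tokens on a given channel, e.g. comments for a formatter.
  onChannel(channel: number): Token[] {
    return this.all.filter((tok) => tok.channel === channel);
  }
}
```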

bd82 • Jul 21 '18

Additional testing has shown a large, strange regression in the CSS benchmark (3-4x slower). This is probably due to some deoptimization happening in V8 internals.

The other performance scenarios (JSON/ECMA5) showed a slight (3%) regression, which is expected.

I will once again close this. At the end of the day, this abstraction is just a convenience to help override certain parser internals. It is not required to accomplish those unique flows; it is only required to support those unique flows as official APIs.

Perhaps such unique flows (e.g., token channels) should be provided as examples for extending the parser rather than as official APIs.

bd82 • Sep 11 '18

Refactoring the parser into multiple traits may make it easier to extract the Lexer Adapter API and provide a custom implementation without suffering a performance penalty.

This should be evaluated again.
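As a rough illustration of the trait idea, the sketch below composes parsers from trait classes using the `applyMixins` helper described in the TypeScript handbook's Mixins chapter. The lexer trait is picked when the parser class is assembled, so `LA` and `consume` end up directly on the parser's prototype instead of being delegated to a separate adapter object at runtime. The trait and parser names are hypothetical; this is not a description of how Chevrotain itself is structured.

```typescript
// Simplified token shape for this sketch.
interface Token {
  image: string;
  type: string;
}

const EOF: Token = { image: "", type: "EOF" };

// applyMixins, following the TypeScript handbook's mixins chapter:
// copy each trait's prototype members onto the target class.
function applyMixins(derived: Function, traits: Function[]): void {
  for (const trait of traits) {
    for (const name of Object.getOwnPropertyNames(trait.prototype)) {
      if (name === "constructor") continue;
      Object.defineProperty(
        derived.prototype,
        name,
        Object.getOwnPropertyDescriptor(trait.prototype, name) ?? Object.create(null),
      );
    }
  }
}

// What grammar-level traits need from whichever lexer trait is mixed in.
interface TokenSource {
  LA(k: number): Token;
  consume(): Token;
}

// Lexer trait A: plain token vector.
class VectorLexerTrait {
  tokens!: Token[];
  pos!: number;
  LA(k: number): Token {
    return this.tokens[this.pos + k - 1] ?? EOF;
  }
  consume(): Token {
    const tok = this.LA(1);
    if (tok !== EOF) this.pos++;
    return tok;
  }
}

// Lexer trait B: same methods, but "WS" tokens are invisible to the parser.
class SkipWhitespaceLexerTrait {
  tokens!: Token[];
  pos!: number;
  LA(k: number): Token {
    let seen = 0;
    for (let i = this.pos; i < this.tokens.length; i++) {
      if (this.tokens[i].type !== "WS" && ++seen === k) return this.tokens[i];
    }
    return EOF;
  }
  consume(): Token {
    const tok = this.LA(1);
    // Advance past the returned token (and any whitespace before it).
    if (tok !== EOF) this.pos = this.tokens.indexOf(tok, this.pos) + 1;
    return tok;
  }
}

// Grammar trait: only talks to LA/consume, never to a specific lexer trait.
class ListRuleTrait {
  list(this: TokenSource): string[] {
    const items = [this.consume().image];
    while (this.LA(1).image === ",") {
      this.consume();
      items.push(this.consume().image);
    }
    return items;
  }
}

// Two parsers assembled from the same grammar trait with different lexer traits.
interface VectorParser extends VectorLexerTrait, ListRuleTrait {}
class VectorParser {
  constructor(tokens: Token[]) {
    this.tokens = tokens;
    this.pos = 0;
  }
}
applyMixins(VectorParser, [VectorLexerTrait, ListRuleTrait]);

interface WsSkippingParser extends SkipWhitespaceLexerTrait, ListRuleTrait {}
class WsSkippingParser {
  constructor(tokens: Token[]) {
    this.tokens = tokens;
    this.pos = 0;
  }
}
applyMixins(WsSkippingParser, [SkipWhitespaceLexerTrait, ListRuleTrait]);
```

Whether this actually avoids the regressions measured above would still need to be benchmarked; the sketch only shows that the lexer-facing methods can be swapped without an extra object on the hot path.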

bd82 • Oct 15 '18