chevrotain
Custom Lexer Adapters
- [X] Interface.
- [X] Implementation for default TokenVector based Adapter.
- [ ] JS Docs.
- [x] Performance check --> Regressions.
- [ ] Examples:
  - Just In Time Lexing (without creating a Token Vector first, which saves memory).
  - Async Lexing (using Atomics.wait?).
- [ ] Investigate alternative using only method overrides.
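To make the checklist above concrete, here is a rough sketch of what such an adapter could look like. All names (`IToken`, `ILexerAdapter`, `TokenVectorAdapter`, `JitLexerAdapter`, `parseSum`) are hypothetical and are not Chevrotain's actual API; the toy tokenizer only knows numbers and single-character punctuation.

```typescript
// Hypothetical names throughout; this is NOT Chevrotain's API.
interface IToken {
  image: string;
  type: string;
}

const EOF: IToken = { image: "", type: "EOF" };

// The two operations a parser needs from its token source.
interface ILexerAdapter {
  // Look ahead `howMuch` tokens (1 = the next token) without consuming.
  LA(howMuch: number): IToken;
  // Advance past the next token.
  consume(): void;
}

// Default behavior: the whole input was lexed up front into a token vector.
class TokenVectorAdapter implements ILexerAdapter {
  private tokens: IToken[];
  private nextIdx = 0;

  constructor(tokens: IToken[]) {
    this.tokens = tokens;
  }

  LA(howMuch: number): IToken {
    return this.tokens[this.nextIdx + howMuch - 1] ?? EOF;
  }

  consume(): void {
    this.nextIdx++;
  }
}

function isSpace(ch: string): boolean {
  return ch === " " || ch === "\t" || ch === "\n" || ch === "\r";
}

function isDigit(ch: string): boolean {
  return ch >= "0" && ch <= "9";
}

// Just-in-time lexing: tokens are produced on demand and only the
// current lookahead window is kept in memory.
class JitLexerAdapter implements ILexerAdapter {
  private text: string;
  private offset = 0;
  private buffer: IToken[] = [];

  constructor(text: string) {
    this.text = text;
  }

  // Toy tokenizer: runs of digits become "Number", anything else is "Punct".
  private lexOne(): IToken {
    const text = this.text;
    while (this.offset < text.length && isSpace(text.charAt(this.offset))) {
      this.offset++;
    }
    if (this.offset >= text.length) {
      return EOF;
    }
    const start = this.offset;
    if (isDigit(text.charAt(start))) {
      while (this.offset < text.length && isDigit(text.charAt(this.offset))) {
        this.offset++;
      }
      return { image: text.slice(start, this.offset), type: "Number" };
    }
    this.offset++;
    return { image: text.charAt(start), type: "Punct" };
  }

  LA(howMuch: number): IToken {
    while (this.buffer.length < howMuch) {
      this.buffer.push(this.lexOne());
    }
    return this.buffer[howMuch - 1] ?? EOF;
  }

  consume(): void {
    this.LA(1); // make sure the next token has been lexed
    this.buffer.shift();
  }
}

// A tiny "parser" that only talks to the adapter interface.
function parseSum(input: ILexerAdapter): number {
  let total = 0;
  while (input.LA(1).type !== "EOF") {
    const tok = input.LA(1);
    if (tok.type === "Number") {
      total += Number(tok.image);
    }
    input.consume();
  }
  return total;
}
```

`parseSum` works unchanged with either adapter, which is the point of the interface. The extra indirection on every `LA`/`consume` call is also where a performance cost could come from.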
Performance seems virtually the same (±1%). So all good. 😀
Further testing has shown a small performance regression of 3-4% for the full flow scenario (lexing + parsing).
The benchmark needs to be updated to support a parser-only scenario, in order to measure the actual impact on parsing speed.
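A minimal sketch of how a parser-only measurement could be separated from the full flow: lex once outside the timed loop and time only the parse call. The `lex` and `parse` functions here are trivial stand-ins (assumptions for illustration), not the real benchmark code, and `Date.now()` is coarse, so a real benchmark would want many iterations and a proper benchmarking harness.

```typescript
// Runs `fn` repeatedly and returns the average time per run in milliseconds.
function averageMs(fn: () => void, iterations: number): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  return (Date.now() - start) / iterations;
}

// Stand-ins for a real lexer and parser (hypothetical, for illustration only).
function lex(text: string): string[] {
  return text.split(/\s+/).filter((part) => part.length > 0);
}

function parse(tokens: string[]): number {
  let sum = 0;
  for (const tok of tokens) {
    if (tok !== "+") {
      sum += Number(tok);
    }
  }
  return sum;
}

function runBenchmark(sample: string, iterations: number) {
  // Full flow: lexing and parsing are both inside the timed function.
  const fullFlowMs = averageMs(() => {
    parse(lex(sample));
  }, iterations);

  // Parser only: lex once up front, then time just the parsing.
  const tokens = lex(sample);
  const parserOnlyMs = averageMs(() => {
    parse(tokens);
  }, iterations);

  return { fullFlowMs, parserOnlyMs };
}
```

Calling `runBenchmark("1 + 2 + 3 + 4", 100000)` before and after a change gives two numbers per scenario, so a regression in parsing is not diluted by the lexing time.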
Benchmarking only the parsing part (on Chrome 59) showed a 12-13% regression for the JSON grammar, although the CSS grammar showed only a 3% regression.
It may be that being extra generic is not worth it considering the performance regression. There may be a way to mitigate the performance issue while retaining the customizability, but that may lead to too much complexity.
Currently there is only one production-relevant scenario that requires this adapter: the ECMAScript grammar example (#521), in which the parser's performance is very important. Perhaps the custom behavior for the ECMAScript grammar can be achieved using plain method overrides, without official APIs and dependency injection mechanisms.
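Roughly, the "plain method overrides" idea could look like the sketch below: a subclass replaces only the two token-access hooks and lexes lazily from the text. `VectorParser`, `OnDemandParser`, `LA` and `consumeToken` here are hypothetical stand-ins written for this sketch; they are not Chevrotain's real parser class or a supported extension point.

```typescript
interface Tok {
  type: string;
  image: string;
}

const END: Tok = { type: "EOF", image: "" };

// Hypothetical stand-in for a parser base class that reads from a token vector.
class VectorParser {
  protected tokens: Tok[];
  protected pos = 0;

  constructor(tokens: Tok[]) {
    this.tokens = tokens;
  }

  // The two internal hooks through which all token access goes.
  protected LA(howMuch: number): Tok {
    return this.tokens[this.pos + howMuch - 1] ?? END;
  }

  protected consumeToken(): void {
    this.pos++;
  }

  // A trivial "grammar": collect the images of all tokens until EOF.
  parseWords(): string[] {
    const images: string[] = [];
    while (this.LA(1).type !== "EOF") {
      images.push(this.LA(1).image);
      this.consumeToken();
    }
    return images;
  }
}

// Overrides only the hooks; no adapter interface or dependency injection.
class OnDemandParser extends VectorParser {
  private text: string;
  private offset = 0;
  private lookahead: Tok[] = [];

  constructor(text: string) {
    super([]);
    this.text = text;
  }

  // Toy lexer: space-separated words.
  private lexNext(): Tok {
    const text = this.text;
    while (this.offset < text.length && text.charAt(this.offset) === " ") {
      this.offset++;
    }
    if (this.offset >= text.length) {
      return END;
    }
    const start = this.offset;
    while (this.offset < text.length && text.charAt(this.offset) !== " ") {
      this.offset++;
    }
    return { type: "Word", image: text.slice(start, this.offset) };
  }

  protected override LA(howMuch: number): Tok {
    while (this.lookahead.length < howMuch) {
      this.lookahead.push(this.lexNext());
    }
    return this.lookahead[howMuch - 1] ?? END;
  }

  protected override consumeToken(): void {
    this.LA(1); // make sure the next token has been lexed
    this.lookahead.shift();
  }
}
```

The default path keeps its direct array access, so only the subclass that needs the custom flow would take on the cost of the extra logic.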
Closing this.
See the above comments for details. https://github.com/SAP/chevrotain/commit/194c782def5142052dfc8c5189e54ca266b30120 contains a small POC for a limited LexerLess mode, which should suffice for ECMAScript parsing (even though it will not be an "official" API).
This needs to be investigated again, as such an adapter may also be used to conserve memory and to implement token channels.
Additional testing has shown a large, strange regression in the CSS benchmark (3x-4x slower). This is probably due to some de-optimization happening in V8 internals.
The other performance scenarios (JSON/ECMA5) showed a slight (3%) regression, which is expected.
I will once again close this. At the end of the day, this abstraction is just a convenience to help override certain parser internals. It is not mandatory for accomplishing those unique flows; it is only mandatory for supporting those unique flows as official APIs.
Perhaps such unique flows (e.g. token channels) should be provided as examples for extending the parser rather than as official APIs.
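As an illustration of what a token channels example might show, here is a small sketch in which every token carries a channel name and the parser-facing view only sees tokens on one channel. The `channel` property and the `ChannelView` class are assumptions made up for this sketch, not existing Chevrotain features.

```typescript
// Hypothetical token shape with a channel name attached by the lexer.
interface ChannelToken {
  type: string;
  image: string;
  channel: string;
}

// A view over a token vector that exposes only the tokens of one channel.
// Tokens on other channels (e.g. comments) stay in the vector for tooling.
class ChannelView {
  private tokens: ChannelToken[];
  private channel: string;
  private pos = 0;

  constructor(tokens: ChannelToken[], channel: string) {
    this.tokens = tokens;
    this.channel = channel;
  }

  // Returns the `howMuch`-th upcoming token on this channel,
  // or undefined when the input is exhausted.
  LA(howMuch: number): ChannelToken | undefined {
    let seen = 0;
    for (let i = this.pos; i < this.tokens.length; i++) {
      const tok = this.tokens[i];
      if (tok !== undefined && tok.channel === this.channel) {
        seen++;
        if (seen === howMuch) {
          return tok;
        }
      }
    }
    return undefined;
  }

  // Advances past the next token on this channel, skipping any
  // tokens from other channels that come before it.
  consume(): void {
    while (this.pos < this.tokens.length) {
      const tok = this.tokens[this.pos];
      this.pos++;
      if (tok !== undefined && tok.channel === this.channel) {
        return;
      }
    }
  }
}
```

For example, with the tokens `a`, `/* note */` (on a `"hidden"` channel) and `b`, a `ChannelView` on the `"default"` channel yields `a` for `LA(1)` and `b` for `LA(2)`. The linear scan in `LA` is kept simple on purpose; a real implementation would likely want to avoid rescanning.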
Refactoring the parser into multiple traits may make it easier to extract the Lexer Adapter API and provide a custom implementation without suffering a performance penalty.
This should be evaluated again.
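One possible shape for the traits idea, sketched with the TypeScript mixin pattern: token access lives in its own trait, and the recognition logic depends only on the `LA`/`consume` contract, so a different token-access trait can be swapped in when the class is composed. All names here (`TokenSource`, `TokenVectorTrait`, `CharSourceTrait`, `RecognizerTrait`) are hypothetical, and whether V8 optimizes such composed classes as well as a single monolithic class has not been measured here.

```typescript
// Standard TypeScript mixin helper type.
type Constructor<T = {}> = new (...args: any[]) => T;

interface Tok {
  type: string;
  image: string;
}

// The contract any token-access ("lexer adapter") trait must provide.
interface TokenSource {
  LA(howMuch: number): Tok;
  consume(): void;
}

class ParserBase {}

// Trait: token access backed by a pre-built token vector.
function TokenVectorTrait<TBase extends Constructor>(Base: TBase) {
  return class extends Base implements TokenSource {
    tokens: Tok[] = [];
    pos = 0;

    LA(howMuch: number): Tok {
      return this.tokens[this.pos + howMuch - 1] ?? { type: "EOF", image: "" };
    }

    consume(): void {
      this.pos++;
    }
  };
}

// Alternative trait: every character of `text` is a token, produced on demand.
function CharSourceTrait<TBase extends Constructor>(Base: TBase) {
  return class extends Base implements TokenSource {
    text = "";
    offset = 0;

    LA(howMuch: number): Tok {
      const i = this.offset + howMuch - 1;
      return i < this.text.length
        ? { type: "Char", image: this.text.charAt(i) }
        : { type: "EOF", image: "" };
    }

    consume(): void {
      this.offset++;
    }
  };
}

// Trait: recognition logic that relies only on the TokenSource contract.
function RecognizerTrait<TBase extends Constructor<TokenSource>>(Base: TBase) {
  return class extends Base {
    countTokens(): number {
      let count = 0;
      while (this.LA(1).type !== "EOF") {
        count++;
        this.consume();
      }
      return count;
    }
  };
}

// The token-access trait is chosen once, when the class is composed.
const DefaultParser = RecognizerTrait(TokenVectorTrait(ParserBase));
const CharParser = RecognizerTrait(CharSourceTrait(ParserBase));
```

Because the choice is made at class-definition time, the default parser contains no adapter indirection at all, which is the property that would need to be confirmed by the benchmarks.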