
Custom Token Patterns - better state access

Open · bd82 opened this issue 7 years ago · 1 comment

  • [ ] Expose and allow mutating location state for custom Tokens.
  • [ ] Investigate performance implications of tracking newlines using custom Tokens.
    • avoids rescanning the text.
  • [ ] Investigate wrapping the whole state (tokens/groups/location/...) in an object in terms of performance and improved API.
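The third item above, wrapping the whole lexer state in one object, could look roughly like the sketch below. This is a hypothetical shape for discussion only: `createLexerState`, its field names, and the pattern signature are assumptions, not chevrotain's actual API.

```javascript
// Hypothetical single mutable state object passed to custom token
// patterns (names are illustrative, not chevrotain's real API).
function createLexerState() {
    return {
        tokens: [],   // tokens emitted so far
        groups: {},   // token groups (e.g. comments)
        location: { offset: 0, line: 1, column: 1 }
    }
}

// A custom pattern could then both read *and* mutate the shared state:
function matchNewline(text, state) {
    if (text.charCodeAt(state.location.offset) === 10 /* "\n" */) {
        state.location.offset += 1
        state.location.line += 1
        state.location.column = 1
        return ["\n"]
    }
    return null
}
```

Bundling everything into one object keeps the pattern signature stable as more state is exposed, but each property access goes through an extra indirection, which is exactly the performance question this issue raises.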

bd82 · Jun 29 '17 10:06

Background to some of this issue

The root problem is that we re-process the text when calculating the column/line info. It does not matter whether we do this very efficiently with charCodeAt or less so with a regExp; both are a waste of time.

Perhaps we could use a custom token pattern (a function instead of a regExp) and somehow expose the lexer's location state to it, for example via a location object passed as an argument and mutated in place. That way we can avoid the re-processing by implementing line/column tracking inside the custom token pattern itself.

// "location" is the proposed mutable state object exposed by the lexer
function matchWS(text, location) {
    let wsString = ""
    let i = 0
    while (i < text.length) {
        const curCharCode = text.charCodeAt(i)
        if (curCharCode === 32 /* " " */ || curCharCode === 9 /* "\t" */) {
            location.column += 1
        } else if (curCharCode === 10 /* "\n" */) {
            // this will affect the state of the lexer
            location.line += 1
            location.column = 1
        } else {
            break
        }
        wsString += text[i]
        i++
    }
    // mimic a RegExp#exec result: [matchedString] or null
    return wsString.length > 0 ? [wsString] : null
}



let WhiteSpace = createToken({
    name: "WhiteSpace",
    pattern: matchWS
})

The open question is whether the overhead of using custom token patterns and wrapping the location info in a mutated object outweighs the inefficiency of re-processing the same text.
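A first-order answer could come from a micro-benchmark comparing the two strategies in isolation. The sketch below is illustrative only (the workload is synthetic, and neither function is chevrotain code): (a) recomputes the line number by rescanning from the start of the text for every token offset, while (b) tracks lines incrementally in a single forward pass.

```javascript
// Synthetic input: 5000 lines, with a "token" starting every 4 chars.
const text = "foo bar\n".repeat(5000)
const offsets = []
for (let i = 0; i < text.length; i += 4) offsets.push(i)

// (a) rescan from the beginning for every offset -- O(n * m) overall
function lineByRescan(text, offset) {
    let line = 1
    for (let i = 0; i < offset; i++) {
        if (text.charCodeAt(i) === 10 /* "\n" */) line++
    }
    return line
}

// (b) one forward pass, tracking line state incrementally -- O(n)
function linesByTracking(text, offsets) {
    const result = []
    let line = 1
    let oi = 0
    for (let i = 0; i <= text.length && oi < offsets.length; i++) {
        while (oi < offsets.length && offsets[oi] === i) {
            result.push(line)
            oi++
        }
        if (text.charCodeAt(i) === 10 /* "\n" */) line++
    }
    return result
}

console.time("rescan")
const rescanned = offsets.map((o) => lineByRescan(text, o))
console.timeEnd("rescan")

console.time("tracking")
const tracked = linesByTracking(text, offsets)
console.timeEnd("tracking")
```

The asymptotic gap (quadratic vs linear) is clear, but the issue's real question is subtler: incremental tracking via custom token patterns also pays for function-call and object-mutation overhead on every token, which only a benchmark against the actual lexer can settle.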

bd82 · Jun 29 '17 11:06