chevrotain
Custom Token Patterns - better state access
- [ ] Expose and allow mutating location state for custom Tokens.
- [ ] Investigate performance implications of tracking newlines using custom Tokens.
  - avoids rescanning the text.
- [ ] Investigate wrapping the whole state (tokens/groups/location/...) in an object in terms of performance and improved API.
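For the last checkbox, the wrapped-state idea might look something like the sketch below. This is a hypothetical shape only, not an existing chevrotain API; all names here are illustrative.

```javascript
// Hypothetical wrapped lexer state, grouping everything a custom token
// pattern might need to read or mutate (names are illustrative only).
function createLexerState() {
  return {
    tokens: [],   // tokens matched so far
    groups: {},   // token groups matched so far
    location: { offset: 0, line: 1, column: 1 }
  }
}

// A custom pattern would then take (text, state) and mutate state.location,
// instead of receiving each piece of state as a separate argument.
function examplePattern(text, state) {
  // ... matching logic would go here ...
  return null
}
```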
Some background on this issue
The root problem is that we re-process the text when calculating the line/column info. It does not matter whether we do this very efficiently with charCodeAt or less so with a regExp; either way it is duplicated work.
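To make the duplicated work concrete, here is a minimal plain-JavaScript sketch of that second pass: after a token has already been matched, the lexer walks the matched text again just to update the line/column counters. `advanceLocation` is a hypothetical helper for illustration, not chevrotain API.

```javascript
// Re-scan matched text solely to update line/column counters (the
// "waste of time" described above). loc is { line, column }, 1-based.
function advanceLocation(matchedText, loc) {
  for (let i = 0; i < matchedText.length; i++) {
    if (matchedText.charCodeAt(i) === 10 /* "\n" */) {
      loc.line++
      loc.column = 1
    } else {
      loc.column++
    }
  }
  return loc
}
```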
Perhaps we could use a custom token pattern (a function instead of a regExp) and somehow expose the lexer's location state to it, for example via a location object passed as an argument and mutated in place. That way we could avoid the re-processing by implementing line/column tracking inside the custom token pattern itself.
```javascript
function matchWS(text, location) {
  let wsString = ""
  let offset = 0
  // scan using charCodeAt
  let curCharCode = text.charCodeAt(offset)
  while (
    curCharCode === 32 /* " " */ || curCharCode === 9 /* "\t" */ ||
    curCharCode === 10 /* "\n" */ || curCharCode === 13 /* "\r" */
  ) {
    if (curCharCode === 10 /* "\n" */) {
      // this will affect the state of the lexer
      location.line = location.line + 1
      location.column = 1
    } else {
      location.column = location.column + 1
    }
    wsString += text[offset]
    offset++
    curCharCode = text.charCodeAt(offset)
  }
  // mimic a RegExp match result, null when nothing matched
  return wsString.length > 0 ? [wsString] : null
}

let WhiteSpace = createToken({
  name: "WhiteSpace",
  pattern: matchWS
})
```
The open question is whether the overhead of using custom token patterns and wrapping the location info in a mutable object outweighs the inefficiency of re-processing the same data.
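For comparison, the single-pass side of that trade-off can be sketched in plain JavaScript, independent of chevrotain. Note that the `(text, startOffset, location)` signature is part of this proposal, not chevrotain's current custom-pattern API.

```javascript
// Single-pass approach: consume whitespace AND update the mutable
// location object as we go, so no second scan of the match is needed.
function matchWSSinglePass(text, startOffset, location) {
  let end = startOffset
  while (end < text.length) {
    const code = text.charCodeAt(end)
    if (code === 10 /* "\n" */) {
      location.line++
      location.column = 1
    } else if (code === 32 /* " " */ || code === 9 /* "\t" */ ||
               code === 13 /* "\r" */) {
      location.column++
    } else {
      break
    }
    end++
  }
  // mimic RegExp#exec: null on no match, else an array with the image
  return end === startOffset ? null : [text.slice(startOffset, end)]
}
```

Benchmarking this shape against the rescanning approach on realistic inputs would answer the question above.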