sonar-delphi
sonar-delphi copied to clipboard
Unify lexing/preprocessing/parsing into hand-rolled scanner
Prerequisites
- [X] This improvement has not already been suggested.
- [X] This improvement would be generally useful, not specific to my code or setup.
Engine area
Delphi language support
Improvement description
Background
- The current parsing is implemented in 3 steps:
- Lexing (
DelphiLexer
generated by ANTLR) - Preprocessing (
DelphiPreprocessor
) - Parsing (
DelphiParser
generated by ANTLR)
- Lexing (
- In addition, we have a hand-rolled
CompilerDirectiveParser
with correspondingExpressionLexer
andExpressionParser
for handling constant expressions within{$if}
and{$elseif}
directives.
Problem
While this approach keeps the concerns cleanly separated and sounds nice on paper, it doesn't reflect reality at all.
Delphi's parsing is a fairly streamlined single-pass implementation that parses high-level AST structures and constructs the symbol table while reading tokens on-demand from the input data and processing compiler directives as they appear. In other words, it's all mashed together into a single process.
There are many subtle deficiencies in the current parsing that we can trace back to this fundamental difference in how the process is modeled.
Solution
Merge the 3 parsing steps into a hand-rolled DelphiScanner
:
- Lex tokens on-demand instead of doing it upfront.
- Evaluate compiler directives as they're lexed from the input data instead of afterward.
- Additionally, perform symbol table construction simultaneously instead of afterward.
Rationale
Enable full support for:
- caret notation (#111)
- compiler directives nested in directive expressions (#260)
- symbol information in directive expressions
- references to constants
-
Declared
function
- "vexing" parses that require symbol information to resolve
// [[OPTION 1]] // In this scenario, `Write` is called with 2 arguments // parsed as binary expressions: `Foo < T` and `T > (B)` { type A = class procedure Bar; end; const Foo = 0; T = 0; B = 0; } // [[OPTION 2]] // In this scenario, `Write` is called with 1 argument // parsed as an invocation of generic function `Foo`. { type A = class function Foo<U, V>(X: Integer): Integer; procedure Bar; end; T = class end; const B = 0; } procedure A.Bar; begin Write(Foo<T, T>(B)); end;
Make it easier to implement:
- constant expression evaluation (#116)