sonar-delphi icon indicating copy to clipboard operation
sonar-delphi copied to clipboard

Unify lexing/preprocessing/parsing into hand-rolled scanner

Open Cirras opened this issue 8 months ago • 0 comments

Prerequisites

  • [X] This improvement has not already been suggested.
  • [X] This improvement would be generally useful, not specific to my code or setup.

Engine area

Delphi language support

Improvement description

Background

  • The current parsing is implemented in 3 steps:
    • Lexing (DelphiLexer generated by ANTLR)
    • Preprocessing (DelphiPreprocessor)
    • Parsing (DelphiParser generated by ANTLR)
  • In addition, we have a hand-rolled CompilerDirectiveParser with corresponding ExpressionLexer and ExpressionParser for handling constant expressions within {$if} and {$elseif} directives.

Problem

While this approach keeps the concerns cleanly separated and sounds nice on paper, it doesn't reflect reality at all.

Delphi's parsing is a fairly streamlined single-pass implementation that parses high-level AST structures and constructs the symbol table while reading tokens on-demand from the input data and processing compiler directives as they appear. In other words, it's all mashed together into a single process.

There are many subtle deficiencies in the current parsing that we can trace back to this fundamental difference in how the process is modeled.

Solution

Merge the 3 parsing steps into a hand-rolled DelphiScanner:

  • Lex tokens on-demand instead of doing it upfront.
  • Evaluate compiler directives as they're lexed from the input data instead of afterward.
  • Additionally, perform symbol table construction simultaneously instead of afterward.

Rationale

Enable full support for:

  • caret notation (#111)
  • compiler directives nested in directive expressions (#260)
  • symbol information in directive expressions
    • references to constants
    • Declared function
  • "vexing" parses that require symbol information to resolve
    // [[OPTION 1]]
    // In this scenario, `Write` is called with 2 arguments
    // parsed as binary expressions: `Foo < T` and `T > (B)` 
    {
      type
        A = class
          procedure Bar;
        end;
    
      const
        Foo = 0;
        T = 0;
        B = 0;
    }
    
    // [[OPTION 2]]
    // In this scenario, `Write` is called with 1 argument
    // parsed as an invocation of generic function `Foo`.
    {  
      type
        A = class
          function Foo<U, V>(X: Integer): Integer;
          procedure Bar;
        end;
    
        T = class end;
    
      const
        B = 0;
    }
    
    procedure A.Bar;
    begin
      Write(Foo<T, T>(B));
    end;
    

Make it easier to implement:

  • constant expression evaluation (#116)

Cirras avatar Jun 18 '24 06:06 Cirras