DelphiAST Include Comments in the AST

Related to #38, it would be useful to include code comments within the Syntax Tree.

Jan 22 '15 11:01 LaKraven

Comments are ignored by the lexer, so the parser doesn't have this information. But I think it is possible to implement. I'll take a look.

Jan 22 '15 12:01 RomanYankovsky

I've made the first changes for this issue in my fork, but note the TODO in the log message. https://github.com/uschuster/DelphiAST/commit/e6e7c2220ed9b62cc208f45896c4fde22e78f9d6

Mar 29 '15 18:03 uschuster

@uschuster unfortunately, that's much more complicated...

Try to parse this code and take a look at the syntax tree:

unit commenttest;

interface

var
  Int1 {MyFavoriteInt}, Int2: Integer;

implementation

procedure TestProc;
begin
  Int1 {That's my favorite int } := Int2 * {mul} 2;
end;

end.

But it is a good start. Great!

P.S. To save you time, this is the syntax tree for the code above.

<?xml version="1.0"?>
<UNIT line="1" col="1" name="commenttest">
  <INTERFACE line="3" col="1">
    <VARIABLES line="5" col="1">
      <VARIABLE>
        <NAME line="6" col="3" value="Int1"/>
        <TYPE line="6" col="31" name="Integer"/>
      </VARIABLE>
      <VARIABLE>
        <NAME line="6" col="25" value="Int2"/>
        <TYPE line="6" col="31" name="Integer"/>
      </VARIABLE>
    </VARIABLES>
  </INTERFACE>
  <IMPLEMENTATION line="8" col="1">
    <METHOD line="10" col="1" name="TestProc" kind="procedure">
      <STATEMENTS end_line="13" begin_line="12" end_col="1" begin_col="3">
        <ASSIGN line="12" col="3">
          <LHS>
            <COMMENT end_line="12" type="Borland" begin_line="12" end_col="32" value="{That&apos;s my favorite int }" begin_col="8"/>
          </LHS>
          <RHS>
            <EXPRESSION line="12" col="37">
              <MUL line="12" col="42">
                <COMMENT end_line="12" type="Borland" begin_line="12" end_col="48" value="{mul}" begin_col="44"/>
                <LITERAL line="12" col="50" type="numeric" value="2"/>
              </MUL>
            </EXPRESSION>
          </RHS>
        </ASSIGN>
      </STATEMENTS>
    </METHOD>
  </IMPLEMENTATION>
</UNIT>

Apr 01 '15 20:04 RomanYankovsky

@RomanYankovsky Ah, I see. I just tried to add comments into a separate child list of TSyntaxNode, but for some constructs the comments got lost. I think using a separate list for all comments and trying to attach the comments at the end could be the way to go.

Apr 03 '15 09:04 uschuster

Without comments the AST is incomplete and useless for my purposes :( So this issue is important. Please consider fixing it.

Sep 02 '15 07:09 barbalion

It'd be nice to access the comments for different reasons. But comments are not part of an AST. They live next to it. So there should be a separate list, as @uschuster has already mentioned.
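To make the "separate list" idea concrete, here is a minimal sketch of collecting comments next to the tree instead of inside it. It is written in Python purely for illustration and is not how DelphiAST's lexer works: `Comment` and `collect_comments` are hypothetical names, compiler directives such as `{$IFDEF}` are treated like ordinary brace comments, and positions are 1-based.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    kind: str   # "brace" for {...}, "paren" for (*...*), "line" for //...
    line: int   # 1-based line of the comment's first character
    col: int    # 1-based column of the comment's first character
    text: str   # comment text including its delimiters


def collect_comments(source):
    """Return every comment in `source`, in source order, with its position."""
    comments = []
    i, line, col = 0, 1, 1
    n = len(source)

    def advance(count):
        # Move forward `count` characters, keeping line/col in sync.
        nonlocal i, line, col
        for _ in range(count):
            if source[i] == "\n":
                line, col = line + 1, 1
            else:
                col += 1
            i += 1

    def take(kind, end):
        # Record source[i:end] as a comment and skip past it.
        comments.append(Comment(kind, line, col, source[i:end]))
        advance(end - i)

    while i < n:
        if source[i] == "'":
            # String literal: skip it so '{' or '//' inside is not a comment.
            j = i + 1
            while j < n and source[j] != "\n":
                if source[j] == "'":
                    if source[j + 1:j + 2] == "'":  # '' is an escaped quote
                        j += 2
                        continue
                    break
                j += 1
            advance(min(j + 1, n) - i)
        elif source[i] == "{":
            end = source.find("}", i)
            take("brace", n if end < 0 else end + 1)
        elif source.startswith("(*", i):
            end = source.find("*)", i + 2)
            take("paren", n if end < 0 else end + 2)
        elif source.startswith("//", i):
            end = source.find("\n", i)
            take("line", n if end < 0 else end)
        else:
            advance(1)
    return comments


SAMPLE = """\
unit commenttest;

interface

var
  Int1 {MyFavoriteInt}, Int2: Integer;

implementation

procedure TestProc;
begin
  Int1 {That's my favorite int } := Int2 * {mul} 2;
end;

end.
"""

for c in collect_comments(SAMPLE):
    print(c.line, c.col, c.text)
```

The list keeps all three comments from the test unit, including `{MyFavoriteInt}`, without changing the shape of the tree; the open question of which node each one belongs to is left to the consumer.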

Sep 02 '15 11:09 Wosi

Why should it be a separate list? What's wrong with having nodes that represent comments inside the syntax tree? A comment is syntactically valid - its syntax is such that its contents are ignored by the compiler. But it's still valid code.

Sep 02 '15 11:09 vintagedave

@vintagedave maybe I'm missing something, but can you please show me a sample of a correct syntax tree for the code below? I just can't imagine how to do this.

unit commenttest;

interface

var
  Int1 {MyFavoriteInt}, Int2: Integer;

implementation

procedure TestProc;
begin
  Int1 {That's my favorite int } := Int2 * {mul} 2;
end;

end.

Sep 02 '15 11:09 RomanYankovsky

@uschuster worked on that, but never made a pull request. Did he finish his effort?

Sep 02 '15 11:09 RomanYankovsky

Take a look at the example made by @RomanYankovsky. Comments can be nested anywhere. What should the AST for that code look like, in your opinion? Of course you can squeeze the comments into the AST anyway, but does the result look good? Is it still easy to gather information from the AST?

Let's do some extreme things.

  {comment1} MyObject{comment2}.{comment3}Prop1 //Comment4
                                         .{comment5}Method1(
                                          {comment6}6,nil{comment7}, 'hello'{comment8} + 'world' // comment9
                                          ){comment10}.SubProp1 := //comment11
                          {comment12}'value' + {comment13} IntToStr(14{comment14});

If you want to represent this valid code in an AST that includes comments, you have to change the whole structure. I think if you put comments into the abstract syntax tree you would end up with a concrete syntax tree, which is much harder to get information from.

Sep 02 '15 11:09 Wosi

That is a good point - abstract vs concrete.

Is it possible - or should it be possible - to reconstruct the original code, exactly as it was, from the syntax tree? (Even ignoring comments?)

Sep 02 '15 11:09 vintagedave

Roman, all I can think of is that you end up with <Comment> nodes all over the place: embedded anywhere. And that might not be ideal.

Sep 02 '15 11:09 vintagedave

I would suggest introducing a new property, similar to attributes: each syntax tree node would get a property Comments: TNodeList (serialized to XML elements). A comment would be related to the preceding or the following node (arguably, two properties, CommentsBefore and CommentsAfter, would fit better). For your last example, this would look like this:

Alexander

Sep 02 '15 13:09 barbalion

@barbalion How do you decide that a comment is after or before a syntax node? And will there be standalone comments?

What about this code?

unit Basics;

interface
  function IntToStr(Value: integer): string;
  function StrToInt(const Str: string): integer;

implementation
// Converts an integer to a string
function IntToStr(Value: integer): string;
begin
  // ...
end;

// Converts a string to an integer
function StrToInt(const Str: string): integer;
begin
  //....
end;

end.

How would the method headers be represented in the AST? Would they be standalone comments? Would they be part of the function nodes, in commentsbefore nodes? Or would the first header be a standalone comment (as a child of implementation) while the second one appears in the commentsafter section of the first function node? And how would you figure out what to do? Linking comments to their context seems fuzzy to me. Sometimes comments appear before the commented code, sometimes after. And sometimes it's really weird, like this one:

type TMyObject = class // this class should only
private                // be used by the basic 
  FName: string;       // code libraries like
  FSize: integer;      // Lib1, Lib2 and LibOld
end;

Which of these comments is part of the typedeclaration node? Which is part of the private, field, name or type node? After answering these questions - Are we happy with the resulting AST? Would it be easy for you to get the comments out of the AST and do something with them? Or would it be easier to have a list of comments including their source position and maybe having references to the syntax nodes before and behind the comment?
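A rough sketch of that last alternative, in Python for brevity: a flat comment list where each entry is linked to the nodes that start before and after it. The node names and (line, column) positions below are hand-assigned for the TMyObject example, not real DelphiAST output, and nodes are compared by start position only, ignoring nesting.

```python
from bisect import bisect_left

# Hypothetical nodes for the TMyObject snippet, sorted by (line, col) start.
nodes = [
    ((1, 6), "TYPEDECL TMyObject"),
    ((2, 1), "PRIVATE"),
    ((3, 3), "FIELD FName"),
    ((4, 3), "FIELD FSize"),
]
starts = [pos for pos, _ in nodes]

# The four trailing line comments, each starting in column 24.
comments = [
    ((1, 24), "// this class should only"),
    ((2, 24), "// be used by the basic"),
    ((3, 24), "// code libraries like"),
    ((4, 24), "// Lib1, Lib2 and LibOld"),
]


def neighbours(comment_pos):
    """Return (node before, node after) for a comment position."""
    idx = bisect_left(starts, comment_pos)
    before = nodes[idx - 1][1] if idx > 0 else None
    after = nodes[idx][1] if idx < len(nodes) else None
    return before, after


for pos, text in comments:
    before, after = neighbours(pos)
    print(f"{text!r}: before={before}, after={after}")
```

Each fragment of that one sentence ends up next to a different node, which is the fuzziness described above; with a flat list the consumer can still read the four entries in order and treat them as one remark.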

Sep 02 '15 16:09 Wosi

I haven't had time in the past months and won't have any for at least one or two more. I am not yet satisfied with the implementation. My last problem was incorrect position information for various statements, and that's why the Clang-like attempt to attach comments to nodes failed.

Sep 02 '15 18:09 uschuster

"@barbalion How do you decide that a comment is after or before a syntax node?"

This is a tricky thing. But there are two options:

  Make Before and After the same (After for the previous node holds the same comments as Before for the following one).

  Add some heuristics to guess the right one.

"And will there be standalone comments?"

I would answer no.

"What about this code?"

I would say that it would look like this:
…
Note some duplication. You can avoid it by applying some heuristics (like "is there an empty line before"), or you can leave it this way.

"How would the method headers be represented in the AST? Would they be standalone comments? Would they be part of the function nodes, in commentsbefore nodes? Or would the first header be a standalone comment (as a child of implementation) while the second one appears in the commentsafter section of the first function node? And how would you figure out what to do?"

You didn’t get my idea. I’m proposing a separate property on the node. This property will represent comments, but the comments themselves will not create a node. To give you an example: look at the METHOD node. You can see that the name doesn’t create a node; it’s an attribute of the METHOD node. The idea is to make comments similar to these attributes. The only reason I put CommentBefore and CommentAfter into XML elements is that there could be several comments on one node, and there is no way to put multiple values into a single XML attribute.
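The two options can be sketched roughly as follows, in Python for brevity. Everything here is an assumption for illustration: `Node`, `attach` and the line numbers (taken from the Basics unit, counting from 1) are hypothetical, only sibling nodes on one level are considered, and the heuristic is simply the "empty line before" rule.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    first_line: int
    last_line: int
    comments_before: list = field(default_factory=list)
    comments_after: list = field(default_factory=list)


def attach(nodes, comments, blank_lines=(), use_heuristic=False):
    """Attach (line, text) comments to sibling nodes sorted by position."""
    for line, text in comments:
        if any(n.first_line <= line <= n.last_line for n in nodes):
            continue  # inside a node: belongs to a deeper level, skipped here
        prev = next((n for n in reversed(nodes) if n.last_line < line), None)
        nxt = next((n for n in nodes if n.first_line > line), None)
        if use_heuristic and prev is not None and nxt is not None:
            # A blank line between a neighbour and the comment cuts the link.
            if any(b in blank_lines for b in range(prev.last_line + 1, line)):
                prev = None
            elif any(b in blank_lines for b in range(line + 1, nxt.first_line)):
                nxt = None
        if prev is not None:
            prev.comments_after.append(text)
        if nxt is not None:
            nxt.comments_before.append(text)
    return nodes


def basics_methods():
    # The two function implementations of the Basics unit.
    return [Node("IntToStr", 9, 12), Node("StrToInt", 15, 18)]


COMMENTS = [
    (8, "// Converts an integer to a string"),
    (11, "// ..."),
    (14, "// Converts a string to an integer"),
    (17, "//...."),
]
BLANK_LINES = {2, 6, 13, 19}

shared = attach(basics_methods(), COMMENTS)            # option 1
guessed = attach(basics_methods(), COMMENTS, BLANK_LINES, use_heuristic=True)

for label, result in (("shared", shared), ("guessed", guessed)):
    for node in result:
        print(label, node.name, node.comments_before, node.comments_after)
```

With option 1 the second header shows up twice, as After of IntToStr and Before of StrToInt; with the blank-line heuristic it is attached to StrToInt only.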

Arguably you can put all comments into a single XML attribute without separating them. For example:

// Converts an integer ...

// ... to a string

function IntToStr(Value: integer): string;

The METHOD node here has two comments. But for practical use we can consider them as one big multiline comment and put it into an XML attribute:

<METHOD begin_line="6" begin_col="1" end_line="8" end_col="1" kind="function" name="IntToStr" commentsbefore="// Converts an integer ...\n// ... to a string" commentsafter="">

(An XML attribute can hold a multiline value as long as the line breaks are written as character references such as &#10;; a literal line break inside an attribute is normalized to a space by the parser.) But in this case we lose the information about each comment’s start and end (and that’s bad). So in other words, my suggestion is really to keep comments out of the AST, but at the same time link them to the nodes (like attributes).
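As a small check of the multiline-attribute idea, here is a sketch using Python's standard xml.etree.ElementTree (not DelphiAST's own XML writer). The attribute names mirror the example above; the element is built by hand.

```python
import xml.etree.ElementTree as ET

comment_text = "// Converts an integer ...\n// ... to a string"

# Build a METHOD element that carries both comments in one attribute.
method = ET.Element(
    "METHOD",
    {"kind": "function", "name": "IntToStr", "commentsbefore": comment_text},
)

# The serializer writes the line break as the character reference &#10;.
xml_text = ET.tostring(method, encoding="unicode")
print(xml_text)

# Reading it back restores the line break.
restored = ET.fromstring(xml_text).get("commentsbefore")

# A literal line break inside an attribute is normalized to a space.
literal = ET.fromstring('<METHOD commentsbefore="line one\nline two"/>')
flattened = literal.get("commentsbefore")
```

So the text survives a round trip only when the writer escapes the line breaks, and in either case the per-comment start and end positions are gone.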

"Linking comments to their context seems fuzzy to me. Sometimes comments appear before the commented code, sometimes after. And sometimes it's really weird, like this one:

type TMyObject = class // this class should only
private                // be used by the basic
  FName: string;       // code libraries like
  FSize: integer;      // Lib1, Lib2 and LibOld
end;

Which of these comments is part of the typedeclaration node? Which is part of the private, field, name or type node? After answering these questions - Are we happy with the resulting AST? Would it be easy for you to get the comments out of the AST and do something with them? Or would it be easier to have a list of comments including their source position and maybe having references to the syntax nodes before and behind the comment?"

As I said, there will be no AST nodes for comments, only an additional property on existing nodes.

Alexander

Sep 03 '15 18:09 barbalion

I added a TPasSyntaxTreeBuilder.Comments property. It stores all comments in a separate list. Please give it a try. See commit 25eb2ac8cb65a08b3719943c352c69481aa58bb6

Sep 17 '15 20:09 RomanYankovsky
