DelphiAST
Include Comments in the AST
Related to #38, it would be useful to include code comments within the syntax tree.
Comments are currently ignored by the lexer, so the parser doesn't have this information. But I think it is possible to implement. I'll take a look.
I've made the first changes for this issue in my fork, but note the TODO in the log message. https://github.com/uschuster/DelphiAST/commit/e6e7c2220ed9b62cc208f45896c4fde22e78f9d6
@uschuster unfortunately, that's much more complicated...
Try to parse this code and take a look at the syntax tree:
unit commenttest;

interface

var
  Int1 {MyFavoriteInt}, Int2: Integer;

implementation

procedure TestProc;
begin
  Int1 {That's my favorite int } := Int2 * {mul} 2;
end;

end.
But it is a good start. Great!
P.S. To save you time: this is the syntax tree for the code above.
<?xml version="1.0"?>
<UNIT line="1" col="1" name="commenttest">
  <INTERFACE line="3" col="1">
    <VARIABLES line="5" col="1">
      <VARIABLE>
        <NAME line="6" col="3" value="Int1"/>
        <TYPE line="6" col="31" name="Integer"/>
      </VARIABLE>
      <VARIABLE>
        <NAME line="6" col="25" value="Int2"/>
        <TYPE line="6" col="31" name="Integer"/>
      </VARIABLE>
    </VARIABLES>
  </INTERFACE>
  <IMPLEMENTATION line="8" col="1">
    <METHOD line="10" col="1" name="TestProc" kind="procedure">
      <STATEMENTS end_line="13" begin_line="12" end_col="1" begin_col="3">
        <ASSIGN line="12" col="3">
          <LHS>
            <COMMENT end_line="12" type="Borland" begin_line="12" end_col="32" value="{That's my favorite int }" begin_col="8"/>
          </LHS>
          <RHS>
            <EXPRESSION line="12" col="37">
              <MUL line="12" col="42">
                <COMMENT end_line="12" type="Borland" begin_line="12" end_col="48" value="{mul}" begin_col="44"/>
                <LITERAL line="12" col="50" type="numeric" value="2"/>
              </MUL>
            </EXPRESSION>
          </RHS>
        </ASSIGN>
      </STATEMENTS>
    </METHOD>
  </IMPLEMENTATION>
</UNIT>
@RomanYankovsky Ah, I see. I just tried to add comments into a separate child list of TSyntaxNode, but for some constructs the comments got lost. I think collecting all comments in a separate list and trying to attach them to nodes at the end could be the way to go.
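To make the "separate list, attach at the end" idea concrete, here is a minimal sketch. It assumes every node and comment carries a start position and attaches each comment to the last node that starts before it. TNodeInfo, TCommentInfo and AttachComments are hypothetical stand-ins, not DelphiAST types. The comment positions are taken from the syntax tree above; the node positions are worked out from the sample's source line 12.

```pascal
program AttachCommentsSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical, simplified stand-ins; these are NOT the DelphiAST classes.
  TPos = record
    Line, Col: Integer;
  end;

  TNodeInfo = record
    Name: string;
    Start: TPos;
  end;

  TCommentInfo = record
    Text: string;
    Start: TPos;
    OwnerIndex: Integer; // index into the node array, -1 if unattached
  end;

function MakeNode(const AName: string; ALine, ACol: Integer): TNodeInfo;
begin
  Result.Name := AName;
  Result.Start.Line := ALine;
  Result.Start.Col := ACol;
end;

function MakeComment(const AText: string; ALine, ACol: Integer): TCommentInfo;
begin
  Result.Text := AText;
  Result.Start.Line := ALine;
  Result.Start.Col := ACol;
  Result.OwnerIndex := -1;
end;

function IsBefore(const A, B: TPos): Boolean;
begin
  Result := (A.Line < B.Line) or ((A.Line = B.Line) and (A.Col < B.Col));
end;

// Post-pass: attach each comment to the last node that starts before it.
// Assumption: Nodes is sorted by start position.
procedure AttachComments(const Nodes: array of TNodeInfo;
  var Comments: array of TCommentInfo);
var
  C, N: Integer;
begin
  for C := Low(Comments) to High(Comments) do
  begin
    Comments[C].OwnerIndex := -1;
    for N := Low(Nodes) to High(Nodes) do
      if IsBefore(Nodes[N].Start, Comments[C].Start) then
        Comments[C].OwnerIndex := N
      else
        Break;
  end;
end;

var
  Nodes: array of TNodeInfo;
  Comments: array of TCommentInfo;
  I: Integer;
begin
  // Positions on source line 12 of the commenttest sample.
  SetLength(Nodes, 4);
  Nodes[0] := MakeNode('Int1', 12, 3);
  Nodes[1] := MakeNode('Int2', 12, 37);
  Nodes[2] := MakeNode('MUL', 12, 42);
  Nodes[3] := MakeNode('LITERAL 2', 12, 50);

  SetLength(Comments, 2);
  Comments[0] := MakeComment('{That''s my favorite int }', 12, 8);
  Comments[1] := MakeComment('{mul}', 12, 44);

  AttachComments(Nodes, Comments);

  for I := 0 to High(Comments) do
    if Comments[I].OwnerIndex >= 0 then
      WriteLn(Comments[I].Text, ' -> ', Nodes[Comments[I].OwnerIndex].Name)
    else
      WriteLn(Comments[I].Text, ' -> (unattached)');
end.
```

With these inputs the first comment is attached to Int1 and the second to MUL. As noted later in the thread, a pass like this is only as good as the position information on the nodes.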
Without comments the AST is incomplete and useless for my purposes :( So this issue is important. Please consider fixing it.
It'd be nice to access the comments for different reasons. But comments are not part of an AST. They live next to it. So there should be a separate list like @uschuster has already mentioned.
Why should it be a separate list? What's wrong with having nodes that represent comments inside the syntax tree? A comment is syntactically valid - its syntax is such that its contents are ignored by the compiler. But it's still valid code.
(Email reply to Christopher Wosinski's comment of 2 September 2015, 13:04.)
@vintagedave maybe I'm missing an idea, but can you please show me a sample of a correct syntax tree for the code below? I just can't imagine how to do this.
unit commenttest;

interface

var
  Int1 {MyFavoriteInt}, Int2: Integer;

implementation

procedure TestProc;
begin
  Int1 {That's my favorite int } := Int2 * {mul} 2;
end;

end.
@uschuster worked on that, but has never submitted a pull request. Did he finish his effort?
Take a look at the example made by @RomanYankovsky. Comments can be nested anywhere. What should the AST for that code look like, in your opinion? Of course you can squeeze the comments into the AST anyway, but does the result look good? Is it still easy to gather information from the AST?
Let's do some extreme things.
{comment1} MyObject{comment2}.{comment3}Prop1 //Comment4
.{comment5}Method1(
{comment6}6,nil{comment7}, 'hello'{comment8} + 'world' // comment9
){comment10}.SubProp1 := //comment11
{comment12}'value' + {comment13} + IntToStr(14{comment14});
If you want to represent this valid code in an AST that includes comments, you would have to change the whole structure. I think that if you put comments into the abstract syntax tree, you would end up with a concrete syntax tree, which is much harder to get information from.
That is a good point - abstract vs concrete.
Is it possible - or should it be possible - to reconstruct the original code, exactly as it was, from the syntax tree? (Even ignoring comments?)
(Email reply to Christopher Wosinski's comment of 2 September 2015, 13:28.)
Roman, all I can think of is that you end up with <Comment> nodes all over the place: embedded anywhere. And that might not be ideal.
(Email reply to Roman Yankovsky's comment of 2 September 2015, 13:24.)
I would suggest introducing a new property, similar to attributes: a syntax tree node would get a property Comments: TNodeList (serialized to XML elements). A comment would be related to the preceding or the following node (arguably, two properties, CommentsBefore and CommentsAfter, would fit better). For your last example, this would look like this:
Alexander
(Email reply to David Millington's message of Wednesday, September 2, 2015, 3:00 PM.)
@barbalion How do you decide that a comment is after or before a syntax node? And will there be stand alone comments?
What about this code?
unit Basics;
interface
function IntToStr(Value: integer): string;
function StrToInt(const Str: string): integer;
implementation
// Converts an integer to a string
function IntToStr(Value: integer): string;
begin
// ...
end;
// Converts a string to an integer
function StrToInt(const Str: string): integer;
begin
//....
end;
end.
How would the method headers be represented in the AST? Would they be stand-alone comments? Would they be part of the function nodes in commentsbefore nodes? Or would the first header be a stand-alone comment (as a child of implementation) while the second one appears in the commentsafter section of the first function node? And how would you figure out what to do?
Linking comments to their context seems fuzzy to me. Sometimes comments appear before the commented code, sometimes after. And sometimes it's really weird, like this one:
type TMyObject = class // this class should only
private // be used by the basic
FName: string; // code libraries like
FSize: integer; // Lib1, Lib2 and LibOld
end;
Which of these comments is part of the typedeclaration node? Which is part of the private, field, name or type node?
After answering these questions: are we happy with the resulting AST? Would it be easy for you to get the comments out of the AST and do something with them? Or would it be easier to have a list of comments including their source position, and maybe references to the syntax nodes before and after each comment?
I haven't had time in the past months and won't have any for at least one or two more. I am not yet satisfied with the implementation. My last problem was incorrect position information for various statements, which is why the Clang-like attempt to attach comments to nodes failed.
"@barbalion How do you decide that a comment is after or before a syntax node?"
This is a tricky thing, but there are two options:
- Make Before and After the same (the After of the previous node is the same as the Before of the following one).
- Add some heuristics to guess the right one.
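As an illustration of the second option, here is a minimal sketch of one possible heuristic. It treats a comment that is the first thing on its line as belonging to the node that follows, and a comment with code to its left as belonging to the node that precedes it. TCommentPlacement and ClassifyComment are hypothetical names, not part of DelphiAST, and the rule is only an assumption about what such a heuristic could look like.

```pascal
program CommentPlacementSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical classification; not part of DelphiAST.
  TCommentPlacement = (cpLeading, cpTrailing);

const
  PlacementNames: array[TCommentPlacement] of string = ('leading', 'trailing');

// A comment that is the first thing on its line is treated as "leading"
// (it describes the node that follows). A comment with code to its left
// is treated as "trailing" (it describes the node that precedes it).
// CommentCol is the 1-based column at which the comment starts.
function ClassifyComment(const SourceLine: string;
  CommentCol: Integer): TCommentPlacement;
begin
  if Trim(Copy(SourceLine, 1, CommentCol - 1)) = '' then
    Result := cpLeading
  else
    Result := cpTrailing;
end;

procedure Show(const SourceLine: string);
var
  Col: Integer;
begin
  // Naive lookup of the comment start, good enough for this demo only;
  // a real lexer would supply the column.
  Col := Pos('//', SourceLine);
  WriteLn(PlacementNames[ClassifyComment(SourceLine, Col)], ': ', SourceLine);
end;

begin
  Show('// Converts an integer to a string');
  Show('  FName: string; // code libraries like');
end.
```

The first line is classified as leading and the second as trailing. A rule like this does not resolve cases such as the TMyObject example, where one sentence is spread over several trailing comments.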
"And will there be stand alone comments?"
I would answer no.
"What's about this code?"
I would say that this would look like this:
Note some duplications. You can avoid them if you apply some heuristics (like "if there is an empty line before"), but you can also leave it this way.
"… How would the method headers be represented in AST? Would they be stand alone comments? Would they be part of the function nodes in commentsbefore nodes? Or would the first header be a stand alone comment (as a child of implementation) while the second one appears in the commentsafter section of the first function node? And how would you figure out what to do?"
You didn't get my idea. I'm proposing a separate property of the node. This property will represent comments, but the comments themselves will not create a node. To give you an example: look at the METHOD node. You can see that the name doesn't create a node; it's an attribute of the METHOD node. The idea is to make comments similar to these attributes. The only reason I put CommentBefore and CommentAfter into XML elements is that there could be several comments at one node, and there is no way to put multiple values into a single XML attribute.
Arguably you can put all comments into a single XML attribute without separating them. For example:
// Converts an integer ...
// ... to a string
function IntToStr(Value: integer): string;
The METHOD node here has two comments. But for practical use we can consider them as one big multiline comment and put it into an XML attribute:
<METHOD begin_line="6" begin_col="1" end_line="8" end_col="1" kind="function" name="IntToStr" commentsbefore="// Converts an integer ...\n// ... to a string" commentsafter="">
(An XML attribute supports multiline values.) But in this case we lose the information about each comment's start and end, and that's bad. So in other words, my suggestion is really to keep comments out of the AST, but at the same time link them to the nodes (like attributes).
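One caveat on the single-attribute variant: an XML parser normalizes a literal line break inside an attribute value to a space, so to keep the line structure the break has to be written as the character reference &#10;. Here is a minimal sketch of joining several comments into one attribute value. JoinComments and EscapeXmlAttr are hypothetical helpers, and the attribute name commentsbefore follows the proposal above rather than any existing DelphiAST output.

```pascal
program CommentAttrSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Joins several comment texts into one string, one comment per line.
function JoinComments(const Comments: array of string): string;
var
  I: Integer;
begin
  Result := '';
  for I := Low(Comments) to High(Comments) do
  begin
    if I > Low(Comments) then
      Result := Result + #10;
    Result := Result + Comments[I];
  end;
end;

// Escapes a string for use inside a double-quoted XML attribute value.
// '&' is replaced first so that the references added afterwards are not
// escaped a second time. Line breaks become &#10; because a literal line
// break in an attribute value is normalized to a space by XML parsers.
function EscapeXmlAttr(const S: string): string;
begin
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '"', '&quot;', [rfReplaceAll]);
  Result := StringReplace(Result, #13#10, '&#10;', [rfReplaceAll]);
  Result := StringReplace(Result, #10, '&#10;', [rfReplaceAll]);
end;

begin
  WriteLn('commentsbefore="',
    EscapeXmlAttr(JoinComments(['// Converts an integer ...',
                                '// ... to a string'])),
    '"');
end.
```

This prints commentsbefore="// Converts an integer ...&#10;// ... to a string". The per-comment start and end positions are still lost in this form.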
"Linking comments to their context seems fuzzy to me. Sometimes comments appear before the commented code, sometimes after. And sometimes its really weird like this one:
type TMyObject = class // this class should only
private // be used by the basic
FName: string; // code libraries like
FSize: integer; // Lib1, Lib2 and LibOld
end;
Which of these comments is part of the typedeclaration node? Which is part of the private, field, name or type node? After answering these questions - Are we happy with the resulting AST? Would it be easy for you to get the comments out of the AST and do something with them? Or would it be easier to have a list of comments including their source position and maybe having references to the syntax nodes before and behind the comment?"
As I said, there will be no AST nodes for comments, only an additional property on existing nodes.
Alexander
(Email reply to Christopher Wosinski's message of Wednesday, September 2, 2015, 7:20 PM.)
I've added a TPasSyntaxTreeBuilder.Comments property. It stores all comments in a separate list. Please give it a try. See 25eb2ac8cb65a08b3719943c352c69481aa58bb6
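For readers wondering how a separate list can be used, here is a minimal sketch of looking up the header comment of a method by position. TCommentRec and HeaderCommentFor are hypothetical stand-ins and do not reflect the element type or API of TPasSyntaxTreeBuilder.Comments. The line numbers are illustrative, counted on the Basics unit earlier in the thread as if it had no blank lines.

```pascal
program CommentLookupSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical stand-in for an entry of a flat comment list;
  // this is NOT the DelphiAST comment type.
  TCommentRec = record
    Line: Integer;
    Text: string;
  end;

function MakeComment(ALine: Integer; const AText: string): TCommentRec;
begin
  Result.Line := ALine;
  Result.Text := AText;
end;

// Returns the block of comment lines directly above MethodLine, or ''
// if there is none. Assumption: Comments is sorted by line number.
function HeaderCommentFor(const Comments: array of TCommentRec;
  MethodLine: Integer): string;
var
  I, Expected: Integer;
begin
  Result := '';
  Expected := MethodLine - 1;
  // Walk backwards so contiguous comment lines are collected bottom-up.
  for I := High(Comments) downto Low(Comments) do
    if Comments[I].Line = Expected then
    begin
      if Result = '' then
        Result := Comments[I].Text
      else
        Result := Comments[I].Text + sLineBreak + Result;
      Dec(Expected);
    end;
end;

var
  Comments: array of TCommentRec;
begin
  SetLength(Comments, 4);
  Comments[0] := MakeComment(6, '// Converts an integer to a string');
  Comments[1] := MakeComment(9, '// ...');
  Comments[2] := MakeComment(11, '// Converts a string to an integer');
  Comments[3] := MakeComment(14, '//....');

  WriteLn('IntToStr (line 7): ', HeaderCommentFor(Comments, 7));
  WriteLn('StrToInt (line 12): ', HeaderCommentFor(Comments, 12));
end.
```

Keeping the list separate leaves the choice of association rule to the consumer: a documentation tool might use the rule above, while a formatter might only need the positions.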