JuliaParser.jl
Comments
I'm not even sure that this would make sense to do, but would it be possible to optionally keep comments in a parsed expression? I'm thinking a comment could be stored in a new type, for example a `CommentNode`, sort of like a `LineNumberNode`. Since an `Expr` with a `CommentNode` in its `:args` field would not really be a valid expression, I suppose it would also require an `ExprWithComments` type or something.
The use case I'm thinking about is an automatic code formatting tool so it would be necessary to keep comments along with the code.
@lendle this would make a lot of sense, and I have thought about implementing this. It wouldn't be very hard to do either: just add an extra flag to the parse state context object and accumulate the comments into a buffer instead of skipping over them.
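A minimal sketch of that idea, assuming a toy scanner rather than JuliaParser.jl's real lexer. `ScanState`, `keep_comments`, and `scan_comments` are hypothetical names, and the scanner only understands single-line `#` comments:

```julia
# Hypothetical sketch; none of these names are JuliaParser.jl's actual API.
struct ScanState
    keep_comments::Bool        # the "extra flag" on the parse state
    comments::Vector{String}   # buffer that collects comment text
end

ScanState(keep_comments::Bool) = ScanState(keep_comments, String[])

# Walk the input character by character. Every '#' starts a comment that
# runs to the end of the line. The comment is always consumed, but it is
# recorded only when the flag is set.
# (A real lexer would also have to handle strings, chars, and #= ... =# blocks.)
function scan_comments(src::AbstractString; keep_comments::Bool=false)
    state = ScanState(keep_comments)
    io = IOBuffer(src)
    while !eof(io)
        c = read(io, Char)
        if c == '#'
            text = string('#', readline(io))   # readline drops the newline
            state.keep_comments && push!(state.comments, text)
        end
    end
    return state.comments
end
```

With the flag off the comments are skipped as before; with it on, `scan_comments("x = 1 # one\n"; keep_comments=true)` collects the comment text into the buffer.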
For now we could add comment nodes even though they are not valid AST objects. If you wanted to eval the resulting AST you could just strip out the comment nodes before handing it off to Julia. I do the same thing to normalize the AST in the tests if you want to take a look (Julia exposes a bit of C-shim internals as part of its AST, and Flisp does not know about int128's or bigints).
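Stripping the comment nodes before eval could look roughly like this. `CommentNode` and `strip_comments` are hypothetical names used for illustration; neither exists in Base or in JuliaParser.jl:

```julia
# Hypothetical type; not part of Base or JuliaParser.jl.
struct CommentNode
    text::String
end

# Return a copy of the expression with every CommentNode removed, recursively.
# Anything that is not an Expr (symbols, literals, line nodes) passes through.
strip_comments(x) = x
function strip_comments(ex::Expr)
    args = Any[strip_comments(a) for a in ex.args if !(a isa CommentNode)]
    return Expr(ex.head, args...)
end

# An AST with comment nodes mixed in among the real statements.
ex = Expr(:block,
          CommentNode("# add two numbers"),
          :(x = 1 + 2),
          CommentNode("# done"))

# The cleaned expression contains only ordinary AST objects and can be
# handed to eval.
clean = strip_comments(ex)
```

A formatting tool would work on `ex` directly and keep the comments, while anything that needs to run the code would call `strip_comments` first.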
An automatic code formatting tool would be amazing.
@JeffBezanson What would be required to allow `CommentNode` in Julia expressions?
Well, an easy hack if you didn't want to support them in the Base parser would be to just ignore them in the `ast.c` shim.
+1. Something like this would help Lint produce more accurate message locations. https://github.com/tonyhffong/Lint.jl/issues/54
I recently looked at the C# Roslyn design for this kind of problem. The background here is that MS rewrote their C# compiler into a modular piece of code that could be used both for compiling and for the full C# IDE support (which is impressive, if you have ever tried VS).
The basic philosophy of the project was that they have a parser that creates a syntax tree (very much like expression trees in Julia), but that this syntax tree has to have ALL the info from the original text file so that you can round-trip a full fidelity text file again. So the syntax tree has every single whitespace, comment, everything represented.
The design they use is that they have additional syntax trivia data, but this data is attached to either the next or previous "real" syntax expression as a child, and not a new expression type that can show up at the same level as a real language construct. The benefit of this is that when you traverse your expression tree, you can VERY easily ignore the trivia syntax by just not looking at these fields of an expression object. I think that design overall is actually better than to have a new comment expression type (or the line expression type that exists already).
The design they chose for getting from a syntax element back to its location in the source code also seems pretty robust, i.e. each element has a span property that gives the precise character position and coverage of that element.
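To make the trivia and span idea concrete, here is a rough sketch in Julia. It is only loosely modelled on the description above and is not Roslyn's actual API; `Trivia`, `SyntaxNode`, `leaf`, `source_text`, and `count_nodes` are all hypothetical names, and `span` here covers the node's own text without its trivia:

```julia
# All names are hypothetical; this is an illustration, not an existing API.
struct Trivia
    kind::Symbol      # e.g. :comment or :whitespace
    text::String
end

struct SyntaxNode
    head::Symbol
    children::Vector{SyntaxNode}
    text::String                  # token text for leaves, "" for inner nodes
    leading::Vector{Trivia}       # trivia attached before this node
    trailing::Vector{Trivia}      # trivia attached after this node
    span::UnitRange{Int}          # character positions covered in the source
end

leaf(head, text, span; leading=Trivia[], trailing=Trivia[]) =
    SyntaxNode(head, SyntaxNode[], text, leading, trailing, span)

# Emit leading trivia, own text, children, then trailing trivia, in order.
function write_node(io::IO, n::SyntaxNode)
    for t in n.leading
        print(io, t.text)
    end
    print(io, n.text)
    for c in n.children
        write_node(io, c)
    end
    for t in n.trailing
        print(io, t.text)
    end
end

# Full-fidelity round trip back to source text.
function source_text(n::SyntaxNode)
    io = IOBuffer()
    write_node(io, n)
    return String(take!(io))
end

# A traversal that only cares about structure never touches the trivia fields.
count_nodes(n::SyntaxNode) = 1 + mapreduce(count_nodes, +, n.children; init=0)

# Hand-built tree for the source line below.
src = "x = 1 # one"
ws = Trivia(:whitespace, " ")
lhs = leaf(:identifier, "x", 1:1; trailing=[ws])
op = leaf(:operator, "=", 3:3; trailing=[ws])
rhs = leaf(:integer, "1", 5:5; trailing=[ws, Trivia(:comment, "# one")])
root = SyntaxNode(:assignment, [lhs, op, rhs], "", Trivia[], Trivia[], 1:5)
```

Here `source_text(root)` reproduces `src` exactly, `src[root.span]` gives the assignment without the trailing comment, and `count_nodes` walks the tree without ever looking at whitespace or comments.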
In any case, it might be worth taking a look at their design and to think whether some of these ideas would make sense for Julia. The high level doc describing their approach is here.
Maybe a good way to play around with this would be to add a full new type hierarchy that replicates something like the Roslyn stuff to this package here, and have a new `parse` function that produces such a tree. This could be used by tools that want to provide some IDE integration and one could play around with performance for now (and if it turns out to work well, it might even be migrated back as the core Julia expression system? But that question is clearly above my pay grade).
@davidanthoff thank you for your detailed comment, I'll take a look at the resources you posted.
I have written up my proposal in a more concrete way and moved it to its own issue #22.