CoffeeScriptRedux
Represent comments in the AST
I'm looking forward to creating a successor of Codo that uses CoffeeScriptRedux. I saw that I could get a JSON-serialised AST representation of the input with the --parse option, which would be the perfect base for further processing, but I noticed that comments aren't represented in the AST. For example, the CoffeeScript file
# Comment
class Test
is represented as
{ type: 'CS.Program',
  block:
   { type: 'CS.Class',
     nameAssignee: { type: 'CS.Identifier', data: 'Test' },
     parent: undefined,
     ctor: undefined,
     block: undefined,
     name: { type: 'CS.Identifier', data: 'Test' },
     boundMembers: [] } }
This also applies to the original CoffeeScript compiler. The current Codo implementation works around it by pre-processing the source code to convert comments into block comments. This works well in most situations, but it's still an ugly hack.
Is this by intention for some reason? It would be great to have comments in the AST to enable robust tooling support.
Thanks a lot for your work.
Representing comments in the AST is not a priority, but it is still a feature that would be nice to have. I may implement this at some point in the future, but it is much more likely to get in if someone else implements it and sends a pull request.
Perfect. I'll try to get this done and will open a pull request when it's finished or when I need help.
A strategy I would start with: change the TERM rule to allow an optional single-line comment before the terminator, then have that rule side-effect, adding the comment to a global list of comments. On every node creation, call some function like addComments that adds comment markers to the nodes if the global list contains any comments, then clears the list and returns the node.
edit: Also, I'm currently omitting comments in the preprocessed code. You'll just have to change that to emit them instead.
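A minimal sketch of that strategy in plain JavaScript, outside any real grammar. It assumes the TERM rule's action calls a helper whenever it matches a single-line comment; the names recordComment, pendingComments and addComments are hypothetical, not CoffeeScriptRedux API:

```javascript
// Hypothetical global list, filled as a side effect of the TERM rule.
const pendingComments = [];

// Assumed to be called from the TERM rule's action when a "# ..." comment
// precedes the terminator.
function recordComment(text) {
  pendingComments.push(text);
}

// Assumed to wrap every node creation: if comments are pending, attach them
// to this node and clear the list; the node itself is returned either way.
function addComments(node) {
  if (pendingComments.length > 0) {
    // splice() returns the removed items and empties the array in one step.
    node.comments = pendingComments.splice(0, pendingComments.length);
  }
  return node;
}

// Simulated parse of "# Comment\nclass Test": TERM records the comment,
// then the next node passed through addComments receives it.
recordComment(" Comment");
const cls = addComments({ type: "CS.Class", name: { type: "CS.Identifier", data: "Test" } });
console.log(cls.comments); // [ ' Comment' ]
```

Because the list is emptied as soon as it is consumed, each comment can be attached to at most one node.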
Thanks for the hints, that will help me get started.
Hello @michaelficarra, what about block comments like
###
A code comment
###
As far as I can see, with the current master they are completely omitted from both the generated JS output AND the AST. Is this the intended behavior? If not, I could try picking up this issue.
Thanks!
PS: here is exactly what I'm talking about:
[lexaux@archbang test]$ coffee --version
CoffeeScript version 2.0.0-beta5
[lexaux@archbang test]$ cat comments.coffee
###*
*A comment here?
###
[lexaux@archbang test]$ coffee --parse -i comments.coffee
{ type: 'Program', body: undefined }
[lexaux@archbang test]$ coffee --js -i comments.coffee
[lexaux@archbang test]$
Feel free.
Cool, I tried the approach from the https://github.com/michaelficarra/CoffeeScriptRedux/issues/33#issuecomment-8008806 comment, though I augmented the TERMINATOR rule rather than the TERM rule, as I need to support multiline comments.
The approach itself works, but it is troublesome. Here's what it does:
- In the TERMINATOR rule, collect the comment text and push it onto an array stored in the initializer.
- I added code to the id function so that it (a) gets called even if raw/position tracking is enabled, and (b) tries to pop from the comment storage and attach the comment to the node.
Well, that's it.
Trouble is, however, that the next node created after the comment is found != the correct node. As the parser descends deep into the grammar, the comment gets attached to the very first node that is returned (i.e. the first thing it matches). To explain a little:
The sample program
func: (a, b) ->
	console.log a+b
###
KOMMENT
###
func1: (c) ->
	###
	also comment
	###
	if c > 0 then console.log c
	console.log c
The AST produced with the approach above:
{ body:
   { statements:
      [ { members:
           [ { key: { data: 'func' },
               expression:
                { parameters: [ { data: 'a' }, { data: 'b' } ],
                  body:
                   { statements:
                      [ { function:
                           { expression: { data: 'console' },
                             memberName: 'log',
                             raw: NaN },
                          arguments:
                           [ { left: { data: 'a' },
                               right: { data: 'b' } } ] } ] } } },
             { key: { data: 'func1', comment: 'KOMMENT\n' },
               expression:
                { parameters: [ { data: 'c' } ],
                  body:
                   { statements:
                      [ { condition:
                           { left: { data: 'c', comment: '\talso comment\n\t' },
                             right: { data: 0 } },
                          consequent:
                           { function:
                              { expression: { data: 'console' },
                                memberName: 'log',
                                raw: NaN },
                             arguments: [ { data: 'c' } ] },
                          alternate: null },
                        { function:
                           { expression: { data: 'console' },
                             memberName: 'log',
                             raw: NaN },
                          arguments: [ { data: 'c' } ] } ] } } } ] } ] } }
The 'comment' field is added not to the if but to the first concrete thing the parser finds: the identifier c. I think the comment should normally be attached to the conditional itself.
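A small simulation of why that happens, under the assumption that, as in a PEG parser, a rule's action only runs after its sub-rules have matched, so child nodes are built before their parents. makeNode and the node type names here are illustrative only:

```javascript
// Pending comments, as collected by the TERMINATOR rule in the approach above.
const pending = ["also comment"];
const creationOrder = [];

// Hypothetical node factory that attaches a pending comment to whichever
// node happens to be created first.
function makeNode(type, props) {
  const node = Object.assign({ type }, props);
  creationOrder.push(type);
  if (pending.length > 0) node.comment = pending.shift();
  return node;
}

// Roughly how `if c > 0 then console.log c` gets built: the operands and
// sub-expressions are constructed before the conditional that contains them.
function parseIf() {
  const left = makeNode("Identifier", { data: "c" });
  const right = makeNode("Int", { data: 0 });
  const condition = makeNode("GTOp", { left, right });
  const consequent = makeNode("FunctionApplication", {
    function: makeNode("MemberAccessOp", {
      expression: makeNode("Identifier", { data: "console" }),
      memberName: "log",
    }),
    arguments: [makeNode("Identifier", { data: "c" })],
  });
  return makeNode("Conditional", { condition, consequent, alternate: null });
}

const ifNode = parseIf();
console.log(creationOrder[0], ifNode.condition.left.comment); // Identifier also comment
console.log(ifNode.comment); // undefined
```

The outermost node is always constructed last, so "the next node created" is the first leaf, which is exactly the identifier c in the AST above.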
I am now thinking of other approaches; maybe changing the grammar to support block comments as separate entities (not part of the separator) and then writing them down during the AST materialization (this way it may be suppressed later, but requires tree traversal at compile-time).
What do you think?
PS: Sorry if I'm missing something really obvious; I haven't written parsers since school.
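A rough sketch of the second idea, assuming the grammar emits standalone comment nodes inside statement lists. The hypothetical attachComments pass below walks the tree after parsing and moves each comment onto the statement that follows it; the Comment node type is made up for illustration:

```javascript
// Post-parse pass: move standalone Comment nodes onto the next statement.
function attachComments(node) {
  if (node === null || typeof node !== "object") return node;
  if (Array.isArray(node.statements)) {
    const kept = [];
    let pending = [];
    for (const stmt of node.statements) {
      if (stmt.type === "Comment") {
        pending.push(stmt.data);
      } else {
        if (pending.length > 0) {
          stmt.comments = pending;
          pending = [];
        }
        kept.push(stmt);
      }
    }
    // Comments with no following statement in this block stay as nodes.
    for (const data of pending) kept.push({ type: "Comment", data });
    node.statements = kept;
  }
  // Recurse into every child object and array (the AST is assumed acyclic).
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) value.forEach((child) => attachComments(child));
    else if (value && typeof value === "object") attachComments(value);
  }
  return node;
}

// Usage: the body of func1 with the block comment as its own statement.
const block = {
  type: "Block",
  statements: [
    { type: "Comment", data: "also comment" },
    { type: "Conditional", condition: { type: "Identifier", data: "c" } },
    { type: "FunctionApplication", arguments: [{ type: "Identifier", data: "c" }] },
  ],
};
attachComments(block);
console.log(block.statements[0].comments); // [ 'also comment' ]
```

Since the pass is separate from parsing, it could simply be skipped when comments aren't wanted, at the cost of one extra tree traversal.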
Trouble is, however, that the next node created after the comment is found != the correct node.
Oh, of course. I forgot about that. Sorry about the bad recommendation.
I am now thinking of other approaches; maybe changing the grammar to support block comments as separate entities
That's always been a possibility, but the unfortunate consequence is that we'd have an optional comment in front of EVERY rule. Still, it could be the cleanest approach if we can't find anything better.
Right now, I don't have any better ideas. I'll continue to think about it. Thanks for taking a look at this.
Thanks Michael, I'll think about that too.
Currently the TERMINATOR (which may contain comments) comes after the rule, not before it. If we changed this (prepending the TERMINATOR), we could return the comment text from the TERMINATOR match and apply it to the node being created (perhaps with another function wrapping the new node?).
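A sketch of how such a wrapping function might look, assuming a statement-level rule shaped roughly like TERMINATOR? statement, whose action receives the comments collected while matching the leading TERMINATOR. withLeadingComment and the { comments: [...] } shape of the terminator match are hypothetical:

```javascript
// Hypothetical wrapper called from the rule's action with the (optional)
// leading TERMINATOR match and the finished statement node.
function withLeadingComment(terminator, node) {
  // terminator is null when no TERMINATOR was matched; otherwise it is
  // assumed to carry the comment texts collected while matching it.
  if (terminator && terminator.comments.length > 0) {
    node.comments = terminator.comments.slice();
  }
  return node;
}

// Simulated action call for a block comment followed by an if statement.
const conditional = { type: "Conditional", condition: { type: "Identifier", data: "c" } };
const result = withLeadingComment({ comments: ["also comment"] }, conditional);
console.log(result === conditional, result.comments); // true [ 'also comment' ]
console.log(result.condition.comments); // undefined
```

Because the action only fires once the whole statement has been parsed, the node it receives is the outermost one (the if), not its first leaf.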