parser icon indicating copy to clipboard operation
parser copied to clipboard

Create <whitespace> and <newline>, (, ), etc nodes rather than throwing out tokens

Open bcoe opened this issue 4 years ago • 6 comments

So that there's no lost information in the parse tree, we should create <whitespace> and <newline> tokens for spacing characters that fall outside of <text> nodes.

bcoe avatar Dec 24 '20 18:12 bcoe

I realized after submitting that review that the pr was merged, so just to keep the context organized I figured I would re-post it here to see if we need to follow up:


I might be wrong here, but I think we need a <blankline> token which is <blankline> ::= <newline>, <newline>+ as the separators between the <summary>, <body>, & <footer>. If I read this correctly as it, you would have a valid summary/body as (only one LF):

fix: summary line
this is my body

Am I missing something here? The implementation looks right, but I think it is just miss-represented in the grammar spec.

https://github.com/conventional-commits/parser/pull/26#pullrequestreview-559300866

wesleytodd avatar Dec 28 '20 21:12 wesleytodd

Am I missing something here? The implementation looks right, but I think it is just miss-represented in the grammar spec.

@wesleytodd this was intentional on my part, I see no reason that:

fix: summary line
this is my body

shouldn't parse correctly, since it doesn't cause any trouble for the grammar.

bcoe avatar Dec 29 '20 18:12 bcoe

@byCedric @wesleytodd having been playing around a bit with a tree visitor, it jumps out at me that we'll probably also want to add (, ), and any other tokens I might be missing? to the tree.

This allows us to do something like visit(summary, () => true, (node) => { if (node.value) { text += node.value} }), to put a portion of the tree back together into raw text.

bcoe avatar Dec 29 '20 18:12 bcoe

this was intentional on my part

Not opposed to this, but it is a change (might be breaking even?) from the current spec. What does the current parser do in this case?

This allows us to do something like visit(summary, () => true, (node) => { if (node.value) { text += node.value} }), to put a portion of the tree back together into raw text.

This was my main concern with not having these. If you want to modify and stringify, these markers are very helpful to maintain exact formatting.

wesleytodd avatar Dec 29 '20 21:12 wesleytodd

@byCedric @wesleytodd having been playing around a bit with a tree visitor, it jumps out at me that we'll probably also want to add (, ), and any other tokens I might be missing? to the tree.

Yeah, it feels a bit odd that we remove those scopes. In this change I had to mitigate it by doing some stuff differently.

One thing we could add is something similar to nlcst's punctuation. Not sure if we have more than the scope parenthesis, but if we do we might want to consider that.

byCedric avatar Dec 29 '20 22:12 byCedric

Not opposed to this, but it is a change (might be breaking even?) from the current spec.

I think I've made a few breaking changes as we transcribe the grammar. I'm trying to channel all the things I've seen colleagues accidentally get wrong, and then reach out to me about.

Making the grammar as forgiving as possible, but then having a linter on top of it, I'd say is the ideal. We can have a linter indicate that it's recommended to have \n\n between the summary and body, while still parsing appropriately.

Yeah, it feels a bit odd that we remove those scopes. In this change I had to mitigate it by doing some stuff differently.

I'm happy with a Punctuation node, if it's only parenthesis we don't have in the tree right now, perhaps we could even just call it a Parenthesis node?

bcoe avatar Dec 30 '20 00:12 bcoe