kdl icon indicating copy to clipboard operation
kdl copied to clipboard

Spec is unclear about how slashdash comments work

Open CAD97 opened this issue 2 years ago • 10 comments

Consider

node/-"val""val"
node /-{} "val"
node /- /-{} {}

kdl-rs accepts this, but the formal grammar forbids all three nodes here.

EDIT: preferred resolution is now #285.

Original suggestion

My preferred resolution at the moment[^1] is to make the implementation and prose match the formal grammar here, with one small change: require whitespace before a slashdashed children block as well. (The lack of such causes implementation issues[^2].)

[^1]: And what I'm implementing for the time being in my annoyingly-zero-copy parse/visitor. Don't let me strongarm you, though; I've figured out a reasonable (though kind of annoying) way to implement the reference implementation's behavior, and that's the most interesting option to implement in my LL(2) recursive descent parse/visitor. In fact, most of the implementation annoyance is still there anyway in order to support a non-spaced non-slashed children block while requiring spacing for everything else.

[^2]: Namely: the grammar is no longer LL(n), as it requires unbounded token lookahead when seeing a slashdash (past arbitrarily many escaped lines) to determine if it's a slashdashed argument/property (error) or a slashdashed children block (allowed). This arbitrary lookahead isn't just an implementation concern, either; it means that the location of the produced error is potentially far from the cause of the error.

Suggested wording:

In Argument/Property, delete the references to /-. The Children Block's prose does not currently allow for it to be slashdashed, despite this being allowed in the formal grammar and reference implementation. In Node:

Nodes MAY be prefixed with /- to "comment out" the entire node, including its properties, arguments, and children, making it act as whitespace, even if it spreads across multiple lines. Additionally, the node's properties, arguments, and children block MAY themselves be prefixed with /- to "comment out" that entire component. If the children block is "commented out" in this fashion, it MUST be separated by whitespace or a slash-escaped line continuation. Even if a component is "commented out" in this fashion, it still acts as the commented out component syntactically. (For example, "commented out" arguments and properties must still be separated by plain whitespace, and if a "commented out" children is present, it must be after all arguments and properties, and there may not be another children block, whether "commented out" or not.)

In the formal grammar:

node := ('/-' node-space*)? type? identifier node-prop-or-arg* node-children? node-space* node-terminator
node-prop-or-arg := node-space+ ('/-' node-space*)? (prop | value)
node-children := (node-space+ '/-') node-space* '{' nodes '}'

CAD97 avatar Aug 21 '22 05:08 CAD97

(Actually, since I wrote this while working on handling this in my implementation, I'm not 100% sure what my implementation is going to end up doing, but just a little more work has suggested that aligning with kdl-rs's current behavior will actually be easier than my desired semantics. I still desire them, though; it'd just be significantly easier if space was required before children blocks as well.)

CAD97 avatar Aug 21 '22 06:08 CAD97

Hm. It does feel like that second one should be legal, though, doesn't it? I wonder if slashdashed children should be allowed to be interspersed in a node's arguments...

zkat avatar Aug 21 '22 15:08 zkat

that said, we should be careful/conservative here, because any changes to the spec (especially the grammar) are semver-breaking and would fall into the kdl 2.0 bucket.

zkat avatar Aug 21 '22 15:08 zkat

If we want to instead make the formal grammar match the prose & implementation, that would be

nodes := (line-space* node)* line-space*

line-space := newline | ws | single-line-comment | '/-' node-space* node
node-space := ws* escline ws* | ws+ | '/-' node-space* (node-prop-or-arg | node-children)

node := type? identifier (node-space+ node-prop-or-arg)* (node-space* node-children)? node-space* node-terminator
node-prop-or-arg := prop | value
node-children := '{' nodes '}'

I believe this should be just a fix to the formal grammar to match the intended prose semantics, and after a bit more impl work I don't mind that.

Relaxing the MUST for node prop-or-arg's separating whitespace to a SHOULD should be a minor addition, though; all "KDL 1.0" will still be accepted, it's just "KDL 1.1" is a small superset that's not fully portable (and only for KDL that shouldn't really be being written anyway). Formally, that just changes the node-space+ to node-space*.

CAD97 avatar Aug 21 '22 18:08 CAD97

Given that the formal grammar is authoritative here,

  • Does this mean that kdl-rs is incorrect, and should be fixed to match the formal grammar?
    • This is technically a breaking change to make kdl-rs stricter, though it's a bugfix to match the spec it's implementing.
  • Should I match kdl-rs or the formal grammar in my implementation for the time being?
    • With the actual "treat it as node-space" implementation, it becomes simpler to match kdl-rs than the formal spec. I think.

Also, new fun example:

node /- /-{} {}

CAD97 avatar Aug 21 '22 18:08 CAD97

kdl-rs is not a reference implementation. The grammar in the spec is the ultimate authority for implementations, along with the corresponding test suite that helps confirm compliance.

That said, tbh, this is kind of a corner case, and I think I'm gonna start getting everyone together to do a KDL 2.0 soon, so I would just implement to that so you're ahead of the game. KDL is low-usage enough that we can have a little wiggle room on the margins.

zkat avatar Aug 21 '22 21:08 zkat

Another interesting edge case: line continuations don't allow slashdash-escaped arguments/properties, even in the kdl-rs implementation. This is a point against them being "just plain whitespace."

CAD97 avatar Aug 24 '22 23:08 CAD97

It's been a while for me but I think that might be intentional though, the idea being that without the /- the syntax should still be valid (which wouldn't be the case of a slashdash-escaped argument occurred after a line continuation \).

Edit: Oh I see which part of the spec you mean now. I guess in terms of syntax & parsing they're not plain whitespace, but in the resulting document model they are?

larsgw avatar Aug 24 '22 23:08 larsgw

without the /- the syntax should still be valid

If this is still the intent, then #285 (which makes the spec match kdl-rs's implementation in treating /- as node-space (but not ws)) is the wrong resolution to this issue, and kdl-rs should be fixed to match the formal grammar.

CAD97 avatar Aug 25 '22 00:08 CAD97

I'd have to check, I might have missed or misremembered something.

larsgw avatar Aug 25 '22 00:08 larsgw

The fixes for this have been merged into the kdl-v2 branch

zkat avatar Dec 13 '23 05:12 zkat