kdl
kdl copied to clipboard
Spec is unclear about how slashdash comments work
Consider
node/-"val""val"
node /-{} "val"
node /- /-{} {}
kdl-rs accepts this, but the formal grammar forbids all three nodes here.
EDIT: preferred resolution is now #285.
Original suggestion
My preferred resolution at the moment[^1] is to make the implementation and prose match the formal grammar here, with one small change: require whitespace before a slashdashed children block as well. (The lack of such causes implementation issues[^2].)
[^1]: And what I'm implementing for the time being in my annoyingly-zero-copy parse/visitor. Don't let me strongarm you, though; I've figured out a reasonable (though kind of annoying) way to implement the reference implementation's behavior, and that's the most interesting option to implement in my LL(2) recursive descent parse/visitor. In fact, most of the implementation annoyance is still there anyway in order to support a non-spaced non-slashed children block while requiring spacing for everything else.
[^2]: Namely: the grammar is no longer LL(n), as it requires unbounded token lookahead when seeing a slashdash (past arbitrarily many escaped lines) to determine if it's a slashdashed argument/property (error) or a slashdashed children block (allowed). This arbitrary lookahead isn't just an implementation concern, either; it means that the location of the produced error is potentially far from the cause of the error.
Suggested wording:
In Argument/Property, delete the references to /-
. The Children Block's prose does not currently allow for it to be slashdashed, despite this being allowed in the formal grammar and reference implementation. In Node:
Nodes MAY be prefixed with
/-
to "comment out" the entire node, including its properties, arguments, and children, making it act as whitespace, even if it spreads across multiple lines. Additionally, the node's properties, arguments, and children block MAY themselves be prefixed with/-
to "comment out" that entire component. If the children block is "commented out" in this fashion, it MUST be separated by whitespace or a slash-escaped line continuation. Even if a component is "commented out" in this fashion, it still acts as the commented out component syntactically. (For example, "commented out" arguments and properties must still be separated by plain whitespace, and if a "commented out" children is present, it must be after all arguments and properties, and there may not be another children block, whether "commented out" or not.)
In the formal grammar:
node := ('/-' node-space*)? type? identifier node-prop-or-arg* node-children? node-space* node-terminator node-prop-or-arg := node-space+ ('/-' node-space*)? (prop | value) node-children := (node-space+ '/-') node-space* '{' nodes '}'
(Actually, since I wrote this while working on handling this in my implementation, I'm not 100% sure what my implementation is going to end up doing, but just a little more work has suggested that aligning with kdl-rs's current behavior will actually be easier than my desired semantics. I still desire them, though; it'd just be significantly easier if space was required before children blocks as well.)
Hm. It does feel like that second one should be legal, though, doesn't it? I wonder if slashdashed children should be allowed to be interspersed in a node's arguments...
that said, we should be careful/conservative here, because any changes to the spec (especially the grammar) are semver-breaking and would fall into the kdl 2.0 bucket.
If we want to instead make the formal grammar match the prose & implementation, that would be
nodes := (line-space* node)* line-space* line-space := newline | ws | single-line-comment | '/-' node-space* node node-space := ws* escline ws* | ws+ | '/-' node-space* (node-prop-or-arg | node-children) node := type? identifier (node-space+ node-prop-or-arg)* (node-space* node-children)? node-space* node-terminator node-prop-or-arg := prop | value node-children := '{' nodes '}'
I believe this should be just a fix to the formal grammar to match the intended prose semantics, and after a bit more impl work I don't mind that.
Relaxing the MUST for node prop-or-arg's separating whitespace to a SHOULD should be a minor addition, though; all "KDL 1.0" will still be accepted, it's just "KDL 1.1" is a small superset that's not fully portable (and only for KDL that shouldn't really be being written anyway). Formally, that just changes the node-space+
to node-space*
.
Given that the formal grammar is authoritative here,
- Does this mean that kdl-rs is incorrect, and should be fixed to match the formal grammar?
- This is technically a breaking change to make kdl-rs stricter, though it's a bugfix to match the spec it's implementing.
- Should I match kdl-rs or the formal grammar in my implementation for the time being?
- With the actual "treat it as node-space" implementation, it becomes simpler to match kdl-rs than the formal spec. I think.
Also, new fun example:
node /- /-{} {}
kdl-rs is not a reference implementation. The grammar in the spec is the ultimate authority for implementations, along with the corresponding test suite that helps confirm compliance.
That said, tbh, this is kind of a corner case, and I think I'm gonna start getting everyone together to do a KDL 2.0 soon, so I would just implement to that so you're ahead of the game. KDL is low-usage enough that we can have a little wiggle room on the margins.
Another interesting edge case: line continuations don't allow slashdash-escaped arguments/properties, even in the kdl-rs implementation. This is a point against them being "just plain whitespace."
It's been a while for me but I think that might be intentional though, the idea being that without the /-
the syntax should still be valid (which wouldn't be the case of a slashdash-escaped argument occurred after a line continuation \
).
Edit: Oh I see which part of the spec you mean now. I guess in terms of syntax & parsing they're not plain whitespace, but in the resulting document model they are?
without the
/-
the syntax should still be valid
If this is still the intent, then #285 (which makes the spec match kdl-rs's implementation in treating /-
as node-space (but not ws)) is the wrong resolution to this issue, and kdl-rs should be fixed to match the formal grammar.
I'd have to check, I might have missed or misremembered something.
The fixes for this have been merged into the kdl-v2 branch