Merge KDL v2
Here it is! The long-awaited KDL v2, which is where we go ahead and make a handful of technically-breaking changes to address some corner cases we've run into over the past year while KDL has been getting implemented in a bunch of languages by various people.
I'd love to get feedback on what we have slated, and whether there's anything else we should definitely include when this goes out.
/cc @CAD97
I have a slight preference for #241 over #204 personally, though only slight.
I have a preference for #204, because the primary use case I can see for # in bare identifiers is hashtag-like which would be illegal under either, and it seems better to go with the simpler rule.
That preference is not terribly strong, though.
Edit: I misread, I'm fine with either
the primary use case I can see for # in bare identifiers is hashtag-like
To clarify, #241 allows #ident as a bare ident, and both will of course still allow "r#ident" as a quoted ident.
Argument for allowing: transliterating CSS selectors, for e.g. CSS-in-KDL. Argument against allowing: using the syntax in KQL as a selector like CSS.
#foo in CSS is special-casing the id attribute. KQL doesn't have an equivalent to HTML's id, and using #foo syntax in KQL to mean something else might be confusing given its meaning in CSS, so I don't find the argument against compelling.
My inclination is to prefer #241 as well, as I think being able to write hashtags is neat. It also allows for doing things like writing Nix flake references as bare words, e.g. nixpkgs#hello.
Can we squeeze https://github.com/kdl-org/kdl/issues/213 into this? The specific proposal is the addition of escaped whitespace in string literals– that \, followed by literal (non-escaped) whitespace, should consume and discard all that whitespace. This is a slight simplification of the Rust rule, which specifically requires that \ be followed by \n.
I'm also a fan of #213, though it seems like there's some ambiguity in the discussion. Namely, does
- "x\
y\
z"
Translate to "xyz", "x y z", or "x\ny\nz"?
@Lucretiel do you have time to put together a PR with this grammar+prose change? I'm game.
Yes, tonight I can put that together :) should it be in the form of an amendment to SPEC.md?
I agree there's some ambiguity in the original. That example would translate to "xyz", because all literal whitespace after the \ is consumed and discarded. If you want to retain whitespace, it should either come before the \ or itself be escaped. I think my comment (https://github.com/kdl-org/kdl/issues/213#issuecomment-929869117) succinctly describes this.
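To make the rule concrete, here is a minimal Rust sketch of the behaviour described above: a `\` followed by literal whitespace discards the whole whitespace run. The function name `unescape` is hypothetical, the use of `char::is_whitespace` is an assumption (KDL's own whitespace definition may differ), and only a few other escapes are handled.

```rust
/// Resolve escapes in the raw body of a string literal.
/// Hypothetical helper for illustration; not taken from any KDL implementation.
fn unescape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            // Backslash followed by literal whitespace: discard the whole run.
            // Assumption: Unicode whitespace as defined by `char::is_whitespace`.
            Some(w) if w.is_whitespace() => {
                while chars.peek().map_or(false, |c| c.is_whitespace()) {
                    chars.next();
                }
            }
            Some('n') => {
                chars.next();
                out.push('\n');
            }
            Some('t') => {
                chars.next();
                out.push('\t');
            }
            Some('\\') => {
                chars.next();
                out.push('\\');
            }
            Some('"') => {
                chars.next();
                out.push('"');
            }
            // Any other escape is left untouched in this sketch.
            _ => out.push('\\'),
        }
    }
    out
}
```

Under this sketch, whitespace placed before the `\` is kept, and an escaped `\n` written before the `\` survives, because only the literal whitespace after the backslash is dropped.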
Yes, tonight I can put that together :) should it be in the form of an amendment to SPEC.md?
yep!
Is this what Rust does? I would've expected that to at least preserve the first newline. Then again, this is consistent with KDL's existing escline rule where \<newline> is the same as <non-newline whitespace>
Is this what Rust does?
[src/main.rs:2] dbg!("\
here\
is\
an\
example\
") = "hereisanexample"
It's worth noting that bash behaves similarly as far as just dropping the newline, though it doesn't consume space afterward:
❯ echo foo\
… ❯ bar\
… ❯ baz
foobarbaz
With that I think xyz is the right output, and am +1 on including it in v2
Edit: Scratch that, I'm a space cadet:
❯ echo foo\
bar\
baz
foo bar baz
I'm more prone to emulating bash over rust, but I'm curious how others feel
Bash's behavior is concerned with syntactic whitespace (ie, allowing commands to spread over multiple lines with line continuations). It doesn't meaningfully behave in terms of consuming or not consuming specific whitespace so much as it extends a line to the next line while retaining the separation of tokens for a command. In your echo example, all that's happened is that the foo and bar and baz have correctly been passed as different arguments to echo; it's no different than:
> echo foo bar \
baz
foo bar baz
Kaydle has basically the same behavior with its own line continuation syntax, where you can use a \ to continue a single node into the next line. All these nodes are the same:
node 1 2 3
node 1 2 3
node 1\
2\
3
#213 is instead concerned with treatment of escaped whitespace in strings, where I think the plain consumption of unescaped whitespace makes the most sense
Is this what Rust does? I would've expected that to at least preserve the first newline. Then again, this is consistent with KDL's existing escline rule where \<newline> is the same as <non-newline whitespace>
Rust does just consume all whitespace, regardless of type. The canonical way to add newlines to a whitespace-escaped string is to escape them:
assert_eq!(
"line 1\n\
line 2\n\
line 3\n",
"line 1
line 2
line 3
"
);
Though more commonly I use it to stretch out long sentences with simple spaces:
assert_eq!(
"This is a sentence with a \
lot of words in it.",
"This is a sentence with a lot of words in it."
);
That makes sense, and the distinction is certainly important. Thanks for the complete writeup.
Adding escaped whitespace note to the changelog: https://github.com/kdl-org/kdl/pull/291
Nudging the thread because I've added https://github.com/kdl-org/kdl/issues/250 to the bucket of things we should probably discuss for 2.0
Is https://github.com/kdl-org/kdl/discussions/177 worth including in discussions here?
yeah, probably. Although I'm inclined towards having foo and ("")foo be distinct values. I'm kinda iffy on the special case here.
I'm very split. On the one hand it is a special case and I'm very averse, but as @Patitotective noted, this would force implementations to distinguish between the two, which would lead to a more complex API in some languages (JS is top of mind). In addition, I just don't see a use case for blank type annotations given that impls are free to define their own.
I'm not really trying to go either direction here, just laying out thoughts.
Why would it be hard in JS? Can't JS just use null versus ""?
I haven't really been following this and I have no experience with type annotations in KDL but my initial reaction here is that specifying "" is potentially semantically distinct from not specifying a type annotation.
I'm curious what languages are actually expected to have a problem here? Languages without proper optional support tend to have some concept of null. Even in Go I'd expect that you could use a pointer-to-string in order to have nil.
That said, I've been using languages with proper optional support for long enough that I'm not sure how much of an ergonomic problem it would be to require folks to handle null type annotations in languages like Go or JS.
JS gives you null, but empty strings are falsey. It's for sure a small thing, it just means instead of writing:
if (val.annotation) {
...
}
You have to write:
if (val.annotation !== null) {
...
}
I would count that as a more complex API.
For most statically typed languages (like Nim) you have to use Option types which distinguish between no value and empty value. This would make APIs more complex and seems pointless to me, ("")node should be the same as node.
I'm pro making them distinct values. Packages that work with the CST to provide an API for modifying KDL text while retaining comments and formatting would be more complex and likely inconsistent without the distinction.
If an empty and an absent annotation are considered the same, these packages would need to track that. Even if they then map empty and absent annotations onto the same public value, it would lead to confusing behaviour in the locations of the different CST nodes:
node /* comment */ ()null
// | < end of the leading whitespace for the `null` value
// | < start of the `null` value
// __ < missing locations
Making the empty annotation () part of the leading whitespace would be wrong cf. the language specification.
Imo the best solution for these packages is to expose the difference between an empty and an absent annotation: consistent CST node locations and no () as part of whitespace.
One minor point on @bgotink's example, which I think is a good one: ()val does not agree with the spec by my reading; it has to be ("")val
I agree with @larsgw here in that if this is considered a problem to solve, the better solution would be to forbid zero-length identifiers rather than do some magic to make ("") equivalent to no type annotation. Given that type annotations are not given any meaning[^1] by the specification, it's fine imho for an implementation to treat a present-but-zero-length type annotation equivalently to no type annotation.
[^1]: > KDL does not specify any restrictions on what implementations might do with these annotations. They are free to ignore them, or use them to make decisions about how to interpret a value.
So my vote here is no change.
I agree on considering this a problem and forbidding zero-length identifiers.
This change would make currently valid tests such as blank_node_type.kdl, blank_arg_type.kdl, blank_prop_type.kdl, empty_quoted_prop_key.kdl and empty_quoted_node_id.kdl invalid, and it would need to be stated in the spec.
If this is okay, I'll create a PR.
By the way, is there a reason why some tests use empty and others blank?
I'm pro-distinct too. I think it's fine if a particular consumer of KDL wants them to be identical, or even if a particular implementation of type hints treats them as identical, but I think that it makes sense that a KDL data model treats them as distinct (essentially, as Option<String>). I'd be opposed to requiring implementations to treat them as identical.
One example would be a particular KDL implementation that uses annotations exclusively as strong type hints. I'd want 123 to be dynamically typed, (f64)123 to be a float, and ("")123 to be an error.
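A rough Rust sketch of that example, assuming the annotation is modelled as `Option<&str>`. The `Typed` enum and `interpret` function are hypothetical names for illustration, and "dynamically typed" is narrowed to integer parsing to keep the sketch short.

```rust
/// Result of interpreting a raw numeric token (hypothetical type).
#[derive(Debug, PartialEq)]
enum Typed {
    /// No annotation: the implementation picks a type itself.
    /// Assumption: this sketch only tries an integer.
    Dynamic(i64),
    /// Explicit (f64) annotation.
    Float(f64),
}

/// Interpret `raw` according to an optional type annotation.
/// `None` means no annotation was written; `Some("")` means ("") was written.
fn interpret(annotation: Option<&str>, raw: &str) -> Result<Typed, String> {
    match annotation {
        None => raw
            .parse::<i64>()
            .map(Typed::Dynamic)
            .map_err(|e| e.to_string()),
        Some("f64") => raw
            .parse::<f64>()
            .map(Typed::Float)
            .map_err(|e| e.to_string()),
        // Present but empty: rejected by this strict consumer.
        Some("") => Err("empty type annotation".to_string()),
        Some(other) => Err(format!("unknown type annotation: {other}")),
    }
}
```

The point of the sketch is only that `None` and `Some("")` reach different match arms, which is not possible if the data model collapses them into one value.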
I'd be opposed to banning 0-length identifiers; I think that adds more complexity / confusion than it's worth. Currently, an identifier is either "bare identifier" or "quoted string", and I'm not a fan of making it instead a special non-empty subset of "quoted string". I really like how the string is the "escape hatch" into unusual identifiers and don't really see the value in constraining it (especially since a vast majority of languages don't have an ergonomic way to express "string that's definitely not zero-length").