jackson-dataformats-text
In YAML, a ~ key should allow assigning a value to the parent key
In the YAML spec, if one property is a prefix of another, a value can be assigned to the parent property by using ~ as a null key.
The following test is failing:
public void testTildeIsDeserializedAsNullKey() throws Exception
{
    // "~" is intended as a null key that assigns a value to "sentry" itself
    final String YAML =
            "quarkus:\n" +
            "  log:\n" +
            "    sentry:\n" +
            "      ~: true\n" +
            "      dsn: 'some string'";
    JsonNode node = MAPPER.readValue(YAML, JsonNode.class);
    assertTrue(node.get("quarkus").get("log").get("sentry").booleanValue());
}
Instead, the following statement is true:
assertTrue(node.get("quarkus").get("log").get("sentry").get("~").booleanValue());
That is, quarkus.log.sentry."~" is true rather than quarkus.log.sentry itself.
Relevant conversation:
https://quarkusio.zulipchat.com/#narrow/stream/187038-dev/topic/yaml.20property.20conflict
Relevant spec:
https://quarkus.io/guides/config-yaml#configuration-property-conflicts
https://yaml.org/spec/1.2.2/
To clarify, this is causing issues with the Spotless formatter: it basically transforms any ~ key into the "~" string.
Quick question: how would you represent such a document without the ~ marker? Wouldn't there be a problem with the dsn key conflicting -- that is, there being both:
- quarkus.log.sentry = true
- quarkus.log.sentry.dsn = "some string"
which is impossible structurally?
Put another way: what would an equivalent, non-tilde-using document look like?
I think you cannot; that's the whole point of the ~ key: it allows you to assign a value to a key that also has children.
So could you not access quarkus.log.sentry.dsn in such case? Or is that simply dropped in such a case?
If there is no equivalent document, I don't think Jackson data model can represent this. And if not, this is something that cannot really be supported -- it might actually be best to just throw an exception if so.
But I guess I am also not clear on point of using such marker. I think I am missing something obvious.
EDIT: note, too, that link was to YAML 1.2 spec: Jackson 2.x uses SnakeYAML which supports YAML 1.1 only (https://bitbucket.org/snakeyaml/snakeyaml). I don't know if tilde handling is part of 1.1 already.
Ohhh. Maybe it's effectively an "empty" key? So something like JSON would have:
{ "quarkus": {
    "log": {
      "sentry": {
        "": true,
        "dsn": "some string"
      }
    }
  }
}
... although that would not be hidden by Jackson either; it would need to be accessed with an empty String as key.
EDIT: close, it is actually null key (similar to how "~" as value is same as null).
This is not something Jackson can represent since its data model does require all JsonToken.FIELD_NAME tokens to have String value. We could conceivably map it to empty String, becoming property with "" key. Not sure that'd be any better than leaving ~ as-is.
Based on all of the above, I am pretty sure we cannot, in general, make this work:
assertTrue(node.get("quarkus").get("log").get("sentry").booleanValue());
since "sentry" must be an Object value (with 2 logical properties; one with String key, other with null). I guess I don't know how YAML libraries generally would expose it either.
So the question would be whether to map ~ key to something else; and if so, to what?
The point of using this marker is to allow the use of yaml in configuration systems that have not initially been thought of with yaml in mind
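To make the intent concrete, here is a minimal sketch of how a configuration system might flatten a loaded YAML tree into dotted property names when a null key is present. It uses plain `java.util` maps rather than any Quarkus, SnakeYAML, or Jackson API; the class name `TildeFlattenSketch` and its methods are hypothetical, and the sample tree is only an assumption about what a YAML 1.1 loader would hand back for the document in this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TildeFlattenSketch {

    /** Flattens a nested map into dotted property names; a null key names the parent itself. */
    static Map<String, Object> flatten(Map<String, Object> tree) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", tree, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = e.getKey();
            // A null key (written "~" in the YAML) adds no path segment,
            // so its value is attributed to the parent property.
            String path = (key == null) ? prefix
                    : prefix.isEmpty() ? key : prefix + "." + key;
            if (e.getValue() instanceof Map) {
                flattenInto(path, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(path, e.getValue());
            }
        }
    }

    /** Assumed in-memory form of the YAML document from this issue. */
    static Map<String, Object> sampleTree() {
        Map<String, Object> sentry = new LinkedHashMap<>();
        sentry.put(null, true);            // corresponds to "~: true"
        sentry.put("dsn", "some string");
        Map<String, Object> log = new LinkedHashMap<>();
        log.put("sentry", sentry);
        Map<String, Object> quarkus = new LinkedHashMap<>();
        quarkus.put("log", log);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("quarkus", quarkus);
        return root;
    }
}
```

In this sketch, `flatten(sampleTree())` yields both `quarkus.log.sentry` (true) and `quarkus.log.sentry.dsn` ("some string"), which is the pair of properties a flat properties file could express directly.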
Yeah, the problem is that it does not really map to the JsonNode model (nor can I see an easy way to model it). In your case, either the "null" key would need to replace the value of sentry (so "sentry.dsn" would be lost), or we'd need a surrogate for the "null" key.
For latter, empty String seems close enough -- but I am not sure TBH whether it'd be any more convenient to access than "~" key.
Put another way: what is being specifically asked is not doable. But I am open to ideas of alternative improvements.
@cowtowncoder are you effectively saying that Jackson doesn't support a true null key? That's what ~ is: a special token to represent null/Null/NULL.
Correct: Jackson's streaming/token model is based on JSON and there's no null key, only null values. And JsonNode, similarly only String keys. YAML and other formats need to be expressed in these models.
Theoretically I suppose it would be possible to expose null key but I don't think it'd work through all processing.
And even if it was done, would require access by null as key like:
node.get("quarkus").get("log").get("sentry").get(null).booleanValue()
Yeah; but that (using null) for lookup would be expected. But yeah I can see how that could be problematic in places.
I think I am still inclined to prefer empty String as surrogate (and possibly make this a YAMLParser.Feature controlling whether to replace the "~" key or not -- since there may well be existing code that relies on current handling).
I was first thinking of allowing specific "replacement key", that'd also allow null, but I think that there are too many places downstream that wouldn't stomach null that it's probably not worth it (more work to have non-boolean switches, but more importantly, trying to make databind pieces accept null for JsonToken.FIELD_NAME and so on).
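The trade-off between a real null key and a surrogate can be illustrated with a loose analogy in plain `java.util` collections. This is not Jackson's token stream or `JsonNode`; the class `NullKeySurrogateSketch`, its methods, and the use of a sorted-map copy as a stand-in for "downstream processing" are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class NullKeySurrogateSketch {

    /** Returns a copy in which a null key is replaced by the surrogate (for example ""). */
    static Map<String, Object> withSurrogate(Map<String, Object> in, String surrogate) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : in.entrySet()) {
            // Note: an existing entry under the surrogate key would be overwritten.
            out.put(e.getKey() == null ? surrogate : e.getKey(), e.getValue());
        }
        return out;
    }

    /** True if the map can be copied into a naturally ordered TreeMap, which rejects null keys. */
    static boolean survivesSorting(Map<String, Object> in) {
        try {
            new TreeMap<>(in);
            return true;
        } catch (NullPointerException npe) {
            return false;
        }
    }

    /** The "sentry" object from this issue, modelled with a real null key. */
    static Map<String, Object> sentryWithNullKey() {
        Map<String, Object> sentry = new LinkedHashMap<>();
        sentry.put(null, true);
        sentry.put("dsn", "some string");
        return sentry;
    }
}
```

Here the null-keyed map works for direct lookup via `get(null)` but fails as soon as it is handed to a component that requires non-null keys, while the surrogate version passes through at the cost of callers having to know the surrogate.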
@maxandersen the ~ character represents null in YAML 1.1; in YAML 1.2 they made YAML a superset of JSON (which does not support this char as null). The types have become optional (they depend on the schema).
It is possible to configure SnakeYAML Engine to respect ~ as you wish, I am not sure Jackson should be able to do it.
I grok the concern - it's just that with this it is not possible in yaml to assign values to the "parent". That's why Quarkus (and other) yaml configs have the notion of ~ (not "~") - that works fine, because as you state, in yaml 1.2 the schemas/usecase can choose to interpret it as null or not. We do that in Quarkus using snakeyaml - all is fine.
The problem comes when tools like spotless use jackson to parse yaml, thinking that jackson is fully yaml compliant, but the formatting now "mutates" the content.
Hence it would IMO be nice that jackson at least had some way to retain the info.
@maxandersen I am afraid, there is some confusion here:
- As far as I can see Quarkus is giving null a very special meaning
- I think Jackson 3.0 is fully YAML 1.2 compliant (because it is using SnakeYAML Engine)
- Quarkus does not use YAML 1.2 - I wonder why your examples refer to YAML 1.2
- Quarkus is using SnakeYAML (which explicitly supports YAML 1.1, and it does not define schemas)
- I can help to migrate Quarkus to SnakeYAML Engine (and YAML 1.2)
It looks like the issue was reported for Jackson 2.x (which is also using SnakeYAML and it is YAML 1.1 compliant)
@asomov i'm not (afaics) referring to YAML 1.2 - but even with that afaics Yaml 1.2 spec still specifies "null | Null | NULL | ~" as equivalent ... and https://yaml.org/type/null.html seem to confirm it to me.
hence i'm not sure what you mean that Quarkus would need to define a schema to get this behavior?
- Quarkus does not need to do anything if it is happy with what it has (YAML 1.1)
- Quarkus would need to define a schema only if it wants to use YAML 1.2 and apply SnakeYAML Engine
- YAML 1.2 defines ~ as null as a part of the Core schema (as far as I remember Jackson 3.0 uses the JSON schema at the moment)
- Jackson serves a very general purpose, and probably the JSON schema better matches its use cases for the community.
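A small sketch of the schema difference described in this list, based on the null resolution rules in the YAML 1.2 spec: the Core schema resolves the plain scalars `null`, `Null`, `NULL`, `~` and the empty scalar to null, while the JSON schema resolves only `null`. The class `NullResolutionSketch` and its enum are hypothetical and do not correspond to any SnakeYAML Engine or Jackson API.

```java
final class NullResolutionSketch {

    /** The two YAML 1.2 recommended schemas discussed in this thread. */
    enum Schema { JSON, CORE }

    /** True if a plain (unquoted) scalar would resolve to null under the given schema. */
    static boolean resolvesToNull(String plainScalar, Schema schema) {
        switch (schema) {
            case JSON:
                // JSON schema: only the literal "null" is a null.
                return plainScalar.equals("null");
            case CORE:
                // Core schema: null | Null | NULL | ~ | (empty)
                return plainScalar.isEmpty()
                        || plainScalar.equals("~")
                        || plainScalar.equals("null")
                        || plainScalar.equals("Null")
                        || plainScalar.equals("NULL");
            default:
                throw new IllegalArgumentException("unknown schema: " + schema);
        }
    }
}
```

Under this sketch, `resolvesToNull("~", Schema.CORE)` is true and `resolvesToNull("~", Schema.JSON)` is false, which is the gap the two sides of this discussion keep running into.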
...the issue is that spotless formatter uses jackson where ~ are not retained but converted into "~" ...not sure how that can be fixed without jackson being able to handle null ..?
@maxandersen if the expectation to support ~ is for Jackson 3 (and YAML 1.2) then I do not support that expectation.
I hope Quarkus may find a better solution for null representation (or to support the internal functional requirement).
Using the Core YAML Schema would lead to even more questions for a general purpose parser.
Jackson may consider to implement a way to explicitly configure the Core Schema, but I think it will not help you - your users will not configure Jackson.
@asomov I feel we are misunderstanding each other here :)
Quarkus is working just fine - this yaml is valid yaml in any linter/parser/spec:
log:
  sentry:
    ~: true
    dsn: "some string"
The only reason I'm speaking about it here is that spotless uses jackson to parse the yaml and format/validate with it and here jackson mutates the output.
I don't follow what Quarkus can do differently here besides maybe also go "around" the jackson and snakeyaml-engine behavior and be ok with "" empty string to mean null...?
@maxandersen yes, it is valid, but it may result in a different object. For YAML 1.1 (which is inside Quarkus), ~ is parsed to null. For YAML 1.2 it may be parsed differently depending on the context. For the absolute majority of users ~ is just a char (that is why YAML 1.2 became a superset of JSON).
Please note that Jackson does not mutate anything! It properly parses the input to create expected output.
@maxandersen Quarkus may drop the usage of this specific feature of YAML 1.1 and the issue disappears.
I guess from my end, while I think I understand the issue, I am not sure how to address this at Jackson level.
The original request is something I do not think can be supported, unfortunately (seamless access via JsonNode), due to JsonNode being incapable of expressing such structures (like JSON format itself).
But something else perhaps could be supported -- if this is, f.ex, about just avoiding quoting ~ String (forcing "raw" output) to support round-tripping: not sure this is possible using SnakeYAML (Jackson 2.x) or snakeyaml-engine (3.x) but might be.
So I am open to figuring out something that can be done.
@asomov again - Quarkus is just fine here - it's the reformatting that happens when spotless uses jackson to rewrite the content that is problematic IMO. Which is what @cowtowncoder is also hinting at.
I don't know enough about snakeyaml engine to know yet if that is possible.
It is unfortunate yaml 1.2 doesn't allow it, as it makes yaml unnecessarily verbose compared to a properties file :)
@maxandersen I am completely confused when you say "rewriting the content". It means that something is already written, and Jackson replaces it with something else. I see how Jackson can parse, but I do not get how it may "replace"
@asomov I think he means round-tripping (read-write), in which something reads YAML with Jackson into, say, JsonNode, possibly modifies some parts, then writes it out as YAML. If so, "~" as key will be represented differently (as a quoted String) as a result.
At least that is my reading.
If so, option I am thinking of (a new YAMLWriteFeature) is something like YAMLWriteFeature.WRITE_TILDE_KEY_UNQUOTED (to write '~' key unquoted).
Bigger change, if I had time to really go through the whole process, would be thinking of a way to support actual "null keys" for JsonNode. But that'd be much farther out at this point (if ever).
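As a rough illustration of what such a write-side switch would change, here is a toy block-style writer over plain maps. It is not Jackson's YAMLGenerator: the class `TildeKeyWriterSketch` is hypothetical, the boolean parameter merely mimics the proposed WRITE_TILDE_KEY_UNQUOTED idea (which is only a suggestion in this thread), and values are written naively without any quoting logic.

```java
import java.util.Map;

final class TildeKeyWriterSketch {

    /** Writes a nested map as block-style YAML; the flag controls quoting of a "~" key. */
    static String write(Map<String, Object> tree, boolean writeTildeKeyUnquoted) {
        StringBuilder sb = new StringBuilder();
        writeInto(tree, 0, writeTildeKeyUnquoted, sb);
        return sb.toString();
    }

    @SuppressWarnings("unchecked")
    private static void writeInto(Map<String, Object> node, int depth,
                                  boolean rawTilde, StringBuilder sb) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            for (int i = 0; i < depth; i++) {
                sb.append("  ");             // two spaces per nesting level
            }
            sb.append(formatKey(e.getKey(), rawTilde)).append(':');
            if (e.getValue() instanceof Map) {
                sb.append('\n');
                writeInto((Map<String, Object>) e.getValue(), depth + 1, rawTilde, sb);
            } else {
                sb.append(' ').append(e.getValue()).append('\n');
            }
        }
    }

    /** Quotes the "~" key unless the (hypothetical) unquoted-tilde switch is on. */
    static String formatKey(String key, boolean rawTilde) {
        if (key.equals("~")) {
            return rawTilde ? "~" : "\"~\"";
        }
        return key;
    }
}
```

With the flag off, a `sentry` map holding a "~" key is written with `"~": true`, matching the round-trip result described above; with the flag on, the same key comes out as a bare `~`, so a read-then-write pass would leave that line unchanged.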
@cowtowncoder it cannot be the case because SnakeYAML does not write null as ~ (unless explicitly instructed, but that is not the case here: I studied the source code of Quarkus and I did not see any explicit representer).
YAMLWriteFeature.WRITE_TILDE_KEY_UNQUOTED cannot be used without switching the schema from JSON to Core.
@maxandersen can you please clarify "rewriting the content"?
@asomov @cowtowncoder is spot on - the issue is that spotless reads the yaml with jackson and writes it out again - in this process the null symbol is redefined to be a String, so when it gets written back it's with double quotes.
@maxandersen proposed solutions:
- implement a fix in spotless (to apply the Core schema using SnakeYAML Engine). But I wonder how should spotless know which schema to apply
- stop using ~ as the null representation. Stick to the standard null representation
What is the standard representation for null?