OpenAPI-Specification
Formalize how to express null value in xml
Since XML has no concept of null, how can we handle validating a null value (both as an attribute and as an element)?
Consider the following 3.0 schema:
person:
  type: object
  required:
    - name
    - attrName
  properties:
    name:
      type: string
      nullable: true
    attrName:
      type: string
      nullable: true
      xml:
        attribute: true
Notice how required here prevents us from simply omitting the attribute/element.
I have thought about this, and here are some of the approaches I came up with:
For elements
Approach 1: self closing tags
<person>
<name />
</person>
Pros: Makes sense to whoever reads it
Cons: None that I can think of, but some XML parsers may treat a self-closing tag as equivalent to an empty string and not distinguish between the two, which means the self-closing form does not survive round-tripping:
e.g.
<name /> may come back as: <name></name>
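As a rough illustration of that round-tripping concern, here is a minimal sketch using Python's standard xml.etree.ElementTree. The roundtrip helper is a hypothetical name introduced only for this example, and other parsers may behave differently.

```python
import xml.etree.ElementTree as ET


def roundtrip(xml_text: str) -> str:
    """Parse an XML fragment and serialize it back to a string."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


self_closing = ET.fromstring("<person><name /></person>")
start_end = ET.fromstring("<person><name></name></person>")

# Both forms parse to an element with no text content, so after
# parsing there is nothing left to tell them apart.
same_text = self_closing.find("name").text == start_end.find("name").text

# Serializing either form produces the same output, so whichever
# spelling was used in the input is lost on the way back.
same_output = roundtrip("<person><name /></person>") == roundtrip(
    "<person><name></name></person>"
)

print(same_text, same_output)
```

With this parser, at least, the two spellings cannot be used to carry different meanings.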
Approach 2: empty string
<person>
<name></name>
</person>
Cons: If the property is of type string, it's not possible to distinguish between a non-null empty string and null. Workaround: require strings to be wrapped in double quotes, e.g.
- this is null:
<name></name>
- this is empty string:
<name>""</name>
- this is valid string:
<name>"hello"</name>
- this is not a valid string:
<name>hello</name>
Of course, this workaround is very problematic, since most parsers consider "" a valid 2-character string.
Approach 3: special marker attribute
<name xsi:nil="true"></name>
<name xsi:nil="true"/>
Pros: Can represent nulls consistently without having to check the contents of the element; this is also how XML Schema does it.
Cons: Size overhead of having to use xsi:nil="true" everywhere null is used.
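A minimal sketch of how a consumer might detect the marker attribute, again using Python's xml.etree.ElementTree. The element_value helper and the nickname element are hypothetical and exist only for this example. The namespace URI is the standard XML Schema instance namespace.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
XSI_NIL = f"{{{XSI_NS}}}nil"


def element_value(elem):
    """Return None when xsi:nil is true, otherwise the element text.

    An element without text is treated as the empty string.
    """
    # xsd:boolean allows both "true" and "1" as true values.
    if elem.get(XSI_NIL) in ("true", "1"):
        return None
    return elem.text or ""


doc = ET.fromstring(
    f'<person xmlns:xsi="{XSI_NS}">'
    '<name xsi:nil="true"/>'
    '<nickname></nickname>'
    '</person>'
)

name_value = element_value(doc.find("name"))          # null
nickname_value = element_value(doc.find("nickname"))  # empty string
```

Because the null check looks only at the attribute, an empty element without the marker can still be read as an empty string.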
For attributes
Approach 1: empty string
<person attrName="" />
Approach 2: disallow nullable attributes altogether
Make xml.attribute: true and nullable: true mutually exclusive.
@ahmednfwela Thanks for reporting this, and for the detailed research and references.
Preliminary analysis for elements
XML 1.0, section 3.1 "Start-Tags, End-Tags, and Empty-Element Tags" states that the element forms <name></name> and <name/> are equivalent and represent an element with no content, aka an empty element, so approaches 1 and 2 are equivalent.
The meaning of "empty" seems to depend on context/implementation; for string-valued elements "empty" means the empty string.
So approach 3 (xsi:nil="true") seems to be the way forward.
Preliminary analysis for attributes
XML 1.0, section 3.3.3 "Attribute-Value Normalization" describes an algorithm that MUST be applied before the value of an attribute is passed to the application or checked for validity. This algorithm begins with a normalized value consisting of the empty string, then appends to it. Thus attribute values are always strings, potentially the empty string, and never null.
So approach 2 (disallow nullable attributes) seems to be the way forward.
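To make the attribute case concrete, here is a small sketch with Python's xml.etree.ElementTree. Note that the None returned for the missing attribute is the Python API's way of saying "not present"; it is not a null value carried by the XML itself.

```python
import xml.etree.ElementTree as ET

empty = ET.fromstring('<person attrName="" />')
absent = ET.fromstring("<person />")

# A present attribute always has a string value, here the empty string.
empty_value = empty.get("attrName")

# A missing attribute has no value at all; ElementTree reports that as None.
absent_value = absent.get("attrName")

print(repr(empty_value), repr(absent_value))
```

The only two states the document can express are "present with some string" and "absent".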
@ralfhandl For attributes, is there an Option 3: omit the attribute?
I am asking because due to compatibility reasons, we cannot forbid type: "null" on XML attributes in 3.x, so we need an alternative to recommend as a SHOULD. Mapping null to the empty string does not seem ideal.
@handrews How would this be distinguishable from required: false ?
@ahmednfwela In practice, it's probably not. But we can't do the preferred option of forbidding null for attributes in 3.2 because that would break compatibility with 3.1. (We don't want another incompatibility in a minor release after 3.0->3.1.)
So we could say that, for attributes, setting nullable: true means required is ignored.
@ahmednfwela we can't do anything that contradicts specified behavior in OAS 3.1.
@ralfhandl For attributes, is there an Option 3: omit the attribute?
@handrews But this already contradicts OAS 3.1: with required: true + nullable: true, an instance that omits the attribute would be invalid in 3.1.
Hmm... I do see your point, @ahmednfwela . I'm kind of at the point of throwing my hands up on this and saying that if you try to use type: "null" on an attribute the results are implementation-defined. Which we need to do for both of these anyway for compatibility, but at least we can give a SHOULD for elements this way.
@ahmednfwela I've continued to think about this, and I think that constructs like:
properties:
  someElement:
    required:
      - someAttribute
    properties:
      someAttribute:
        type: [number, "null"]
        xml:
          attribute: true
do not prevent handling null by omitting the attribute.
There are two representations of the data here: The in-memory data which needs to be in a structure that can be modeled by JSON Schema, and the XML serialization, which does not. The XML Object tells us how to map between the two.
The JSON Schema constraints apply to the in-memory data. As long as the mapping from null to a missing attribute and back to null is well-defined, then the required constraint is satisfied. This basically turns the question around into the following form:
- Serialization:
  - First validate the in-memory representation; then, if the in-memory value is null, omit the attribute
- Parsing:
  - If the schema supports type: "null" and the attribute is missing, set the in-memory value to null
  - If the schema does not support type: "null" and the attribute is missing, then there is no available mapping, and if the attribute is also required then validation fails
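The serialization and parsing rules above can be sketched roughly as follows, using Python's xml.etree.ElementTree. The function names and the nullable/required flags are hypothetical stand-ins for information a real implementation would read from the schema. Validation of the in-memory value and conversion to the schema type are left out.

```python
import xml.etree.ElementTree as ET


def serialize_attr(elem, name, value):
    """Write an already-validated in-memory value as an attribute.

    A null (None) value is represented by omitting the attribute.
    """
    if value is not None:
        elem.set(name, str(value))


def parse_attr(elem, name, nullable, required):
    """Map an attribute back to an in-memory value.

    Returns (present, value). present is False when the property
    should simply be absent from the in-memory data.
    """
    raw = elem.get(name)
    if raw is not None:
        return True, raw  # type conversion omitted in this sketch
    if nullable:
        return True, None  # missing attribute maps back to null
    if required:
        raise ValueError(f"required attribute {name!r} is missing")
    return False, None


# Round trip: null is serialized by omission and parsed back as null.
elem = ET.Element("someElement")
serialize_attr(elem, "someAttribute", None)
xml_text = ET.tostring(elem, encoding="unicode")
parsed = parse_attr(
    ET.fromstring(xml_text), "someAttribute", nullable=True, required=True
)
```

In this sketch the required check is only reached when the schema does not allow null, which mirrors the idea that required applies to the in-memory data and not to the XML serialization.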
There really is no reason to try to have a special null representation for attributes. It is purely an in-memory-data-structure construct, and the only logical option is to omit it.
There is the awkwardness that the value to which the missing attribute is parsed is dependent on the schema, but a 1:1 mapping is not possible. We would produce a similar overloading by mapping to the empty string, but in that case the round-trip would be lossy: there's no way to tell whether an attribute with an empty string and a schema with "type": ["string", "null"] should be parsed as the empty string or as null, and the more logical option is the empty string, which means null would round-trip to the empty string. With the above approach, null round-trips correctly.
As with all of the options explored, this is not ideal. But it strikes me as more consistent than any of the others. It round-trips correctly, and while some schema constructs don't make much sense, that is true of JSON Schema in general and is therefore acceptable. You can write a lot of schemas that are redundant or impossible.
This ended up getting fixed for 3.2 in PR #4612, so closing.