OpenAPI-Specification

Formalize how to express null value in xml

Open ahmednfwela opened this issue 1 year ago • 10 comments

Since XML has no concept of null, how can we handle validating a null value (both as an attribute and as an element)?

Consider the following 3.0 schema:

person:
  type: object
  required:
    - name
    - attrName
  properties:
    name:
      type: string
      nullable: true
    attrName:
      type: string
      nullable: true
      xml:
        attribute: true

Notice how required here prevents us from omitting the attribute/element.

I have thought about this, and here are some of the approaches I came up with:

For elements

Approach 1: self closing tags

<person>
  <name />
</person>

Pros: Makes sense to whoever reads it.
Cons: None that I can think of, although some XML parsers may treat a self-closing tag as equivalent to an empty string and not distinguish between the two, in which case the form does not survive round-tripping. For example, <name /> may come back as <name></name>.
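The round-tripping concern can be checked against a concrete parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; it shows the behavior of that one parser only, and the helper name describe is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# Parse both spellings of an element with no content.
self_closing = ET.fromstring("<person><name /></person>")
start_end = ET.fromstring("<person><name></name></person>")


def describe(person):
    """Return what the parser exposes for <name>: its text and child count."""
    name = person.find("name")
    return (name.text, len(name))


# Both documents produce the same parsed structure...
same_structure = describe(self_closing) == describe(start_end)

# ...and the same bytes when written back out, so the original spelling is lost.
same_output = ET.tostring(self_closing) == ET.tostring(start_end)
```

With this parser, the application has no way to tell which of the two forms was in the input.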

Approach 2: empty string

<person>
  <name></name>
</person>

Cons: If the property is of type string, it is not possible to distinguish a non-null empty string from null.
Workaround: Force strings to be wrapped in double quotes, e.g.

  • this is null:
<name></name>
  • this is empty string:
<name>""</name>
  • this is valid string:
<name>"hello"</name>
  • this is not a valid string:
<name>hello</name>

Of course, this workaround is very problematic, since most parsers consider "" a valid two-character string.

Approach 3: special marker attribute

<name xsi:nil="true"></name>
<name xsi:nil="true"/>

Pros: Can represent nulls consistently without having to check the contents of the element. This is also how XML Schema does it.
Cons: Size overhead of having to use xsi:nil="true" everywhere null is used.
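As a rough illustration of approach 3, the sketch below reads and writes xsi:nil with Python's standard xml.etree.ElementTree. The helper names read_nullable_text and write_nullable_text are hypothetical, and the sketch assumes the xsi prefix is bound to the XML Schema instance namespace.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
XSI_NIL = f"{{{XSI_NS}}}nil"  # ElementTree spells qualified names as {uri}local


def read_nullable_text(elem):
    """Return None for an element marked xsi:nil, otherwise its text content."""
    # XML Schema booleans accept both "true" and "1".
    if elem.get(XSI_NIL) in ("true", "1"):
        return None
    # An element with no content is read as the empty string, not as null.
    return elem.text or ""


def write_nullable_text(parent, tag, value):
    """Append a child element, marking it xsi:nil when the value is None."""
    child = ET.SubElement(parent, tag)
    if value is None:
        child.set(XSI_NIL, "true")
    else:
        child.text = value
    return child


doc = ET.fromstring(
    f'<person xmlns:xsi="{XSI_NS}">'
    '<name xsi:nil="true"/><nick></nick>'
    "</person>"
)
name_value = read_nullable_text(doc.find("name"))  # null
nick_value = read_nullable_text(doc.find("nick"))  # empty string
```

Because the marker lives in an attribute, null and the empty string stay distinguishable even though the element content is the same in both cases.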

For attributes

Approach 1: empty string

<person attrName="" />

Approach 2: disallow nullable attributes altogether

Make xml.attribute: true and nullable: true mutually exclusive.

ahmednfwela avatar Jul 16 '24 11:07 ahmednfwela

@ahmednfwela Thanks for reporting this, and for the detailed research and references.

Preliminary analysis for elements

XML 1.0, section 3.1 "Start-Tags, End-Tags, and Empty-Element Tags" states that the element forms <name></name> and <name/> are equivalent and represent an element with no content, aka an empty element, so approaches 1 and 2 are equivalent.

The meaning of "empty" seems to depend on context/implementation; for string-valued elements "empty" means the empty string.

So approach 3 (xsi:nil="true") seems to be the way forward.

Preliminary analysis for attributes

XML 1.0, section 3.3.3 "Attribute-Value Normalization" describes an algorithm that MUST be applied before the value of an attribute is passed to the application or checked for validity. This algorithm begins with a normalized value consisting of the empty string, then appends to it. Thus attribute values are always strings, potentially the empty string, and never null.

So approach 2 (disallow nullable attributes) seems to be the way forward.
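To make the attribute analysis concrete, here is a small sketch with Python's standard xml.etree.ElementTree. Note that the None returned for the second document is that API's default for "no such attribute", not an XML-level null.

```python
import xml.etree.ElementTree as ET

empty = ET.fromstring('<person attrName="" />')
absent = ET.fromstring("<person />")

# A present attribute always carries a string, possibly the empty one.
empty_value = empty.get("attrName")

# Element.get returns None only because the attribute is not there at all.
absent_value = absent.get("attrName")
```

The only two states the serialization can express are therefore "present with a string value" and "absent".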

ralfhandl avatar Jul 17 '24 11:07 ralfhandl

@ralfhandl For attributes, is there an Option 3: omit the attribute?

handrews avatar May 05 '25 20:05 handrews

I am asking because, for compatibility reasons, we cannot forbid type: "null" on XML attributes in 3.x, so we need an alternative to recommend as a SHOULD. Mapping null to the empty string does not seem ideal.

handrews avatar May 05 '25 20:05 handrews

@handrews How would this be distinguishable from required: false ?

ahmednfwela avatar May 05 '25 20:05 ahmednfwela

@ahmednfwela In practice, it's probably not. But we can't take the preferred option of forbidding null for attributes in 3.2, because that would break compatibility with 3.1. (We don't want another incompatibility in a minor release after 3.0 -> 3.1.)

handrews avatar May 05 '25 20:05 handrews

So we can say that, for attributes, setting nullable: true causes the value of required to be ignored.

ahmednfwela avatar May 05 '25 21:05 ahmednfwela

@ahmednfwela we can't do anything that contradicts specified behavior in OAS 3.1.

handrews avatar May 05 '25 23:05 handrews

Quoting @handrews: "@ralfhandl For attributes, is there an Option 3: omit the attribute?"

@handrews But this already contradicts OAS 3.1, since a document checked against a schema with required: true + nullable: true would be invalid in 3.1 if the attribute were omitted.

ahmednfwela avatar May 06 '25 15:05 ahmednfwela

Hmm... I do see your point, @ahmednfwela. I'm about ready to throw my hands up on this and say that if you try to use type: "null" on an attribute, the results are implementation-defined. We need to do that for both of these anyway for compatibility, but at least this way we can give a SHOULD for elements.

handrews avatar May 06 '25 19:05 handrews

@ahmednfwela I've continued to think about this, and I think the concern about constructs like:

properties:
  someElement:
    required:
    - someAttribute
    properties:
      someAttribute:
        type: [number, "null"]
        xml:
          attribute: true

does not prevent handling null by omitting the attribute.

There are two representations of the data here: The in-memory data which needs to be in a structure that can be modeled by JSON Schema, and the XML serialization, which does not. The XML Object tells us how to map between the two.

The JSON Schema constraints apply to the in-memory data. As long as the mapping from null to a missing attribute and back to null is well-defined, then the required constraint is satisfied. This basically turns the question around into the following form:

  • Serialization:
    • First validate the in-memory representation; then, if the in-memory value is null, omit the attribute
  • Parsing:
    • If the schema supports type: "null" and the attribute is missing, set the in-memory value to null
    • If the schema does not support type: "null" and the attribute is missing, then there is no available mapping, and if the attribute is also required then validation fails
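The serialization and parsing rules above can be sketched as follows, using Python's standard xml.etree.ElementTree. This is only an illustration under stated assumptions: the names MISSING, allows_null, write_attribute and read_attribute are hypothetical, the schema is a plain dict using the 3.1-style type keyword, validation of the in-memory data is assumed to happen elsewhere, and conversion of the attribute string to a number is left out.

```python
import xml.etree.ElementTree as ET

MISSING = object()  # hypothetical sentinel: property absent from in-memory data


def allows_null(schema):
    """True if the schema's type keyword includes "null"."""
    declared = schema.get("type", [])
    types = declared if isinstance(declared, list) else [declared]
    return "null" in types


def write_attribute(elem, name, value):
    """Serialization: a null in-memory value is written by omitting the attribute.

    Validating the in-memory data against the schema is assumed to have
    happened before this call and is not shown here.
    """
    if value is None:
        elem.attrib.pop(name, None)
    else:
        elem.set(name, str(value))


def read_attribute(elem, name, schema, required):
    """Parsing: map a missing attribute back to null when the schema allows it."""
    raw = elem.get(name)
    if raw is not None:
        return raw  # conversion to the schema's non-null type is omitted
    if allows_null(schema):
        return None
    if required:
        raise ValueError(f"required attribute {name!r} is missing")
    return MISSING
```

Called with the schema fragment {"type": ["number", "null"]}, a null written by write_attribute and then re-parsed comes back from read_attribute as None, so the required constraint on the in-memory data still holds.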

There really is no reason to try to have a special null representation for attributes. It is purely an in-memory-data-structure construct, and the only logical option is to omit it.

There is the awkwardness that the value to which the missing attribute is parsed is dependent on the schema, but a 1:1 mapping is not possible. We would produce a similar overloading by mapping to the empty string, but in that case the round-trip would be lossy: there's no way to tell whether an attribute with an empty string and a schema with "type": ["string", "null"] should be parsed as the empty string or as null, and the more logical option is the empty string, which means null would round-trip to the empty string. With the above approach, null round-trips correctly.
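The difference between the two mappings can be seen in a short comparison. This is a sketch that assumes a schema of "type": ["string", "null"] and uses Python's standard xml.etree.ElementTree; both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def roundtrip_via_empty_string(value):
    """Write null as attrName="" and read back, preferring the empty string."""
    elem = ET.Element("person")
    elem.set("attrName", "" if value is None else value)
    parsed = ET.fromstring(ET.tostring(elem))
    return parsed.get("attrName")


def roundtrip_via_omission(value):
    """Write null by omitting attrName; read a missing attribute back as null."""
    elem = ET.Element("person")
    if value is not None:
        elem.set("attrName", value)
    parsed = ET.fromstring(ET.tostring(elem))
    return parsed.get("attrName")  # None when the attribute is missing


lossy = roundtrip_via_empty_string(None)   # comes back as the empty string
preserved = roundtrip_via_omission(None)   # comes back as None
```

In both mappings the empty string itself survives unchanged; only the omission mapping also keeps null distinct from it.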

As with all of the options explored, this is not ideal. But it strikes me as more consistent than any of the others. It round-trips correctly, and while some schema constructs don't make much sense, that is true of JSON Schema in general and is therefore acceptable. You can write a lot of schemas that are redundant or impossible.

handrews avatar May 15 '25 21:05 handrews

This ended up getting fixed for 3.2 in PR #4612, so closing.

handrews avatar Jul 31 '25 19:07 handrews