
"missing" or "defaultProperties" annotation keyword

Open awwright opened this issue 4 years ago • 17 comments

The "default" keyword is one of the most misused & abused annotation keywords due to consequences of how JSON Schema works.

It is frequently assumed that if a property in an instance document is missing, the "default" keyword lets you create that property and fill in that value as if it has the same behavior.

But this is not actually the case. The biggest reason is the "default" keyword does not produce an annotation unless the property exists in the instance in the first place... defeating the point.
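A minimal sketch of why this happens, assuming a toy annotation collector that handles only "properties" and "default". The function name `collect_defaults` and the path format are illustrative, not taken from any real implementation:

```python
def collect_defaults(schema, instance, path=""):
    """Collect "default" annotations the way "properties" applies subschemas:
    only for properties that are actually present in the instance."""
    annotations = {}
    if "default" in schema:
        annotations[path] = schema["default"]
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:  # absent properties are never visited
                annotations.update(
                    collect_defaults(subschema, instance[name], f"{path}/{name}")
                )
    return annotations


schema = {"properties": {"port": {"type": "integer", "default": 80}}}
present = collect_defaults(schema, {"port": 8080})  # annotation found at /port
absent = collect_defaults(schema, {})               # nothing collected
print(present, absent)
```

When "port" is absent, its subschema is never applied, so the default is never reported.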

The "default" keyword is mostly useful for user interface and IDE work, where a user indicates they want to create a value, and so the "default" keyword can provide a sensible initial default in these cases. For example:

  • A user creates a new record in a document database (like a MongoDB collection). The interface creates an instance of the schema, reading the "default" keyword to provide a value, instead of creating a blank document (which is invalid JSON).

  • A user in an IDE is typing { "name": and then tab-completes in the default value, an empty string (as opposed to e.g. a number, or another object).

However, users frequently assume that the "default" value can be substituted for a missing one:

  • https://github.com/json-schema-org/json-schema-spec/issues/858
  • https://stackoverflow.com/questions/60433669/how-to-add-default-values-from-json-schema-to-json-document-in-java/60443802 (see my answer for more details on why "default" probably doesn't do what you think it does)

This also seemed sensible to me until I realized that "default" doesn't produce an annotation if the instance is absent.

I'm proposing a keyword that does mean exactly this: it means "if the instance is an object, and it is missing any of the given properties, the behavior will be the same as if it were defined with the specified value".

Example schema:

{
  "missing": {
    "port": 80
  },
  "properties": {
    "port": { "type":"integer" }
  }
}

This would allow implementations to infer that this:

{ }

will behave the same as this:

{ "port": 80 }

Some implementations or tools might even offer a way to return a copy of the instance with these values filled in. This would be useful for applications that want to both validate user input, and fill in defaults; instead of having to perform these as separate operations.
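As a rough sketch of what such a fill-in step could look like, assuming the proposed "missing" keyword (which is not part of any published JSON Schema draft) and a hypothetical helper named `fill_missing`:

```python
import copy


def fill_missing(schema, instance):
    """Return a copy of `instance` with absent properties filled in from the
    proposed "missing" keyword (hypothetical, not a standard keyword)."""
    if not isinstance(instance, dict):
        return instance  # the keyword only applies to objects
    filled = copy.deepcopy(instance)
    for name, value in schema.get("missing", {}).items():
        if name not in filled:
            filled[name] = copy.deepcopy(value)
    return filled


schema = {
    "missing": {"port": 80},
    "properties": {"port": {"type": "integer"}},
}
original = {}
result = fill_missing(schema, original)
print(result)    # the filled-in copy
print(original)  # the input is left untouched
```

Returning a copy rather than mutating the input keeps validation itself read-only; a tool could run this before or after validating.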

awwright avatar Mar 01 '20 22:03 awwright

That is a good point about default never actually annotating an instance! I'll have to think more on the rest.

handrews avatar Mar 02 '20 05:03 handrews

Alternative names could be e.g. "undefinedValues", "defaultValues" or simply "defaults" (plural).

We may also want to consider an array form, for tuples, or function arguments.

One more thing to consider is that "default" does not have to be valid according to the schema. "missing" would have to be, since by definition the behavior is the same.

awwright avatar Mar 02 '20 07:03 awwright

The thing that needs to be explicitly stated here is that (given the description above) the default isn't applied when the property isn't present because (in some implementations) that property's subschema is skipped in that case.

I think this is an incorrect behavior.

Core section 9.3.2.1 doesn't state that the property subschema may be skipped if the property is absent; however, it does say:

> The annotation result of this keyword is the set of instance property names matched by this keyword.

which may be interpreted as "don't bother validating properties that aren't there."

I think the proper way to implement this is to process a property's subschema, even if the property is absent. Doing this allows default to generate an annotation, which the consuming application can use to apply the default to the model.

gregsdennis avatar Mar 02 '20 07:03 gregsdennis

> I think this is an incorrect behavior.

On whose part? This isn't a defined behavior, so much as a logical consequence of how we've defined "properties".

> I think the proper way to implement this is to process a property's subschema, even if the property is absent.

What would this even mean? It's not meaningful to run undefined against a JSON Schema, as undefined is not valid JSON. And there's no other precedent for such a feature.

And even if we did, that wouldn't fix the problem this keyword is addressing. "default" is allowed to be invalid as an instance, or show a different behavior than omitting the property. e.g. #858

Maybe bring your point up as a new issue?

awwright avatar Mar 02 '20 07:03 awwright

I think @awwright is correct about the implications of how we've defined applicators and annotations to work. Of course, that's all fairly new and we can tweak it if we want to.

It's also important to think about generative use cases vs validation use cases. Generative use cases work without an instance, and are really where the default keyword is useful.

  • A code generator can initialize a field with it.
  • A doc generator can document the behavior
  • A UI generator can pre-populate an input

In truth, default shouldn't even be examined during validation even if the schema is applied.

While I get the point of the missing keyword and see the use case, I think we would benefit from emphasizing the generative approach as distinct from the validation approach, and see if that helps clarify things. With OpenAPI adopting the latest draft, we will have a lot more eyes on generation and can work with this actively.

If that does not solve the problem, I would be open to a keyword such as this in the future. I am also happy that describing the behaviors of annotations and applicators has given us a sane way to talk about default.

In the meantime, if someone wanted to do this as an extension keyword, that would also be a good way to explore its utility.

@awwright would you be OK with moving this to the vocabularies repository for now? If we try the generative stuff with the OpenAPI folks and that doesn't work, but a trial run extension keyword finds adoption, we can "promote" the issue back to this repo. Does that seem reasonable to you as well @gregsdennis?

handrews avatar Mar 03 '20 22:03 handrews

@handrews I think this should be kept next to "default"—if anything, this keyword would be used more often than "default" would.

Also, another naming idea: "defaultProperties" and "defaultItems" — for filling in missing properties and items, respectively.

awwright avatar Mar 04 '20 01:03 awwright

Yeah, I noticed the issue around how default as an annotation will not be collected if the property doesn't exist. I feel this IS the correct behaviour.

I would suggest that what we actually have, and failed to recognise, is a different class of annotation. It's a "LOCATION ANNOTATION KEYWORD", not just an annotation keyword.

It's an annotation relating to the location, not the instance data.

My knee jerk reaction is to suggest removing it till we can properly define it, however I am ill informed as to the implications for OpenAPI, if any. (Of course, they could define their own logic for how to use default then).

Further, it sounds like a class of keywords we need to define and allow for, especially given, as you've pointed out, there's a class of activities (generation and auto complete in IDEs) which we haven't considered because "those are for vocabularies, man". That's fine, but we should recognise that there's a separate class of keyword which can and likely will be used outside of the "apply schema to instance to get annotations" framework for collecting annotations.

I'm not even 75% sure "LOCATION ANNOTATION KEYWORD" is the correct phrasing, because it's no longer an annotation when used in a context outside of applying the schema. "structural information keyword" maybe or something similar?

Open to further discussion.

I think we need to hold moving this to vocabularies repo till we decide if we should define (in a very limited way) a new class of keywords.

Relequestual avatar Mar 06 '20 12:03 Relequestual

> I think the proper way to implement this is to process a property's subschema, even if the property is absent. Doing this allows default to generate an annotation, which the consuming application can use to apply the default to the model.

This needs more context.

So this comes from the idea that the JSON instance is ultimately going to be deserialized into a model, and for me, that means .NET. When deserializing in .NET, if a property is missing, knowing what the default should be is important because the property has to be populated with some value. When not specified, the default value for the property's type is used. But if the schema specifies a default other than the .NET type default, that value should be used.

If default doesn't generate an annotation when the property is missing, the application doesn't know about the specified default, so it can't apply it.

gregsdennis avatar Mar 07 '20 19:03 gregsdennis

@gregsdennis I think a good question to ask is whether that deserialization is part of validation, or part of a form of code generation. Meaning, would it make sense to scan for defaults up front and then look them up as you realize you have a missing value?
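One way to picture the "scan up front, look up later" approach is the sketch below. It assumes a schema that uses only "properties" and "default"; the helper names `scan_defaults` and `read_property` are made up for illustration, and JSON Pointer escaping is ignored:

```python
def scan_defaults(schema, path=""):
    """Walk "properties" without any instance and map each location
    to the "default" found there."""
    found = {}
    if "default" in schema:
        found[path] = schema["default"]
    for name, subschema in schema.get("properties", {}).items():
        found.update(scan_defaults(subschema, f"{path}/{name}"))
    return found


schema = {
    "properties": {
        "host": {"type": "string", "default": "localhost"},
        "port": {"type": "integer", "default": 80},
    }
}

# Done once, before any instance is seen.
DEFAULTS = scan_defaults(schema)


def read_property(instance, name):
    """Return the instance value, falling back to the scanned default."""
    if name in instance:
        return instance[name]
    return DEFAULTS.get(f"/{name}")


print(read_property({"port": 8080}, "port"))
print(read_property({}, "host"))
```

Here the defaults are read from the schema as data, independently of validation, so nothing depends on annotations being produced for absent properties.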

@Relequestual I'll come back to your comment when I have a bit more time.

For now, I agree that there's something interesting going on here that warrants continued discussion in this repo.

handrews avatar Mar 07 '20 21:03 handrews

It's the other way around. You'd want to validate that the JSON matches your models prior to deserialization. So validation can become part of deserialization, though not strictly required. This is where having those annotations (e.g. from default) helps, especially when the value is missing from the JSON.

gregsdennis avatar Mar 08 '20 04:03 gregsdennis

As part of this deserialization, once you have a value supplied by default et al., then you can look at all the relevant schemas (including patternProperties, etc) to determine how it should be unpacked. You might even read "format" for this purpose, and put "date" into a Date object, and so on.
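A small sketch of that unpacking step, assuming the value has already been validated and that only "properties" and the "date" format need handling. The function name `unpack` is illustrative:

```python
from datetime import date


def unpack(schema, value):
    """Convert a validated JSON value into a richer Python value,
    guided by the schema that applies to it."""
    if isinstance(value, str) and schema.get("format") == "date":
        # The "date" format is a full-date such as "2020-03-08".
        return date.fromisoformat(value)
    if isinstance(value, dict):
        props = schema.get("properties", {})
        return {k: unpack(props.get(k, {}), v) for k, v in value.items()}
    return value


schema = {
    "properties": {
        "created": {"type": "string", "format": "date"},
        "port": {"type": "integer"},
    }
}
record = unpack(schema, {"created": "2020-03-08", "port": 80})
print(record)
```

A fuller version would also consult patternProperties and the other applicators mentioned above; they are left out to keep the sketch short.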

awwright avatar Mar 08 '20 04:03 awwright

@gregsdennis I agree (I think) with what you say about validation before deserialization. But I guess I'm really thinking of three phases, which might not make any sense for C#/.NET, which I've never known well and haven't looked at at all for over a decade.

I'm thinking in terms of:

  • Code generation: Setting up the code that lays out the class or data structure or whatever, including initialization statements for missing fields. This may be done as a fully separate step producing specific classes, and then deserializing the JSON instance into a class instance / object. Or it might be some sort of just-in-time activity where there is not a class sitting around in code. This is where my lack of understanding of .NET is probably a problem.
  • Validation: you don't want to bother instantiating the class (or whatever) if the data is invalid
  • Instantiation: valid data is passed to the class constructor (or however that is expected to work)
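The three phases above can be sketched in Python (rather than .NET) as follows. This is only an illustration under simplifying assumptions: `generate_class` is a hypothetical helper, `validate` is a stand-in that checks "required" only, and defaults are assumed to be immutable values such as numbers or strings:

```python
from dataclasses import field, make_dataclass

schema = {
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "port": {"type": "integer", "default": 80},
    },
}


def generate_class(class_name, schema):
    """Phase 1, code generation: build a class whose optional fields
    are initialized from the "default" keyword."""
    required = set(schema.get("required", []))
    fields = []
    for name, sub in schema["properties"].items():
        if name in required:
            fields.append((name, object))
        else:
            fields.append((name, object, field(default=sub.get("default"))))
    # Dataclasses need fields without defaults to come first.
    fields.sort(key=len)
    return make_dataclass(class_name, fields)


def validate(schema, instance):
    """Phase 2, validation: a stand-in that only checks "required"."""
    return all(name in instance for name in schema.get("required", []))


Config = generate_class("Config", schema)
instance = {"name": "example"}
if validate(schema, instance):
    # Phase 3, instantiation: valid data goes to the constructor.
    config = Config(**instance)
    print(config)
```

In this arrangement "default" is consumed entirely in phase 1, without an instance, and validation never has to produce an annotation for the absent "port".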

Clearly, this is not the way you're thinking about it and since you understand the language you're working in and I don't, I'm assuming I have something wrong here. But I'd like to understand better what that is.

handrews avatar Apr 26 '20 04:04 handrews

I think that "missing" (or "defaults" or whatever) is an excellent suggestion (with some comments below), especially because until I considered @awwright's remarks I did not understand "default" very well myself.

In the application I know best, the schema is loaded with "default" values for properties which are intended to do three things:

(1) Document the application semantics behind the interface-- telling the user "you may choose a value for this parameter but if you don't it will take on the specified 'default' value";

(2) Guide people or processes to generate documents-- telling them how to specify functionally-required parameters completely (as by showing them in a UI and/or adding them to each document) but distinguishing which of those parameters have default values (also useful as initial values displayed in UI when creating a new doc) versus those which must be filled by the user (or which are truly optional, depending on "required");

and, the part that causes the most trouble,

(3) Tell the document parser/validator to supply (insert) missing properties with specified values-- in order to achieve the semantics (1) when the document generator (2) leaves something out, which is very common, as when a human whomps up a minimum-acceptable-doc ("required" properties only) trusting the parser/validator to supply all the default-value properties the application needs.

AJV's useDefaults option gets (3) done well for the application I mentioned (thank you @epoberezkin), but now I understand better how special that is, since missing properties do not, as @awwright pointed out, actually match anything in the schema for normal validation purposes.

Note that "required" and "default" can't be used together when "default" is used to insert properties instead of just giving advice to document-generators, because if a submitted doc is "required" to have a certain property the parser/validator will never have any reason to insert it.

I have not yet thought through multiple-applicable "missing" conflicts due to "anyOf" etc. but I immediately perceive that having both "default" and "missing" could be a bit awkward. Obviously they can be defined so there is no semantic conflict, but how should they be explained to schema users?

Perhaps like this: if present, "default" indicates the value recommended when a document-generator has nothing better in mind and/or an initial value for a UI to display, while "missing" indicates exactly which properties (and values) may be inserted into a document if not already present when a recipient tries to interpret said doc. Once "missing" becomes a schema property, schema-writers who intend to use it to insert missing properties at (near) validation time will be free to use the "required" keyword along with the "default" keyword to tell document-generators unambiguously that a property must be supplied and ought to be given the "default" value if no other is desired.

If "default" is not present then a document generator could rely on "missing" for a recommended value. That would be backward-compatible as well as giving schema writers some flexibility to guide UIs (there are cases in which the recommended value doesn't match the default value). Tools can be provided to warn schema writers of unintended mismatches between "default" and "missing" values.
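A sketch of what such a warning tool might check, assuming the proposed "missing" keyword sits next to "properties" as in the example at the top of this issue. The function name `find_mismatches` is hypothetical:

```python
def find_mismatches(schema):
    """List properties whose "default" annotation differs from the value
    given for them under the proposed "missing" keyword."""
    missing = schema.get("missing", {})
    mismatches = []
    for name, sub in schema.get("properties", {}).items():
        if name in missing and "default" in sub and sub["default"] != missing[name]:
            mismatches.append((name, sub["default"], missing[name]))
    return mismatches


schema = {
    "missing": {"port": 80, "host": "localhost"},
    "properties": {
        "port": {"type": "integer", "default": 8080},
        "host": {"type": "string", "default": "localhost"},
    },
}
for name, default, fill in find_mismatches(schema):
    print(f'warning: "{name}" has default {default!r} but missing value {fill!r}')
```

Note that Python's `!=` is only an approximation of JSON equality (for example, it treats `True` and `1` as equal), so a real tool would need a stricter comparison.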

(If we could start from scratch we might revise the names, like default to suggested and missing to defaults.)

markchart avatar Oct 24 '20 00:10 markchart

In my opinion, the entire "default" concept should be removed to promote correct solution architecture. Schema validation is a read-only pass/fail concept; of course it should never be relied on to alter the data (as it seems you all generally agree), but even "missing value" suggestions should not be defined here. The concept of default values is contextual... when you're storing values, your storage system may consider certain "defaults", when you're exposing data to API users, that may consider entirely different "defaults", no one should be led to believe that the core data schema should carry that information in a properly designed system.

ciabaros avatar Jan 11 '21 20:01 ciabaros

> The concept of default values is contextual

Yes, this is a good way to think about it.

> when you're storing values, your storage system may consider certain "defaults", when you're exposing data to API users, that may consider entirely different "defaults", no one should be led to believe that the core data schema should carry that information in a properly designed system.

This is a fair point, but that doesn't mean there cannot be a concept of a default value in JSON Schema. It just means that annotation and validations are different functions and sometimes they don't overlap.

awwright avatar Jan 15 '21 02:01 awwright

Another idea for a keyword name: "fill" or "fillProperties" (as in, "fill in these missing values")

awwright avatar Jan 24 '22 20:01 awwright

> Also, another naming idea: "defaultProperties" and "defaultItems" — for filling in missing properties and items, respectively.

I would go with propertyDefaults and itemDefaults. This (to me) highlights the idea that it's the defaults for the properties rather than a set of properties which should be included by default.

gregsdennis avatar Jul 26 '22 21:07 gregsdennis