schemars
FEEDBACK WANTED - API proposal - supporting multiple JSON schema versions, breaking changes to `Schema` definition
In "Project Status and Road to 1.0", one of the requirements for schemars 1.0 was "handling of different JSON schema versions/dialects (and how to handle future JSON schema versions)". Handling future versions of JSON Schema in a non-breaking way would have been difficult-to-impossible a few years ago, but should be achievable now that the latest version of JSON Schema is expected to be stable - see https://json-schema.org/blog/posts/future-of-json-schema and https://json-schema.org/blog/posts/the-last-breaking-change.
This API proposal shows how the way a JSON Schema is modelled within schemars may be changed to support any arbitrary version of JSON schema without future breaking changes (although this change itself would be breaking).
The World Today
Currently, schemars has a few types that define a schema which are mostly based on the draft-07 version of JSON Schema - most prominently (simplified for brevity):
```rust
pub enum Schema {
    Bool(bool),
    Object(SchemaObject),
}

pub struct RootSchema {
    #[serde(rename = "$schema")]
    pub meta_schema: Option<String>,
    #[serde(flatten)]
    pub schema: SchemaObject,
    #[serde(alias = "$defs")]
    pub definitions: Map<String, Schema>,
}

pub struct SchemaObject {
    #[serde(flatten)]
    pub metadata: Option<Box<Metadata>>,
    #[serde(rename = "type")]
    pub instance_type: Option<SingleOrVec<InstanceType>>,
    pub format: Option<String>,
    #[serde(rename = "enum")]
    pub enum_values: Option<Vec<Value>>,
    #[serde(rename = "const")]
    pub const_value: Option<Value>,
    #[serde(flatten)]
    pub subschemas: Option<Box<SubschemaValidation>>,
    #[serde(flatten)]
    pub number: Option<Box<NumberValidation>>,
    #[serde(flatten)]
    pub string: Option<Box<StringValidation>>,
    #[serde(flatten)]
    pub array: Option<Box<ArrayValidation>>,
    #[serde(flatten)]
    pub object: Option<Box<ObjectValidation>>,
    #[serde(rename = "$ref")]
    pub reference: Option<String>,
    #[serde(flatten)]
    pub extensions: Map<String, Value>,
}
```
Fun fact: the multiple `#[serde(flatten)]`'d fields were an optimisation to try to save a little memory for the common case of simple schemas with only one or two properties. This was probably a premature optimisation that needlessly complicated everything!
Strongly typing the schema like this makes it easier for both schemars and its consumers to keep schemas in a valid state. However, it also causes some problems, particularly around supporting multiple versions of JSON Schema, e.g.:
- reusable schemas are always serialised under `definitions` instead of `$defs`, which has been preferred since 2019-09
- the `items` keyword is defined as a `SingleOrVec<Schema>` (i.e. it can be a single schema or an array of schemas), but since 2020-12 it can only be a single schema
- `$schema` is only defined in the top-most schema (`RootSchema`), but can legally appear in subschemas, e.g. when bundling subschemas with different `$schema`s
- `InstanceType` is defined as an exhaustive enum, even though vocabularies may define their own types (like `integer`, which schemars supports despite it not being defined by JSON Schema)
- schemars does not define fields for some lesser-used JSON Schema keywords
Some of these problems could be solved by a one-time breaking change to schemars that completely drops support for older versions of JSON Schema (and Swagger/OpenAPI). Alternatively, we could change these structs to match JSON Schema 2020-12, while semi-supporting old versions by using the `extensions` map to set a value if it does not conform to the 2020-12 types. But even then, supporting future versions of JSON Schema may introduce new problems. Non-breaking changes like adding a new keyword may be difficult to support in schemars in a non-breaking fashion unless pretty much every struct is annotated with `#[non_exhaustive]`, which would make constructing schemas much more difficult. Even that still wouldn't be sufficient to support other potential non-breaking changes (e.g. if an existing keyword was updated to allow additional data types).
Proposed Change
As anyone who knows my feelings on JS vs TS can attest, I am a vehement proponent of strict typing - but I think that going forward, schemars should no longer define a `Schema` type with a list of fields corresponding to JSON Schema keywords. More concretely, I propose that schemars define `Schema` as simply:
```rust
#[repr(transparent)]
pub struct Schema(serde_json::Value);
```
where the inner `Value` is either a `Value::Object` or a `Value::Bool`. Then, properties of the schema (assuming it's an object, not a bool) can be any arbitrary JSON-compatible value. The inner `Value` is not `pub`, so it is not actually part of the public API.
Note that this would be conceptually similar to:
```rust
pub enum Schema {
    Bool(bool),
    Object(Map<String, Value>),
}
```
An advantage of the enum over the newtype struct would be that it makes invalid states (e.g. trying to use a number as a schema) unrepresentable in the type system. The main reason I'm proposing a newtype struct instead is to allow converting a `&Value`/`&mut Value` to a `&Schema`/`&mut Schema` (probably via ref-cast). This would be useful in a number of scenarios, including implementing visitors, and is why the struct has `#[repr(transparent)]`. It should be impossible to construct a `Schema` from a `Value` that is neither a bool nor an object (hence the inner value field not being `pub`). Any functions exposed by schemars that construct a `Schema` must uphold this invariant, so e.g. `Schema` would implement `TryFrom<Value>` rather than `From<Value>`.
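A minimal sketch of how the bool-or-object invariant could be enforced through checked conversions. To keep it compilable with std alone, it uses a small stand-in `Value` enum in place of `serde_json::Value`. The `&Value` to `&Schema` conversion is a hand-written pointer cast that relies on `#[repr(transparent)]`, which is roughly what ref-cast would generate for you. This is an illustration under those assumptions, not schemars's actual implementation.

```rust
use std::collections::BTreeMap;
use std::convert::TryFrom;

/// Stand-in for `serde_json::Value` so this sketch needs only std.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Invariant: the wrapped value is always `Value::Bool` or `Value::Object`.
/// The field is private, so outside code can only go through the checked
/// conversions below.
#[repr(transparent)]
#[derive(Debug, Clone, PartialEq)]
pub struct Schema(Value);

/// True for the only two JSON shapes that are valid schemas.
fn is_schema_shape(value: &Value) -> bool {
    matches!(value, Value::Bool(_) | Value::Object(_))
}

impl TryFrom<Value> for Schema {
    // Hand the rejected value back so the caller doesn't lose it.
    type Error = Value;

    fn try_from(value: Value) -> Result<Self, Self::Error> {
        if is_schema_shape(&value) {
            Ok(Schema(value))
        } else {
            Err(value)
        }
    }
}

impl<'a> TryFrom<&'a Value> for &'a Schema {
    type Error = ();

    fn try_from(value: &'a Value) -> Result<Self, Self::Error> {
        if !is_schema_shape(value) {
            return Err(());
        }
        // SAFETY: `Schema` is `#[repr(transparent)]` over `Value`, so `&Value`
        // and `&Schema` have the same layout, and the shape check above
        // upholds the invariant. ref-cast can generate an equivalent cast
        // without hand-written `unsafe`.
        Ok(unsafe { &*(value as *const Value as *const Schema) })
    }
}

impl From<bool> for Schema {
    fn from(b: bool) -> Self {
        Schema(Value::Bool(b))
    }
}

impl From<Schema> for Value {
    fn from(schema: Schema) -> Self {
        schema.0
    }
}
```

Returning the rejected `Value` as the error type is a choice made for this sketch. It lets a caller recover the input without cloning it.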
While this would be a fairly major breaking change for any consumers of schemars who construct and/or manipulate schemas, the vast majority of consumers who just `#[derive(JsonSchema)]`, generate a schema for their types and serialise it to JSON would not be affected by this proposed change. And conveniently for me, the vast majority of schemars's tests can be left largely as they are!
Notable traits and functions on a `Schema` would include:
```rust
use serde_json::{Map, Value};
use std::borrow::Borrow;

impl TryFrom<Value> for Schema { ... }
impl<'a> TryFrom<&'a Value> for &'a Schema { ... }
impl<'a> TryFrom<&'a mut Value> for &'a mut Schema { ... }
impl From<bool> for Schema { ... }
impl From<Map<String, Value>> for Schema { ... }
impl From<Schema> for Value { ... }
impl From<Schema> for Map<String, Value> { ... }

impl Schema {
    pub fn as_bool(&self) -> Option<bool> { ... }
    pub fn as_object(&self) -> Option<&Map<String, Value>> { ... }
    pub fn as_object_mut(&mut self) -> Option<&mut Map<String, Value>> { ... }

    // alternatively, as_* could return Err(_) for non-matching schemas:
    // pub fn as_bool(&self) -> Result<bool, &Map<String, Value>> { ... }
    // pub fn as_object(&self) -> Result<&Map<String, Value>, bool> { ... }
    // ...but then what about as_object_mut?

    // converts bool schemas to objects, so infallible
    pub fn ensure_object(&mut self) -> &mut Map<String, Value> { ... }

    pub fn get(&self, key: impl Borrow<str>) -> Option<&Value> { ... }
    pub fn get_mut(&mut self, key: impl Borrow<str>) -> Option<&mut Value> { ... }

    // converts bool schemas to objects
    pub fn set(&mut self, key: String, value: Value) -> Option<Value> { ... }
}
```
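The least obvious behaviour in this listing is the bool-to-object conversion in `ensure_object` and `set`. Here is a minimal sketch of it. It again uses a stand-in `Value` enum instead of `serde_json::Value`, and a `BTreeMap` in place of `serde_json::Map`. It maps `true` to `{}` and `false` to `{"not": {}}`, the object forms that respectively accept and reject every instance. Whether schemars would pick exactly this mapping is an assumption.

```rust
use std::borrow::Borrow;
use std::collections::BTreeMap;

/// Stand-in for `serde_json::Value` so this sketch needs only std.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Invariant: always wraps `Value::Bool` or `Value::Object`.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema(Value);

impl Schema {
    pub fn as_bool(&self) -> Option<bool> {
        match self.0 {
            Value::Bool(b) => Some(b),
            _ => None,
        }
    }

    pub fn as_object(&self) -> Option<&BTreeMap<String, Value>> {
        match &self.0 {
            Value::Object(map) => Some(map),
            _ => None,
        }
    }

    /// Replaces a bool schema with an equivalent object schema, then returns
    /// the object: `true` becomes `{}` and `false` becomes `{"not": {}}`.
    pub fn ensure_object(&mut self) -> &mut BTreeMap<String, Value> {
        if let Value::Bool(b) = self.0 {
            let mut map = BTreeMap::new();
            if !b {
                map.insert("not".to_string(), Value::Object(BTreeMap::new()));
            }
            self.0 = Value::Object(map);
        }
        match &mut self.0 {
            Value::Object(map) => map,
            _ => unreachable!("Schema must wrap a bool or an object"),
        }
    }

    /// Looks up a keyword; a bool schema has no keywords.
    pub fn get(&self, key: impl Borrow<str>) -> Option<&Value> {
        let key: &str = key.borrow();
        self.as_object()?.get(key)
    }

    /// Inserts a keyword (converting a bool schema first) and returns the
    /// previous value for that keyword, if any.
    pub fn set(&mut self, key: String, value: Value) -> Option<Value> {
        self.ensure_object().insert(key, value)
    }
}
```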
For convenience, schemars could also export a macro similar to serde_json's `json!()` that constructs a `Schema` while ensuring it's passed an object or bool:
```rust
let schema: Schema = json_schema!({}); // OK
let schema: Schema = json_schema!({ "type": "string" }); // OK
let schema: Schema = json_schema!(true); // OK
let schema: Schema = json_schema!("uh oh!"); // compile-time error
```
Note that such a macro would probably not validate that all properties are well-defined JSON Schema keywords, e.g. `json_schema!({ "foobar": 123 })` would be allowed. Bear in mind that an equivalent schema can already be constructed today due to the existing `extensions` field.
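Here is a heavily simplified sketch of how a `json_schema!` macro could reject non-schema input at compile time. The `macro_rules!` definition only has arms for `true`, `false` and a brace-delimited list of `"key": value` pairs. As a result, `json_schema!("uh oh!")` matches no arm and fails to compile. The sketch uses a stand-in `Value` type and does not support nested objects, arrays or numbers. A real macro would more plausibly check only the outer shape, delegate the contents to serde_json's `json!`, and use `$crate::` paths.

```rust
use std::collections::BTreeMap;

/// Stand-in for `serde_json::Value`; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    String(String),
    Object(BTreeMap<String, Value>),
}

impl From<bool> for Value {
    fn from(b: bool) -> Self {
        Value::Bool(b)
    }
}

impl<'a> From<&'a str> for Value {
    fn from(s: &'a str) -> Self {
        Value::String(s.to_string())
    }
}

/// Invariant: always wraps `Value::Bool` or `Value::Object`.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema(Value);

/// Only three input shapes are accepted: `true`, `false`, or a flat
/// `{ "key": value, ... }` object. Any other input, e.g. a string literal,
/// matches no arm, so `json_schema!("uh oh!")` is a compile-time error.
macro_rules! json_schema {
    (true) => {
        Schema(Value::Bool(true))
    };
    (false) => {
        Schema(Value::Bool(false))
    };
    ({ $($key:literal : $value:expr),* $(,)? }) => {{
        #[allow(unused_mut)]
        let mut map = BTreeMap::new();
        $( map.insert($key.to_string(), Value::from($value)); )*
        Schema(Value::Object(map))
    }};
}
```

Because the shape check happens during macro expansion, there is no runtime error path for the "not a bool or object" case.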
Further possibilities
In lieu of fields, `Schema` could also have getter/setter/mutator functions to aid processing and manipulating schemas. Then if new keywords are added to JSON Schema, corresponding functions could be added to schemars as a non-breaking change. Defining these would be fairly straightforward for "simple" properties like strings or numbers:
```rust
impl Schema {
    pub fn format(&self) -> Option<&str> { ... }
    pub fn set_format(&mut self, format: String) { ... }
}
```
But more complex properties may require in-place mutation, which calls for functions like `xyz_mut(&mut self) -> Option<&mut XYZ>`. Those would require schemars to define new types to wrap the underlying `&mut Value`. It may also be useful to define an entry-like API instead of (or as well as) the `xyz_mut` functions. Either way, such methods are not part of this proposal, but could be added later as a non-breaking change. Until/unless that happens, the main way to manipulate schema properties would be either the `get`/`get_mut`/`set` methods proposed above, or a mutable reference to the schema's underlying `Map`.
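To illustrate the "simple property" case, here is a minimal sketch of `format()` and `set_format()` layered over untyped storage. It uses a stand-in `Value` enum rather than `serde_json::Value`. One consequence of untyped storage shows up in the getter: a `format` that holds something other than a string reads as `None` rather than causing an error. The bool-to-object conversion in the setter is an assumption, chosen to match the `set` behaviour proposed above.

```rust
use std::collections::BTreeMap;

/// Stand-in for `serde_json::Value`; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    String(String),
    Object(BTreeMap<String, Value>),
}

/// Invariant: always wraps `Value::Bool` or `Value::Object`.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema(Value);

impl Schema {
    /// Returns `format` only when it is present and a string; a value of any
    /// other JSON type reads as `None`.
    pub fn format(&self) -> Option<&str> {
        match &self.0 {
            Value::Object(map) => match map.get("format") {
                Some(Value::String(s)) => Some(s.as_str()),
                _ => None,
            },
            _ => None,
        }
    }

    /// Sets `format`, first turning a bool schema into an equivalent object
    /// (`true` -> `{}`, `false` -> `{"not": {}}`).
    pub fn set_format(&mut self, format: String) {
        if let Value::Bool(b) = self.0 {
            let mut map = BTreeMap::new();
            if !b {
                map.insert("not".to_string(), Value::Object(BTreeMap::new()));
            }
            self.0 = Value::Object(map);
        }
        if let Value::Object(map) = &mut self.0 {
            map.insert("format".to_string(), Value::String(format));
        }
    }
}
```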
How different JSON Schema versions would be supported
When implementing `JsonSchema` on a type, it's currently not clear which version of JSON Schema should be produced. schemars currently assumes that the generated schema is compatible with draft 2019-09 (which the current `Schema`/`SchemaObject` definition is mostly compatible with), but this isn't documented anywhere. So I propose these high-level guidelines for determining which version of JSON Schema to produce:
- the implementation of the `json_schema()` function may check the requested `meta_schema` (available on the settings of the `SchemaGenerator` passed in as an argument) to determine which type/version of JSON Schema has been requested, and generate the schema according to that version
- if the implementation doesn't recognise the meta schema URI, or (probably more likely) the implementor doesn't want to deal with the complexity of supporting multiple versions of JSON Schema, it should generate a schema that's valid under draft 2020-12. Then, if the originally requested version is older than 2020-12 (and supported by schemars), the `SchemaGenerator` will transform the schema to the originally requested version using something like the `Visitor`s that exist today. If the requested version is newer than 2020-12 (i.e. a future version), then there should be no work required, assuming that all future versions are indeed backward-compatible.
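Here is a minimal sketch of the kind of transformation the `SchemaGenerator` could apply when draft-07 is requested but the implementation produced a 2020-12 schema. It moves `$defs` to `definitions` and rewrites any `$ref` that points into it. The helper names `wants_definitions` and `downgrade_defs` are hypothetical, `Value` is a stand-in for `serde_json::Value`, and only the draft-07 meta-schema URI is recognised. For brevity the sketch recurses into every nested value, so it would also rewrite data inside keywords such as `const` or `examples`. A real visitor would only descend into keywords that hold subschemas.

```rust
use std::collections::BTreeMap;

/// Stand-in for `serde_json::Value`; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

const DRAFT_07: &str = "http://json-schema.org/draft-07/schema#";

/// Hypothetical check: does the requested meta-schema predate `$defs`?
/// Only draft-07 is recognised here; anything else is left as 2020-12.
fn wants_definitions(meta_schema: Option<&str>) -> bool {
    meta_schema == Some(DRAFT_07)
}

/// Rewrites a 2020-12 schema in place for draft-07: moves `$defs` to
/// `definitions` and repoints `$ref`s that used the old location.
fn downgrade_defs(value: &mut Value) {
    match value {
        Value::Object(map) => {
            if let Some(defs) = map.remove("$defs") {
                map.insert("definitions".to_string(), defs);
            }
            if let Some(Value::String(r)) = map.get_mut("$ref") {
                if r.starts_with("#/$defs/") {
                    *r = r.replacen("#/$defs/", "#/definitions/", 1);
                }
            }
            // Recurses into every child value, including non-schema data.
            for child in map.values_mut() {
                downgrade_defs(child);
            }
        }
        Value::Array(items) => {
            for child in items.iter_mut() {
                downgrade_defs(child);
            }
        }
        _ => {}
    }
}
```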
Open questions:
- is `meta_schema` sufficient, or should `SchemaSettings` also/instead have some sort of "compatibility mode", e.g. to support custom meta schemas that are based on a specific version of JSON Schema?
I like this idea of (roughly) `JsonSchema::json_schema(..) -> serde_json::Value` a lot. At first, I thought it was weird and lame, but the more I thought it over, the more it appealed to me. Consider my own use case: we're using `schemars` for a bunch of OpenAPI-related stuff. We have to specify that we're interested in the OpenAPI format and fish around the `extensions` for `nullable`. It would be much crisper to just get back a JSON `Value` and use that as is.
Your insight that this divorces the output type from the weird differences between JSON Schema revisions is extremely compelling. This even opens the door for a generic "schema" representation that can have a `JsonSchema` impl (e.g. https://crates.io/crates/schema, and there could be an `impl<T: schema::Schema> schemars::JsonSchema for T {}`).
One could imagine schema descriptions even unrelated to JSON schema... though perhaps that's a bridge too far.
In addition, I think it makes a ton of sense to have the structural use of JSON schema types live in a different crate -- the design goals are distinct and not necessarily well-aligned.
One question: do you definitely want to use `serde_json::Value`? On one hand, consumers of `schemars` almost certainly also depend on `serde_json`. On the other hand, I've often wished there were a distinct `Value` object so I didn't need to pull in all of `serde_json`.
This does seem to complicate hand-written `JsonSchema` implementations, in that one might need to pay closer attention to the generation settings. It might be useful to provide some mechanism that's effectively "here's my output in 2019-09 format, could you please transform this according to what the caller is asking for?"
Cool stuff; would be happy to contribute; I've already been planning a 2020-12 JSON Schema crate for an OpenAPI v3.1 compatible version of https://crates.io/crates/openapiv3
Hello,
The changes all make sense. My main (only) use case for the library is to programmatically produce the `#/components/schemas` part of an OpenAPI schema from a collection of Rust structs, so I'm very happy to see it's going to provide some extra flexibility in usage.
For the ability to produce extra schema outputs, I feel that having the option to specify in the `json_schema` call which variant to target is going to be the simplest thing to do. For all the people who don't know/care about the version, we could have both `json_schema` and `json_schema_with_spec` or something.
I'd like the ability to also output the data in YAML format if possible; it would be useful at least for my use case, I think. But the more I think about it, the more I think that I might just want a macro that creates an `openapiv3::Schema` from arbitrary structs, so maybe all of this is irrelevant and I should just look into the code that respects serde attributes here.
I think all the specific getters/setters like `format()`/`set_format()` would fit better if they were added to extension traits like `trait JsonSchema202012`, which you'd implement on `Schema`. This way, consumers that want to use them would need to bring the trait into scope, and it would keep the completion/method list smaller. I think it would also allow external crates to implement their own extension traits if they want to.
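A rough sketch of the extension-trait idea. The version-specific accessors live on a trait rather than as inherent methods, so they only appear once the trait is imported. The trait name `JsonSchema202012` comes from the comment above. The `defs` method, the minimal `Schema`/`Value` stand-ins and the `get` helper are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stand-in for `serde_json::Value`; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    String(String),
    Object(BTreeMap<String, Value>),
}

/// Invariant: always wraps `Value::Bool` or `Value::Object`.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema(Value);

impl Schema {
    /// Version-agnostic, untyped lookup (the proposal's `get`).
    pub fn get(&self, key: &str) -> Option<&Value> {
        match &self.0 {
            Value::Object(map) => map.get(key),
            _ => None,
        }
    }
}

/// Typed accessors for JSON Schema 2020-12 keywords. They are only visible
/// to code that imports this trait.
pub trait JsonSchema202012 {
    fn format(&self) -> Option<&str>;
    fn defs(&self) -> Option<&BTreeMap<String, Value>>;
}

impl JsonSchema202012 for Schema {
    fn format(&self) -> Option<&str> {
        match self.get("format") {
            Some(Value::String(s)) => Some(s.as_str()),
            _ => None,
        }
    }

    // 2020-12 stores reusable schemas under `$defs`; a draft-07 trait would
    // read `definitions` instead.
    fn defs(&self) -> Option<&BTreeMap<String, Value>> {
        match self.get("$defs") {
            Some(Value::Object(map)) => Some(map),
            _ => None,
        }
    }
}
```

One trade-off of this design: if two such traits with overlapping method names are in scope at once, calls become ambiguous and need fully qualified syntax, e.g. `JsonSchema202012::format(&schema)`.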
> One question: do you definitely want to use `serde_json::Value`?
If schemars doesn't use `serde_json::Value`, then it would have to define its own type matching the JSON data structure. That type would be practically identical to `serde_json::Value` (plus its various trait implementations). I think that the extra simplicity, the reduction in API surface area and the ease of maintenance outweigh the cost of referencing `serde_json` for projects that otherwise wouldn't need it.
> This does seem to complicate hand-written `JsonSchema` implementations, in that one might need to pay closer attention to the generation settings. It might be useful to provide some mechanism that's effectively "here's my output in 2019-09 format, could you please transform this according to what the caller is asking for?"
That mechanism you suggest is almost exactly the same as in the proposal, except that the proposal standardises on 2020-12 instead of 2019-09!
> For the ability to produce extra schema outputs, I feel that having the option to specify in the `json_schema` call which variant to target is going to be the simplest thing to do. For all the people who don't know/care about the version, we could have both `json_schema` and `json_schema_with_spec` or something.
As above, `JsonSchema` implementations that don't know/care about the version can just return a 2020-12 schema. Otherwise, they may check the `meta_schema` for a known version.
> I'd like the ability to also output the data in YAML format if possible; it would be useful at least for my use case, I think.
I believe that's already possible, both in schemars 0.8 and with the proposed change: as long as `Schema` implements `serde::Serialize`, you can serialize it to YAML using serde_yaml.
> But the more I think about it, the more I think that I might just want a macro that creates an `openapiv3::Schema` from arbitrary structs
Another handy thing you would be able to do with this proposal is convert a `Schema` into any compatible struct using something like `serde_json::from_value`, e.g.
```rust
use schemars::schema_for;

let s1: schemars::Schema = schema_for!(SomeStruct); // SomeStruct derives JsonSchema
let s2: openapiv3::Schema = serde_json::from_value(s1.into()).unwrap();
```
This proposal has now been implemented on the v1 branch, and released to crates.io as 1.0.0-alpha.1 - please try it out and let me know what you think!
Disclaimer: I haven't used `schemars` yet, as I am waiting for version 1.0.0 before making `JsonSchema` a required trait in the public API of one of our libraries. In other words, my comment is likely to reflect my lack of experience with `schemars`, but here goes...
First off, `schemars` looks like an impressive and incredibly useful library; I am cheering for 1.0.0!
Is my understanding correct that with this change, `schemars` would effectively become independent of the meta-schema and might one day be able to output other schema formats (e.g. TypeSchema, CDDL, etc.) without changes to the API, provided of course that these can be serialized with `serde`?
If so, then maybe 1.0.0 could be an opportunity to change some API names to more generic versions, such as `Schema` instead of `JsonSchema`, possibly keeping the old names as aliases to provide a smooth upgrade path?
> Is my understanding correct that with this change, `schemars` would effectively become independent of the meta-schema and might one day be able to output other schema formats (e.g. TypeSchema, CDDL, etc.) without changes to the API, provided of course that these can be serialized with `serde`?
I have no plans to widen the scope of the project to support any schemas beyond JSON Schema. The only way that would be possible with schemars would be to use schemars to generate a JSON schema, and then convert that JSON schema to the desired schema type.
Closing as this is implemented in 1.0.0-alpha.1 (and beyond)