jsonschema-rs

Allow custom meta schemas

Open MYDIH opened this issue 11 months ago • 8 comments

Hi!

It seems that specifying a custom specification (the $schema field) doesn't work properly when the draft is detected automatically. In other libraries (boon, for example), the specification is resolved against the already-known schemas.

In the end, $schema is just a reference to another schema, so it should be retrieved like any other $ref. What do you think?

I suspect the problem lies in the Resource struct, which can only resolve the specification against known drafts.
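To make the failure mode concrete, here is a simplified, std-only model of detection that only recognizes built-in draft URIs. It is not jsonschema-rs's actual code: Draft and detect_draft are hypothetical names, and the error text only mirrors the "Unknown specification" message reported later in this thread.

```rust
/// Built-in drafts this illustrative detector knows about.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Draft {
    Draft4,
    Draft6,
    Draft7,
    Draft201909,
    Draft202012,
}

/// Maps a $schema URI to a built-in draft, or fails for anything else.
pub fn detect_draft(schema_uri: &str) -> Result<Draft, String> {
    // "...schema#" and "...schema" denote the same URI (empty fragment).
    let uri = schema_uri.strip_suffix('#').unwrap_or(schema_uri);
    match uri {
        "http://json-schema.org/draft-04/schema" => Ok(Draft::Draft4),
        "http://json-schema.org/draft-06/schema" => Ok(Draft::Draft6),
        "http://json-schema.org/draft-07/schema" => Ok(Draft::Draft7),
        "https://json-schema.org/draft/2019-09/schema" => Ok(Draft::Draft201909),
        "https://json-schema.org/draft/2020-12/schema" => Ok(Draft::Draft202012),
        // A custom meta-schema URI is never looked up; it is simply rejected.
        other => Err(format!("Unknown specification: {other}")),
    }
}
```

With this model, a schema whose $schema is http://example.com/meta/schema fails even if that meta-schema was registered as a resource, because registered resources are never consulted.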

I can elaborate further and provide an example if needed.

Thanks!

MYDIH, Dec 31 '24 09:12

Hi @MYDIH

I'd love to fix it! Can you please provide an example, so I can dig into it and build a failing test case?

Stranger6667, Dec 31 '24 09:12

Hi,

I must say, that was a very quick response! Thanks a lot :smile:

Here is an example that should show the issue:

use jsonschema::Resource;
use serde_json::json;

#[test]
fn test_library() {
    let schemas = vec![
        (
            "http://example.com/meta/schema".to_string(),
            Resource::from_contents(json!(
                {
                    "$id": "http://example.com/meta/schema",
                    "$schema": "https://json-schema.org/draft/2020-12/schema",
                    "title": "Core schema definition",
                    "type": "object",
                    "allOf": [
                        {
                            "$ref": "#/$defs/editable"
                        },
                        {
                            "$ref": "#/$defs/core"
                        }
                    ],
                    "properties": {
                        "properties": {
                            "type": "object",
                            "patternProperties": {
                                ".*": {
                                    "type": "object",
                                    "properties": {
                                        "type": {
                                            "type": "string",
                                            "enum": [
                                                "array",
                                                "boolean",
                                                "integer",
                                                "number",
                                                "object",
                                                "string",
                                                "null"
                                            ]
                                        }
                                    }
                                }
                            },
                            "propertyNames": {
                                "type": "string",
                                "pattern": "^[A-Za-z_][A-Za-z0-9_]*$"
                            }
                        }
                    },
                    "unevaluatedProperties": false,
                    "required": [
                        "properties"
                    ],
                    "$defs": {
                        "core": {
                            "type": "object",
                            "properties": {
                                "$id": {
                                    "type": "string"
                                },
                                "$schema": {
                                    "type": "string"
                                },
                                "type": {
                                    "const": "object"
                                },
                                "title": {
                                    "type": "string"
                                },
                                "description": {
                                    "type": "string"
                                },
                                "additionalProperties": {
                                    "type": "boolean",
                                    "const": false
                                }
                            },
                            "required": [
                                "$id",
                                "$schema",
                                "type"
                            ]
                        },
                        "editable": {
                            "type": "object",
                            "properties": {
                                "creationDate": {
                                    "type": "string",
                                    "format": "date-time"
                                },
                                "updateDate": {
                                    "type": "string",
                                    "format": "date-time"
                                }
                            },
                            "required": [
                                "creationDate"
                            ]
                        }
                    }
                }
            ))
            .unwrap(),
        ),
        (
            "http://example.com/schemas/element".to_string(),
            Resource::from_contents(json!(
                {
                    "$schema": "http://example.com/meta/schema",
                    "$id": "http://example.com/schemas/element",
                    "title": "Element",
                    "description": "An element",
                    "creationDate": "2024-12-31T12:31:53+01:00",
                    "properties": {
                        "value": {
                            "type": "string",
                            "title": "Value",
                            "maxLength": 450
                        }
                    },
                    "required": [],
                    "type": "object"
                }
            ))
            .unwrap(),
        ),
    ];

    jsonschema::options()
        .with_resources(schemas.into_iter())
        .build(&json!({
            "$schema": "http://example.com/schemas/element",
            "value": "ded"
        }))
        .unwrap();
}

Thanks again for your time

MYDIH, Dec 31 '24 13:12

Thanks!

I can't get it working with boon:

use boon::{Compiler, Schemas};
use serde_json::json;

fn main() {
    let mut schemas = Schemas::new();
    let mut compiler = Compiler::new();
    let _meta_index = compiler.compile("meta.json", &mut schemas).unwrap();
    let schema_index = compiler.compile("schema.json", &mut schemas).unwrap();
    let instance = json!("foo");
    let valid = schemas.validate(&instance, schema_index).is_ok();
}

Fails with:

thread 'main' panicked at src/main.rs:8:70:
called `Result::unwrap()` on an `Err` value: LoadUrlError { url: "http://example.com/meta/schema", src: UnsupportedUrlScheme { url: "http://example.com/meta/schema" } }

Here, meta.json and schema.json contain the resources from your example. Am I missing something? Unfortunately, I'm not very familiar with boon's API.

In the end, $schema is just a reference to another schema, so it should be retrieved like any other $ref. What do you think?

I suspect the problem lies in the Resource struct, which can only resolve the specification against known drafts.

I agree. The Resource struct probably should not fail on an unknown $schema, but rather keep its value in a separate Draft variant that effectively means "probably a custom schema". It would then be validated during registry initialization, once all resources are in place and that $schema can be resolved to a meta-schema.
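A sketch of this two-phase idea, assuming a simplified model where each resource is represented only by its $schema string: detection records an unknown URI in a Custom variant instead of failing, and a later finalize step, run once all resources are registered, checks that every custom meta-schema is actually present. DetectedDraft, detect, and finalize are hypothetical names, not the crate's API.

```rust
use std::collections::HashMap;

/// Result of the first, non-failing detection pass.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DetectedDraft {
    /// Only Draft 2020-12 is modeled; a real detector would list every built-in draft.
    Draft202012,
    /// "Probably a custom schema": the raw $schema value, resolved later.
    Custom(String),
}

/// Phase 1: never fails on an unknown $schema, it just records it.
pub fn detect(schema_uri: &str) -> DetectedDraft {
    match schema_uri.strip_suffix('#').unwrap_or(schema_uri) {
        "https://json-schema.org/draft/2020-12/schema" => DetectedDraft::Draft202012,
        other => DetectedDraft::Custom(other.to_string()),
    }
}

/// Phase 2: once every resource is registered, check that each custom
/// $schema points at a resource that exists.
/// `resources` maps a resource URI to its own $schema value.
pub fn finalize(resources: &HashMap<String, String>) -> Result<(), String> {
    for (id, schema_uri) in resources {
        if let DetectedDraft::Custom(meta) = detect(schema_uri) {
            if !resources.contains_key(&meta) {
                return Err(format!("{id}: meta-schema {meta} is not registered"));
            }
        }
    }
    Ok(())
}
```

The key design point is that registration order stops mattering: the element schema can be added before its meta-schema, and the error, if any, surfaces only in finalize.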

However, it is a bit unclear how Resource::id_of and other methods should behave in that case: should they inherit the behavior of the meta-schema's $schema, or are they not applicable at all, so that they return an error, None, or an empty iterator (depending on the method)?
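To make the first option (inheriting) concrete: a custom variant could carry the built-in draft its meta-schema chain resolves to and delegate to it. This is a hypothetical model (Spec, id_keyword, and id_of are invented names, and a schema is reduced to its top-level string keywords), not the crate's Draft or Resource API. The one fixed fact it encodes is that Draft 4 names the identifier keyword id, while Draft 6 and later use $id.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Spec {
    Draft4,
    Draft7,
    Draft202012,
    /// A custom meta-schema plus the built-in draft it ultimately resolves to.
    Custom { meta_uri: String, base: Box<Spec> },
}

impl Spec {
    /// Keyword that holds a schema's identifier.
    pub fn id_keyword(&self) -> &'static str {
        match self {
            Spec::Draft4 => "id",
            Spec::Draft7 | Spec::Draft202012 => "$id",
            // Inherit the behavior of whatever draft the meta-schema uses.
            Spec::Custom { base, .. } => base.id_keyword(),
        }
    }

    /// Finds the identifier among a schema's top-level string keywords.
    pub fn id_of<'a>(&self, keywords: &[(&str, &'a str)]) -> Option<&'a str> {
        let key = self.id_keyword();
        keywords.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
    }
}
```

The alternative (returning an error or None for Custom) would simply replace the delegating match arm.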

Stranger6667, Jan 02 '25 19:01

Hi!

I think the error occurs because boon doesn't automatically register the http scheme in its UrlLoader (the equivalent of a Retriever in this library). The way you added the schemas is also wrong, though I would also have expected it to work.

You need to register the schemas beforehand so they are available at the compilation stage. Here is a new example:

use boon::{Compiler, Schemas};
use serde_json::json;

#[test]
fn test_library_boon() {
    let schemas = vec![
        (
            "http://example.com/meta/schema".to_string(),
            json!(
                {
                    "$id": "http://example.com/meta/schema",
                    "$schema": "https://json-schema.org/draft/2020-12/schema",
                    "title": "Core schema definition",
                    "type": "object",
                    "allOf": [
                        {
                            "$ref": "#/$defs/editable"
                        },
                        {
                            "$ref": "#/$defs/core"
                        }
                    ],
                    "properties": {
                        "properties": {
                            "type": "object",
                            "patternProperties": {
                                ".*": {
                                    "type": "object",
                                    "properties": {
                                        "type": {
                                            "type": "string",
                                            "enum": [
                                                "array",
                                                "boolean",
                                                "integer",
                                                "number",
                                                "object",
                                                "string",
                                                "null"
                                            ]
                                        }
                                    }
                                }
                            },
                            "propertyNames": {
                                "type": "string",
                                "pattern": "^[A-Za-z_][A-Za-z0-9_]*$"
                            }
                        }
                    },
                    "unevaluatedProperties": false,
                    "required": [
                        "properties"
                    ],
                    "$defs": {
                        "core": {
                            "type": "object",
                            "properties": {
                                "$id": {
                                    "type": "string"
                                },
                                "$schema": {
                                    "type": "string"
                                },
                                "type": {
                                    "const": "object"
                                },
                                "title": {
                                    "type": "string"
                                },
                                "description": {
                                    "type": "string"
                                },
                                "additionalProperties": {
                                    "type": "boolean",
                                    "const": false
                                }
                            },
                            "required": [
                                "$id",
                                "$schema",
                                "type"
                            ]
                        },
                        "editable": {
                            "type": "object",
                            "properties": {
                                "creationDate": {
                                    "type": "string",
                                    "format": "date-time"
                                },
                                "updateDate": {
                                    "type": "string",
                                    "format": "date-time"
                                }
                            },
                            "required": [
                                "creationDate"
                            ]
                        }
                    }
                }
            ),
        ),
        (
            "http://example.com/schemas/element".to_string(),
            json!(
                {
                    "$schema": "http://example.com/meta/schema",
                    "$id": "http://example.com/schemas/element",
                    "title": "Element",
                    "description": "An element",
                    "creationDate": "2024-12-31T12:31:53+01:00",
                    "properties": {
                        "value": {
                            "type": "string",
                            "title": "Value",
                            "maxLength": 450
                        }
                    },
                    "type": "object"
                }
            ),
        ),
    ];

    let mut compiled = Schemas::new();
    let mut compiler = Compiler::new();

    for (id, schema) in &schemas {
        compiler.add_resource(&id, schema.clone()).unwrap();
    }

    let meta_index = compiler
        .compile("http://example.com/meta/schema", &mut compiled)
        .unwrap();

    assert!(compiled.validate(&schemas[1].1, meta_index).is_ok());

    let schema_index = compiler
        .compile("http://example.com/schemas/element", &mut compiled)
        .unwrap();
    let instance = json!({ "value": "foo" });

    assert!(compiled.validate(&instance, schema_index).is_ok());
}

(I had to remove the empty required array from the non-meta schema for it to work. You could also use files, I guess. I never have, though, and it feels odd to me that you'd refer to the file path first and the schema's $id later... You could also change the IDs to file:// URIs; I guess that would work.)
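One plausible explanation for why the empty required array had to go (a reading of the schemas above, not something confirmed in the thread): the meta-schema sets "unevaluatedProperties": false at its root, and the only property names it evaluates there come from its own properties keyword and the two allOf branches, $defs/core and $defs/editable. required is not among them. The sketch below hard-codes those names from the example; it is not a general JSON Schema evaluator.

```rust
use std::collections::BTreeSet;

/// Property names the example meta-schema evaluates at its root:
/// root "properties" plus the properties of $defs/core and $defs/editable.
pub fn evaluated_by_meta() -> BTreeSet<&'static str> {
    let root = ["properties"];
    let core = ["$id", "$schema", "type", "title", "description", "additionalProperties"];
    let editable = ["creationDate", "updateDate"];
    root.iter().chain(core.iter()).chain(editable.iter()).copied().collect()
}

/// Top-level keys that "unevaluatedProperties": false would reject.
pub fn rejected_keys<'a>(schema_keys: &[&'a str]) -> Vec<&'a str> {
    let allowed = evaluated_by_meta();
    schema_keys
        .iter()
        .copied()
        .filter(|k| !allowed.contains(*k))
        .collect()
}
```

Applied to the element schema's top-level keys, only "required" ends up rejected.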

As a side note, this also works in Ajv and JsonSchema.Net.

I agree. The Resource struct probably should not fail on an unknown $schema, but rather keep its value in a separate Draft variant that effectively means "probably a custom schema". It would then be validated during registry initialization, once all resources are in place and that $schema can be resolved to a meta-schema.

Seems fine 😉

However, it is a bit unclear how Resource::id_of and other methods should behave in that case: should they inherit the behavior of the meta-schema's $schema, or are they not applicable at all, so that they return an error, None, or an empty iterator (depending on the method)?

I guess you meant Draft::id_of and the like. I'm not knowledgeable enough about this API to be sure, but I guess it should return that new Custom draft you mentioned above. To me, a Draft is no more than another schema that is always baked into the tools for convenience, but judging by the code I may be wrong.

Anyway, I would expect my schemas to correctly report their draft/meta-schema, but I'm fine if the tool doesn't automatically validate my schema against it. I don't think boon does, for example; there, validating a schema against its meta-schema is a manual step.

MYDIH, Jan 03 '25 08:01

I think the issue here isn't that jsonschema-rs doesn't allow custom meta-schemas, but that it only checks the retriever (not any registered resources), and that the lookup isn't recursive (by design?). https://github.com/Stranger6667/jsonschema/blob/4919321acfa858cacf404203371b1519b1ab4c80/crates/jsonschema/src/options.rs#L509-L533
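A sketch of the lookup order suggested here: consult registered resources before the retriever, and keep following $schema until a built-in draft is reached, with a guard against cycles. Everything is hypothetical: resolve_base_draft, the BUILT_IN list (truncated to two drafts), and modelling each resource by its $schema string alone. It does not reproduce the linked options.rs code.

```rust
use std::collections::HashMap;

/// Built-in meta-schema URIs that end the walk (only two shown for brevity).
const BUILT_IN: [&str; 2] = [
    "https://json-schema.org/draft/2020-12/schema",
    "http://json-schema.org/draft-07/schema",
];

/// Follows a chain of $schema values until it reaches a built-in draft.
/// `resources` maps a resource URI to its own $schema; `retrieve` stands in
/// for a user-supplied retriever and is only called on a registry miss.
pub fn resolve_base_draft(
    start: &str,
    resources: &HashMap<String, String>,
    retrieve: impl Fn(&str) -> Option<String>,
) -> Result<&'static str, String> {
    let mut current = start.trim_end_matches('#').to_string();
    let mut seen: Vec<String> = Vec::new();
    loop {
        if let Some(builtin) = BUILT_IN.iter().find(|b| **b == current) {
            return Ok(*builtin);
        }
        if seen.contains(&current) {
            return Err(format!("cycle in $schema chain at {current}"));
        }
        seen.push(current.clone());
        // Registered resources first, then the retriever.
        let next = resources
            .get(&current)
            .cloned()
            .or_else(|| retrieve(current.as_str()))
            .ok_or_else(|| format!("Unknown specification: {current}"))?;
        current = next.trim_end_matches('#').to_string();
    }
}
```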

Side note: Is there any difference between adding resources or falling back to a hashmap in your retriever? Is it better to use retrievers because you don't have to clone jsonschema::Resource every time you build a validator? And is the registry somehow meant to be used by a jsonschema-rs user or is it more of an internal implementation detail?

axelkar, May 05 '25 21:05

After investigating a strange error further, it seems that even the feature described above isn't applied consistently. For example:

$id: http://devicetree.org/schemas/example-schema.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
# ^ This works because of the explicit code in ValidatorOptions::draft_for
properties:
  vendor,int-property:
    $ref: /schemas/types.yaml#/definitions/uint32

https://github.com/Stranger6667/jsonschema/blob/4919321acfa858cacf404203371b1519b1ab4c80/crates/jsonschema/src/compiler.rs#L299

$id: http://devicetree.org/schemas/types.yaml#
$schema: http://devicetree.org/meta-schemas/base.yaml#
# Got error `Unknown specification: http://devicetree.org/meta-schemas/base.yaml` after retriever logged this file
# ValidatorOptions::draft_for was never called for this.

https://github.com/Stranger6667/jsonschema/blob/4919321acfa858cacf404203371b1519b1ab4c80/crates/jsonschema-referencing/src/registry.rs#L715

How do you think this could be resolved?

axelkar, May 05 '25 23:05

I migrated to boon. It fully supports custom meta-schemas, and its compiled schema collection seems more efficient. It "just works" once you add a loader/retriever. Thanks anyway for this library.

axelkar, May 14 '25 18:05

hey @axelkar!

I appreciate you reporting the details here, even though I haven't had the capacity to implement custom meta-schema support (yet). I will work on this eventually, and all the details here will be helpful to me or anyone willing to contribute :) Thank you for your time and effort!

As a side note for completeness (not sure if my answers are still relevant):

Is there any difference between adding resources or falling back to a hashmap in your retriever? Is it better to use retrievers because you don't have to clone jsonschema::Resource every time you build a validator?

Resources are cached inside the Registry, so you could remove values from that hashmap inside your retriever instead of cloning them, and it would be more or less equivalent to passing resources explicitly.
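To illustrate the suggestion: if the registry caches whatever the retriever returns, a retriever backed by a pre-loaded map can move each document out with HashMap::remove instead of cloning it. This is a std-only sketch with hypothetical names (SimpleRetrieve, MapRetriever), not the crate's own retriever trait; documents are plain strings rather than JSON values, and it assumes each URI is requested at most once.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified stand-in for a retriever interface that takes &self.
pub trait SimpleRetrieve {
    fn retrieve(&self, uri: &str) -> Result<String, String>;
}

/// Serves each pre-loaded document once, moving it out of the map.
/// The Mutex is needed because `retrieve` only gets a shared reference.
pub struct MapRetriever {
    docs: Mutex<HashMap<String, String>>,
}

impl MapRetriever {
    pub fn new(docs: HashMap<String, String>) -> Self {
        Self { docs: Mutex::new(docs) }
    }
}

impl SimpleRetrieve for MapRetriever {
    fn retrieve(&self, uri: &str) -> Result<String, String> {
        let mut docs = self.docs.lock().map_err(|_| "poisoned lock".to_string())?;
        // `remove` hands over ownership, so no clone is made; this relies on
        // the caller caching the result, since a second request will fail.
        docs.remove(uri).ok_or_else(|| format!("not found: {uri}"))
    }
}
```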

And is the registry somehow meant to be used by a jsonschema-rs user or is it more of an internal implementation detail?

The main idea behind the Registry is to reuse resources without cloning them for every new validator. However, the current implementation expects an owned registry, which is an implementation problem (the registry is currently needed by the $ref-related keywords). Ideally, it should be passed by reference.
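A minimal sketch of the by-reference alternative: several validators borrow one registry instead of each owning a copy. Registry and Validator here are hypothetical std-only types with documents stored as strings; they are not the crate's types.

```rust
use std::collections::HashMap;

/// Hypothetical registry: resolved documents keyed by URI.
pub struct Registry {
    resources: HashMap<String, String>,
}

impl Registry {
    pub fn new(resources: HashMap<String, String>) -> Self {
        Self { resources }
    }
}

/// Hypothetical validator that only needs read access to the registry
/// (e.g. for $ref lookups), so it borrows it rather than owning it.
pub struct Validator<'r> {
    registry: &'r Registry,
}

impl<'r> Validator<'r> {
    pub fn new(registry: &'r Registry) -> Self {
        Self { registry }
    }

    /// Resolves a $ref target through the shared registry.
    pub fn lookup(&self, uri: &str) -> Option<&'r str> {
        let registry: &'r Registry = self.registry;
        registry.resources.get(uri).map(String::as_str)
    }
}
```

Borrowing ties every validator's lifetime to the registry; sharing an Arc<Registry> would be another way to avoid cloning without that lifetime parameter.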

Stranger6667, May 14 '25 20:05

This will be available in 0.35.

Stranger6667, Nov 15 '25 21:11