telemetry
$id must be valid URI in json schema
Problem
Since JSON Schema Draft 6, $id is required to be a valid URI reference as defined in RFC 3986, section 4.1, which is either a URI (e.g., http://eventlogging.jupyter.org/event-schema) or a relative reference (e.g., /event-schema). Currently, the $ids used in event schemas across different Jupyter projects do not follow this rule:
binderhub.jupyter.org/launch
hub.jupyter.org/server-action
They lack the scheme part of the URI and thus are not valid URIs: a parser reads the entire string, including the host name, as a relative path.
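To see how a generic URI parser reads these identifiers, here is a minimal sketch using Python's standard urllib.parse. The helper name describe_id is hypothetical and not part of any Jupyter project. Without a scheme or a leading //, the whole string ends up in the path component, so hub.jupyter.org is not recognised as a host:

```python
from urllib.parse import urlsplit


def describe_id(schema_id):
    """Return (scheme, netloc, path) as seen by a generic URI parser.

    Hypothetical helper, for illustration only.
    """
    parts = urlsplit(schema_id)
    return parts.scheme, parts.netloc, parts.path


# No scheme and no leading "//": everything lands in the path component.
print(describe_id("hub.jupyter.org/server-action"))

# With a scheme, the host is recognised as the authority (netloc).
print(describe_id("http://hub.jupyter.org/server-action"))
```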
As a result, the schemas are not guaranteed to work with every JSON Schema validator. One example is when using $ref:
import jsonschema
# Schema whose $id has no scheme; both properties share one definition via $ref.
schema = {
    "$id": "hub.jupyter.org/example-schema",
    "properties": {
        "requester": {"$ref": "#/definitions/user"},
        "target_user": {"$ref": "#/definitions/user"}
    },
    "definitions": {
        "user": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "id": {"type": "string"}
            }
        }
    }
}
instance = {
    "requester": {"name": "a", "id": "1"},
    "target_user": {"name": "b", "id": "2"}
}
# Resolving the $ref entries against the scheme-less $id triggers the error.
jsonschema.validate(instance, schema)
This fails with:
jsonschema.exceptions.RefResolutionError: unknown url type: 'hub.jupyter.org/hub.jupyter.org/example-schema'
Change it to "$id": "http://hub.jupyter.org/example-schema" or "$id": "/example-schema" and it validates fine.
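The doubled host name in the error message is what standard relative-reference resolution produces when a scheme-less $id is joined against itself. The sketch below reproduces this with urllib.parse.urljoin only. That the validator performs such a join internally is an assumption based on the error text, not something confirmed here.

```python
from urllib.parse import urljoin

bad_id = "hub.jupyter.org/example-schema"
abs_id = "http://hub.jupyter.org/example-schema"
rel_id = "/example-schema"

# The last segment of the base path is dropped and the reference is
# appended, so the host-like first segment appears twice.
print(urljoin(bad_id, bad_id))

# An absolute URI or an absolute-path reference resolves to itself.
print(urljoin(abs_id, abs_id))
print(urljoin(rel_id, rel_id))
```

The first call yields hub.jupyter.org/hub.jupyter.org/example-schema, the same string that appears in the error, while the other two forms are left unchanged.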
There are potentially other undiscovered problems as well.
Proposed solution
Either change $id to a fully formed URI or keep it as a relative reference. The latter is what MediaWiki EventLogging uses (example).
~~$id can just start with /, e.g. /hub/server-event is a valid $id. This is also how MediaWiki sets the $ids for their schemas. Example: https://schema.wikimedia.org/repositories/primary/jsonschema/mediawiki/user/blocks-change/current.yaml~~
See first comment.
Thanks for writing this up! Looks like MediaWiki's $ids aren't intended to be URIs, so it makes sense they would use the relative path. Since we have a domain component already, let's add a scheme and do https://?
MediaWiki's $ids are actual relative URIs though. They resolve against the origin https://schema.wikimedia.org/repositories/primary/jsonschema. For example, https://schema.wikimedia.org/repositories/primary/jsonschema/mediawiki/user/blocks-change/1.1.0 contains the schema with $id /mediawiki/user/blocks-change/1.1.0. Relative URIs might provide us with some flexibility down the line.
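As a rough sketch of how such a relative $id maps to the URL above, assume the base is simply prepended to the $id. This is an assumption and is not taken from MediaWiki's code. Strict RFC 3986 resolution, as implemented by urllib.parse.urljoin, treats a reference starting with / as replacing the whole base path, so it gives a different result:

```python
from urllib.parse import urljoin

base = "https://schema.wikimedia.org/repositories/primary/jsonschema"
schema_id = "/mediawiki/user/blocks-change/1.1.0"

# Assumed behaviour: plain concatenation of base and $id.
concatenated = base + schema_id
print(concatenated)

# RFC 3986 resolution: the leading "/" discards the base path.
resolved = urljoin(base, schema_id)
print(resolved)
```

If relative $ids are adopted, it may be worth deciding up front which of the two interpretations consumers are expected to use.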
> Since we have a domain component already, let's add a scheme and do https://?
I'm OK with this. I do think absolute URIs work better for us, since we have multiple domains for different schemas, especially with client events. Thanks!
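A minimal sketch of the agreed change, assuming existing scheme-less $ids only need an https:// prefix. The helper name with_https_scheme is hypothetical:

```python
from urllib.parse import urlsplit


def with_https_scheme(schema_id):
    """Prefix a scheme-less $id with https://; leave absolute URIs unchanged.

    Hypothetical helper, for illustration only.
    """
    if urlsplit(schema_id).scheme:
        return schema_id  # already an absolute URI
    if schema_id.startswith("/"):
        # A bare absolute-path reference carries no host to build a URI from.
        raise ValueError("no host component in $id: %r" % schema_id)
    return "https://" + schema_id


for old_id in ("binderhub.jupyter.org/launch", "hub.jupyter.org/server-action"):
    print(old_id, "->", with_https_scheme(old_id))
```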