tskit
tskit copied to clipboard
MetadataSchema doesn't validate the schema
There doesn't seem to be much validation of the schema itself when we set a metadata schema. E.g., if we do:
import tskit
tables = tskit.TableCollection(1)
tables.nodes.metadata_schema = tskit.MetadataSchema(
{
"codec": "json",
"type": "object",
"properties": {"accession_number": {"type": "integer"}},
"additional Properties": True,
}
)
print(tables.nodes.metadata_schema)
tables.nodes.add_row(metadata={})
OrderedDict([('additional Properties', True),
('codec', 'json'),
('properties',
OrderedDict([('accession_number',
OrderedDict([('type', 'integer')]))])),
('type', 'object')])
(Note the space in "additionalProperties")
This is pretty unhelpful from a developer perspective - we should at least give an error if the schema contains stuff we don't understand.
The schema is actually validated:
>>> ms = tskit.MetadataSchema({"uniqueItems":"foo"})
Traceback (most recent call last):
File "/home/benj/projects/tskit/python/tskit/metadata.py", line 622, in __init__
TSKITMetadataSchemaValidator.check_schema(schema)
File "/home/benj/projects/tskit/env/lib/python3.10/site-packages/jsonschema/validators.py", line 294, in check_schema
raise exceptions.SchemaError.create_from(error)
jsonschema.exceptions.SchemaError: 'foo' is not of type 'boolean'
Failed validating 'type' in metaschema['properties']['uniqueItems']:
{'default': False, 'type': 'boolean'}
On schema['uniqueItems']:
'foo'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/benj/projects/tskit/python/tskit/metadata.py", line 624, in __init__
raise exceptions.MetadataSchemaValidationError(str(ve)) from ve
tskit.exceptions.MetadataSchemaValidationError: 'foo' is not of type 'boolean'
Failed validating 'type' in metaschema['properties']['uniqueItems']:
{'default': False, 'type': 'boolean'}
On schema['uniqueItems']:
'foo'
However by default the metaschema is permissive so your example will pass as "additional Properties" will be seen as, well, an additional property.
I see. Is there something we should do to tighten this up, or would it lead to the schemas being too restrictive?
I see. Is there something we should do to tighten this up, or would it lead to the schemas being too restrictive?
We can modify the metaschema, yes, to prevent additional properties. This would error on typos at the cost of not allowing arbitrary information in the schema. I think, given our use case that this might be a good trade off to make.