tskit icon indicating copy to clipboard operation
tskit copied to clipboard

MetadataSchema doesn't validate the schema

Open jeromekelleher opened this issue 3 years ago • 3 comments
trafficstars

There doesn't seem to be much validation of the schema itself when we set a metadata schema. E.g., if we do:

import tskit

tables = tskit.TableCollection(1)
tables.nodes.metadata_schema = tskit.MetadataSchema(
    {
        "codec": "json",
        "type": "object",
        "properties": {"accession_number": {"type": "integer"}},
        "additional Properties": True,
    }
)

print(tables.nodes.metadata_schema)

tables.nodes.add_row(metadata={})
OrderedDict([('additional Properties', True),
             ('codec', 'json'),
             ('properties',
              OrderedDict([('accession_number',
                            OrderedDict([('type', 'integer')]))])),
             ('type', 'object')])

(Note the space in "additionalProperties")

This is pretty unhelpful from a developer perspective - we should at least give an error if the schema contains stuff we don't understand.

jeromekelleher avatar Jan 20 '22 12:01 jeromekelleher

The schema is actually validated:

>>> ms = tskit.MetadataSchema({"uniqueItems":"foo"})
Traceback (most recent call last):
  File "/home/benj/projects/tskit/python/tskit/metadata.py", line 622, in __init__
    TSKITMetadataSchemaValidator.check_schema(schema)
  File "/home/benj/projects/tskit/env/lib/python3.10/site-packages/jsonschema/validators.py", line 294, in check_schema
    raise exceptions.SchemaError.create_from(error)
jsonschema.exceptions.SchemaError: 'foo' is not of type 'boolean'

Failed validating 'type' in metaschema['properties']['uniqueItems']:
    {'default': False, 'type': 'boolean'}

On schema['uniqueItems']:
    'foo'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/benj/projects/tskit/python/tskit/metadata.py", line 624, in __init__
    raise exceptions.MetadataSchemaValidationError(str(ve)) from ve
tskit.exceptions.MetadataSchemaValidationError: 'foo' is not of type 'boolean'

Failed validating 'type' in metaschema['properties']['uniqueItems']:
    {'default': False, 'type': 'boolean'}

On schema['uniqueItems']:
    'foo'

However by default the metaschema is permissive so your example will pass as "additional Properties" will be seen as, well, an additional property.

benjeffery avatar Jan 26 '22 17:01 benjeffery

I see. Is there something we should do to tighten this up, or would it lead to the schemas being too restrictive?

jeromekelleher avatar Jan 27 '22 10:01 jeromekelleher

I see. Is there something we should do to tighten this up, or would it lead to the schemas being too restrictive?

We can modify the metaschema, yes, to prevent additional properties. This would error on typos at the cost of not allowing arbitrary information in the schema. I think, given our use case that this might be a good trade off to make.

benjeffery avatar Jan 27 '22 11:01 benjeffery