
Cache metadata validation

Open jeromekelleher opened this issue 4 years ago • 3 comments

As shown in https://github.com/tskit-dev/msprime/discussions/1901, metadata validation is proving to be quite expensive for msprime simulations. One option would be to change our code to use python-fastjsonschema (https://github.com/horejsek/python-fastjsonschema).

It has conda-forge and PyPI packages and has no dependencies, so it is definitely plausible. It's BSD licensed, so all good there too.

jeromekelleher • Nov 11 '21 13:11

For completeness, there's also jsonschema-rs, which looks quite fast. It's also permissively licensed. There's a PyPI package (with no deps), but no conda package.

grahamgower • Nov 11 '21 13:11

This is probably possible, but likely not a drop-in replacement as we do some customisation to cope with things like allowing None at the top level. See https://github.com/tskit-dev/tskit/blob/main/python/tskit/metadata.py#L61

One quick win would be to cache the validation in the same way we do for encoding.

benjeffery • Nov 11 '21 14:11

I've changed the title here, as we'll get validation caching in for 0.4.1.

benjeffery • Dec 16 '21 13:12