tskit
Cache metadata validation
As shown in https://github.com/tskit-dev/msprime/discussions/1901, metadata validation is proving to be quite expensive for msprime simulations. One option would be to change our code to use https://github.com/horejsek/python-fastjsonschema.
It has conda-forge and pypi packages and has no dependencies, so is definitely plausible. It's BSD licensed, so all good there too.
For completeness, there's also jsonschema-rs which looks quite fast. It's also permissively licensed. There's a pypi package (with no deps), but no conda package.
This is probably possible, but likely not a drop-in replacement as we do some customisation to cope with things like allowing None at the top level. See https://github.com/tskit-dev/tskit/blob/main/python/tskit/metadata.py#L61
One quick win would be to cache the validation in the same way we do for encoding.
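To make the caching idea concrete, here is a rough sketch of what it could look like. It is not the tskit implementation: `make_cached_validator` and `expensive_validate` are hypothetical names, and the stand-in validator just checks for a `name` key rather than running a real JSON schema. The assumption is that a metadata row can be reduced to a hashable key via canonical JSON, so identical rows are only validated once.

```python
import functools
import json


def make_cached_validator(validate, maxsize=128):
    """Wrap validate(obj) so that repeated identical inputs are checked once.

    Hypothetical helper for illustration; not part of tskit.
    """

    @functools.lru_cache(maxsize=maxsize)
    def _validate_key(key):
        # Only reached on a cache miss. lru_cache does not store raised
        # exceptions, so invalid input is re-validated (and re-raises).
        validate(json.loads(key))
        return True

    def cached_validate(obj):
        # Canonical JSON gives a hashable key; sort_keys makes the key
        # independent of dict ordering.
        key = json.dumps(obj, sort_keys=True)
        return _validate_key(key)

    cached_validate.cache_info = _validate_key.cache_info
    return cached_validate


# Hypothetical stand-in for an expensive schema check; records each call
# so we can see how often the underlying validator actually runs.
calls = []


def expensive_validate(obj):
    calls.append(obj)
    if not isinstance(obj, dict) or "name" not in obj:
        raise ValueError("metadata must be an object with a 'name' key")


validate = make_cached_validator(expensive_validate)
for _ in range(1000):
    validate({"name": "pop_0", "id": 1})
validate({"id": 1, "name": "pop_0"})  # same content, different key order
```

The trade-off is that building the key costs a `json.dumps` per row, so this only pays off when validation is much slower than serialisation and the same metadata values recur often, as they tend to in simulations.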
I've changed the title here as we'll get validation caching in for 0.4.1