jsonschema
Is `jsonschema.validate` slow?
I come to you from the tracker-commons open-source project.
With a 57 MB JSON data file and a 7 KB schema file, the following code takes more than 70 seconds to run:
import json, jsonschema

with open("wcon_schema.json", "r") as wcon_schema_file:
    schema = json.loads(wcon_schema_file.read())

with open("testfile_new.wcon", 'r') as infile:
    serialized_data = infile.read()

w = json.loads(serialized_data)
jsonschema.validate(w, schema)
This dwarfs all the other processing steps I'm performing on it by a factor of 100. Is there something I'm missing? Is it really supposed to take this long to validate?
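Before blaming validation alone, it may help to time JSON parsing and schema validation separately, since the snippet above does both. A minimal sketch, assuming nothing beyond the standard library; `time_phases` is a hypothetical helper, and the validator is passed in as a callable so `jsonschema.validate` can be plugged in at the call site:

```python
import json
import time


def time_phases(schema_text, data_text, validate):
    """Return (parse_seconds, validate_seconds) for one schema/document pair.

    `validate` is any callable taking (instance, schema), e.g.
    jsonschema.validate. Passing it in keeps this helper library-agnostic.
    """
    start = time.perf_counter()
    schema = json.loads(schema_text)
    instance = json.loads(data_text)
    parsed = time.perf_counter()
    validate(instance, schema)  # raises if the instance is invalid
    done = time.perf_counter()
    return parsed - start, done - parsed


if __name__ == "__main__":
    import jsonschema

    # File names are the ones from the report above.
    with open("wcon_schema.json", "r") as f:
        schema_text = f.read()
    with open("testfile_new.wcon", "r") as f:
        data_text = f.read()
    parse_s, validate_s = time_phases(schema_text, data_text, jsonschema.validate)
    print("parse: %.2f s, validate: %.2f s" % (parse_s, validate_s))
```

If the parse time is small and the validate time accounts for nearly all of the 70 seconds, the cost really is in the validator.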
Thanks.
Hi! Thanks for the report. I haven't had a chance to dig in yet, but as a general principle: are you using CPython? If you are, the answer is probably "yes, very likely" :). But I'll have a look.
I've also noticed that validating large files can be time-consuming. I'm seeing 10-40 seconds (depending on the complexity of the schema) to validate a JSON file with 425,384 entries, using version 2.5.1 from PyPI.
For comparison, the JavaScript library ajv validates the same file in under 3 seconds.
https://github.com/epoberezkin/ajv
Perhaps someone knows of a Python library that wraps ajv?
@MichaelCurrie what was the answer to what implementation of Python you are using?
Have you tried profiling validation and seeing what's taking so long? I'd definitely accept performance patches that preserve backwards compatibility (and do not make things slower on PyPy).
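One way to see where validation time goes is the standard library's `cProfile`. The sketch below wraps a single call and returns the `pstats` report as text; `profile_call` is a hypothetical helper name, and the file names in the usage section are taken from the original report. Running a whole script with `python -m cProfile -s cumulative script.py` gives a similar report without any code changes.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_by="cumulative", limit=25, **kwargs):
    """Run func(*args, **kwargs) under cProfile; return (result, report_text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the `limit` most expensive functions, sorted by cumulative time.
    stats.sort_stats(sort_by).print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    import json
    import jsonschema

    with open("wcon_schema.json", "r") as f:
        schema = json.load(f)
    with open("testfile_new.wcon", "r") as f:
        instance = json.load(f)
    _, report = profile_call(jsonschema.validate, instance, schema)
    print(report)
```

Sorting by cumulative time points at the call chains (for example, per-keyword validator functions or reference resolution) that dominate, which is the kind of information that would make a targeted performance patch possible.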
Also see #232 -- I'd love to have actual benchmarks added; they're the only real way to ensure that performance regressions don't happen, at least for the cases covered by the benchmark suite.
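As a rough illustration of what such a benchmark might look like, the sketch below times validation of synthetic arrays of increasing size with `timeit`, which also shows whether run time grows roughly linearly with the number of entries. The schema and data generated by `synthetic_case` are hypothetical stand-ins, not the WCON files from this thread, and `best_of` is a hypothetical helper; a real suite would presumably use representative schemas.

```python
import timeit


def best_of(func, *args, repeat=5, number=1):
    """Return the fastest per-call time (seconds) over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


def synthetic_case(n):
    """Build a hypothetical schema and an array of n matching objects."""
    schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "x": {"type": "number"}},
            "required": ["id", "x"],
        },
    }
    instance = [{"id": i, "x": i * 0.5} for i in range(n)]
    return schema, instance


if __name__ == "__main__":
    import jsonschema

    for n in (1000, 10000, 100000):
        schema, instance = synthetic_case(n)
        seconds = best_of(jsonschema.validate, instance, schema, repeat=3)
        print("%7d entries: %.3f s" % (n, seconds))
```

Taking the minimum of several repeats reduces noise from other processes, which matters when comparing timings across commits.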
I am using the CPython implementation of Python:
Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.python_implementation()
'CPython'
@Julian do you have an example of how to profile the code in a way that would provide you with useful information? Alternatively, I have already provided my example files and code, so you could also do this profiling if you are interested. I suspect any performance improvements gleaned would have general applicability.
Thanks for your help.
FYI, #158 was an earlier attempt to improve speed.