Is `jsonschema.validate` slow?

Open MichaelCurrie opened this issue 9 years ago • 6 comments

I come to you from the tracker-commons open-source project.

Using a 57 MB JSON file and a 7 KB schema file, the following code takes upwards of 70 seconds to run.

import json

import jsonschema

# Load the 7 KB schema.
with open("wcon_schema.json", "r") as wcon_schema_file:
    schema = json.load(wcon_schema_file)

# Load and parse the 57 MB data file.
with open("testfile_new.wcon", "r") as infile:
    w = json.load(infile)

# Validate the parsed data against the schema; this is the slow step.
jsonschema.validate(w, schema)

This dwarfs all the other processing steps I'm performing on it by a factor of 100. Is there something I'm missing? Is it really supposed to take this long to validate?

Thanks.
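To back up the factor-of-100 comparison, the parse and validation steps can be timed separately. This is only a rough sketch: `timed` and `time_parse_and_validate` are hypothetical helper names that are not part of jsonschema, and the second function assumes jsonschema is installed and that both files have already been read into strings.

```python
import json
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def time_parse_and_validate(data_text, schema_text):
    """Time JSON parsing and schema validation separately (hypothetical helper)."""
    import jsonschema  # third-party; imported here so timed() stays stdlib-only

    schema, _ = timed(json.loads, schema_text)
    instance, parse_seconds = timed(json.loads, data_text)
    # jsonschema.validate raises ValidationError if the instance is invalid.
    _, validate_seconds = timed(jsonschema.validate, instance, schema)
    return {"parse_seconds": parse_seconds, "validate_seconds": validate_seconds}
```

Calling `time_parse_and_validate(serialized_data, schema_text)` with the two file contents should show how much of the total is spent in `jsonschema.validate` as opposed to `json.loads`.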

MichaelCurrie, Mar 09 '16 01:03

Hi! Thanks for the report. Haven't gotten a chance to dig in yet, but as a general principle, are you using CPython? If you are, the answer is probably "yes very likely" :). But I'll have a look.

Julian, Mar 21 '16 12:03

I have also noticed that validating large files can be time-consuming. I am seeing 10-40 seconds (depending on the complexity of the schema) to validate a JSON file with 425,384 entries. I am using version 2.5.1 as distributed on PyPI.

ccoffrin, Jun 14 '16 18:06

For comparison, the JavaScript library ajv validates the same file in under 3 seconds.

https://github.com/epoberezkin/ajv

Perhaps someone knows of a Python library that wraps ajv?

MichaelCurrie, Jun 17 '17 20:06

@MichaelCurrie which implementation of Python are you using?

Have you tried profiling validation and seeing what's taking so long? I'd definitely accept performance patches that preserve backwards compatibility (and do not make things slower on PyPy).
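One way to get that kind of profile is the standard library's `cProfile` module. The sketch below is an assumption about how one might do it, not an established recipe from this project: `profile_call` and `profile_validation` are hypothetical helper names, and `profile_validation` assumes jsonschema is installed and that the instance and schema are already parsed.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def profile_validation(instance, schema, top=15):
    """Profile jsonschema.validate on parsed data (hypothetical helper)."""
    import jsonschema  # third-party; imported here so profile_call stays stdlib-only

    _, report = profile_call(jsonschema.validate, instance, schema, top=top)
    return report
```

With the objects from the original snippet, `print(profile_validation(w, schema))` lists the functions with the highest cumulative time, which is the sort of output that would be useful to attach to this issue.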

Also see #232 -- I'd love to have actual benchmarks added, it's the only real way to ensure that performance regressions don't happen, at least for the benchmarks in the benchmark suite.

Julian, Jun 17 '17 21:06

I am using the CPython implementation of Python:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.python_implementation()
'CPython'

@Julian do you have an example of how to profile the code in a way that would provide you with useful information? Alternatively, I have already provided my example files and code, so you could also do this profiling if you are interested. I suspect any performance improvements gleaned would have general applicability.

Thanks for your help.

MichaelCurrie, Jun 17 '17 21:06

FYI, #158 was an earlier attempt to improve speed.

ankostis, Jun 16 '19 17:06