Performance degradation in version 2.0.0
After updating to version 2.0.0, validating a large JSON file against an admittedly complex schema became significantly slower and, in some cases, ran out of memory with 1.5GB allocated.
The schema and JSON file which reproduce the problem are in this zip file:
In version 1.5.9, the JSON file validates successfully against the schema, with no errors, in approximately 60 seconds in my development environment.
The code which implements the validation is here: https://github.com/spdx/tools-java/blob/77a41decbe94825424267827180ba738f8cb53cf/src/main/java/org/spdx/tools/Verify.java#L169
I did some light manual sampling to see where the problem might be, and most of the time seems to be spent in the startsWith method while processing annotations in the Schema.validate(...) method.
Here is a typical stack trace:
The use of unevaluatedProperties requires annotation collection in order to do the evaluation, and this can potentially take up a lot of memory.
In this specific case there is an issue with the properties validation, which is only supposed to run for objects. However, it should be quite simple to create test data that still causes the evaluation to run out of memory.
You might want to consider:
- Refactoring your schema to use additionalProperties: false, which entails flattening your $refs
- Changing your file format to line delimited JSON and validating record by record instead of as a single JSON array, if processing 150 MB files is normal
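As a rough illustration of the second suggestion, here is a minimal sketch of record-by-record validation over line delimited JSON. It uses only the Java standard library. The class name LineDelimitedValidation, the method failingLines, and the recordValidator parameter are hypothetical names made up for this example; the predicate is a placeholder for whatever call validates a single record against the schema, and it is not the library's API. The point is that only one record is held in memory at a time, so memory used per validation should track the size of a record rather than the size of the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of any library.
final class LineDelimitedValidation {

    /**
     * Reads one JSON record per line and returns the 1-based line numbers
     * of the records that the supplied validator rejects.
     * Blank lines are skipped but still counted in the line numbering.
     *
     * recordValidator is an assumption: it stands in for the real
     * schema validation call and returns true when a record is valid.
     */
    static List<Integer> failingLines(BufferedReader reader,
                                      Predicate<String> recordValidator)
            throws IOException {
        List<Integer> failures = new ArrayList<>();
        String line;
        int lineNumber = 0;
        // Each iteration holds a single record, never the whole file.
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            if (line.trim().isEmpty()) {
                continue;
            }
            if (!recordValidator.test(line)) {
                failures.add(lineNumber);
            }
        }
        return failures;
    }
}
```

To use it, wrap the input file in a BufferedReader and pass a predicate that parses the line and runs the schema validation for one record. Collecting only line numbers keeps the result small even when many records fail.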
Thanks @justin-tay for the quick response and suggestions.
cc: @JPEWdev - things to consider for the schema generation.
Thanks @stevehu