Memory not being freed after validation
After performing seemingly any type of JSON Schema validation, memory does not appear to be freed. For example, running this version of the example code, with the cap crate added to print memory usage:
use cap::Cap;
use serde_json::json;
use std::alloc;

// Wrap the system allocator so the number of live bytes can be queried.
#[global_allocator]
static ALLOCATOR: Cap<alloc::System> = Cap::new(alloc::System, usize::MAX);

fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("Allocated before validation: {}KB", ALLOCATOR.allocated() / 1024);
    {
        let schema = json!({"maxLength": 5});
        let instance = json!("foo");
        let validator = jsonschema::validator_for(&schema)?;
        assert!(validator.validate(&instance).is_ok());
    }
    println!("Allocated after validation: {}KB", ALLOCATOR.allocated() / 1024);
    Ok(())
}
results in the following being printed:
Allocated before validation: 0KB
Allocated after validation: 4184KB
Using the jsonschema::is_valid approach instead results in the same behavior:
println!("Allocated before validation: {}KB", ALLOCATOR.allocated() / 1024);
{
    let schema = json!({"maxLength": 5});
    let instance = json!("foo");
    assert!(jsonschema::is_valid(&schema, &instance));
    assert!(jsonschema::validate(&schema, &instance).is_ok());
}
println!("Allocated after validation: {}KB", ALLOCATOR.allocated() / 1024);
Allocated before validation: 0KB
Allocated after validation: 4184KB
Creating a more complex schema seems to increase the memory held after the validation scope:
println!("Allocated before validation: {}KB", ALLOCATOR.allocated() / 1024);
{
    let schema = json!({
        "type": "object",
        "properties": {
            "innerThing": {
                "type": "string",
                "maxLength": 5,
                "minLength": 1
            },
            "anotherThing": {
                "type": "string",
                "maxLength": 10,
                "minLength": 1
            },
            "arrayThing": {
                "type": "array",
                "items": {
                    "type": "string"
                }
            }
        },
        "required": ["innerThing"]
    });
    let instance = json!("foo");
    _ = jsonschema::is_valid(&schema, &instance);
    _ = jsonschema::validate(&schema, &instance).is_ok();
}
println!("Allocated after validation: {}KB", ALLOCATOR.allocated() / 1024);
Allocated before validation: 0KB
Allocated after validation: 11355KB
It seems to me that the memory should be released once the program exits the scope in which jsonschema is used, but it is not. Is there an issue with how I'm doing validation here, or is there a problem with jsonschema itself?
Thanks for opening!
There are a few Lazy statics that are evaluated upon first access; for example, meta schemas are needed to validate the input schemas. However, I'd expect the size of the meta schemas to always be the same, so maybe some other cache is involved (there is one for patterns, for example, but no patterns are present here).
In any event, I'll take a look to see if there is anything that is not cleaned up, or at least not capped.
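To illustrate why a lazy static shows up as memory that is still allocated after the validation scope ends, here is a minimal sketch using only the standard library. The `META_SCHEMAS` static, the `meta_schemas` function, and the placeholder contents are hypothetical stand-ins for illustration; they are not the jsonschema crate's actual types or data.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical stand-in for a registry of meta schemas (not jsonschema's type).
// A static like this is never dropped, so whatever it allocates stays
// allocated until the process exits.
static META_SCHEMAS: OnceLock<HashMap<&'static str, String>> = OnceLock::new();

/// Returns the shared registry, building it on the first call only.
fn meta_schemas() -> &'static HashMap<&'static str, String> {
    META_SCHEMAS.get_or_init(|| {
        let mut map = HashMap::new();
        // Placeholder documents; a real registry would hold parsed schemas.
        map.insert("draft7", "x".repeat(4096));
        map.insert("draft2020-12", "x".repeat(8192));
        map
    })
}

/// Total bytes of placeholder schema text kept alive by the static.
fn retained_bytes() -> usize {
    meta_schemas().values().map(String::len).sum()
}
```

The first call to `meta_schemas()` pays the allocation cost, and later calls return the same instance. Under this model the retained size is constant, which is why growth that depends on the input schema points to something other than plain lazy statics.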
I think you are right. I am inclined to think that the problem is with how the meta schema registry is used during the compilation process. It is cloned and resources are merged together, so I assume that some of the data is not properly dropped.
For my own reference later:
- meta schemas also have SPECIFICATION.clone() but they rather should have their specific set of resources instead.
- currently, the size of the input schema influences the total size of the allocation that is left at the end of the block which makes me think that this object is stored somewhere globally
Also - https://github.com/Stranger6667/jsonschema/actions/runs/13093817151/job/36533592518
After some investigation, I realized that the behavior you observed is caused by how $ref and similar keywords are implemented in jsonschema. Right now they are lazy, because that is the simplest way to handle deeply recursive schemas. As a result, they are only resolved (and cached inside the validator) on first access to the specific keyword. The largest portion of the memory growth happens in the meta schemas that are used to validate input schemas: they grow as new keywords are validated, and new levels of nesting also contribute.
There are also some small parts like seed values for ahash, but it is around 88 bytes on the first access.
This approach is clearly far from optimal and right now I plan to reduce memory usage with #686 and one extra PR after it. A better way is to implement a proper virtual machine (#641, right now jsonschema uses a tree-walk approach to validation) which will resolve cycles and avoid the need for lazy evaluation.
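The lazy `$ref` behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not jsonschema's implementation: `LazyRef`, `CompiledRef`, `resolve`, and the 1024-byte `retained` buffer are all hypothetical names and values chosen to make the growth visible.

```rust
use std::cell::OnceCell;

/// Hypothetical compiled form of a referenced subschema.
struct CompiledRef {
    max_length: usize,
    // Stand-in for the memory a compiled subschema would hold on to.
    retained: Vec<u8>,
}

/// Stand-in for real reference resolution and compilation.
fn resolve(target: &str) -> CompiledRef {
    let max_length = if target == "#/definitions/short" { 5 } else { usize::MAX };
    CompiledRef { max_length, retained: vec![0u8; 1024] }
}

/// Hypothetical `$ref` node: stores only the reference until first use.
struct LazyRef {
    target: String,
    compiled: OnceCell<CompiledRef>,
}

impl LazyRef {
    fn new(target: &str) -> Self {
        LazyRef { target: target.to_string(), compiled: OnceCell::new() }
    }

    /// Validates a string, compiling the target on first access and
    /// caching the result inside the node.
    fn is_valid(&self, instance: &str) -> bool {
        let compiled = self.compiled.get_or_init(|| resolve(&self.target));
        instance.chars().count() <= compiled.max_length
    }

    /// Bytes held by the cached compilation result, or 0 before first use.
    fn retained_bytes(&self) -> usize {
        self.compiled.get().map_or(0, |c| c.retained.len())
    }
}
```

In this sketch a node holds nothing until `is_valid` is first called, and keeps the compiled result afterwards. If such a node lives inside a globally stored validator, as the meta schema validators do according to the explanation above, the cached data would outlive the scope that triggered it.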
With the recent rework of the compilation pipeline, this issue is resolved. ValidationOption / Registry / etc are no longer stored inside the validator. For this reason, recursive references don't trigger any compilation at validation time, and the validator does not grow anymore.
Now only the lazy statics remain (e.g. metaschemas & ahash random seeds).
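For contrast with lazy resolution, one way to get a validator that does not grow at validation time is to compile everything up front and represent references as indices into a flat list of nodes. The sketch below assumes that design purely for illustration; `Node`, `Validator`, and the index-based layout are hypothetical and are not taken from the reworked jsonschema pipeline.

```rust
/// Hypothetical compiled node; `Ref` holds an index into `Validator::nodes`.
enum Node {
    MaxLength(usize),
    Ref(usize),
}

/// Hypothetical validator whose nodes are all built before validation starts.
struct Validator {
    nodes: Vec<Node>,
    root: usize,
}

impl Validator {
    /// Validates a string without allocating or mutating the validator.
    fn is_valid(&self, instance: &str) -> bool {
        let mut current = self.root;
        // Follow references; the loop is bounded by the node count so a
        // chain of references that never reaches a keyword cannot spin forever.
        for _ in 0..=self.nodes.len() {
            match &self.nodes[current] {
                Node::MaxLength(limit) => return instance.chars().count() <= *limit,
                Node::Ref(next) => current = *next,
            }
        }
        // Treat a reference-only cycle as invalid in this sketch.
        false
    }
}
```

Because `is_valid` takes `&self` and there is no interior mutability, the node list has the same size before and after any number of validations.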