jsonschema-rs
jsonschema-rs copied to clipboard
Improve validators graph
The core idea behind this library is to store all keywords as a tree similar to the original schema JSON object. However, the implementation is not as efficient as I'd like it to be. Primarily, it has the following disadvantages:
- Each node's siblings are stored in a separate vector, which leads to a very fragmented graph that negatively affects cache locality;
- It doesn't allow cycles. The primary use case is the
$refkeyword, where some compilation logic is happening during validation. These things are cached, though, but the cost ofRWLockis visible in flame graphs. Also, when logically the same validator is called recursively -> some nodes are re-created, but theoretically, they can be re-used.
This issue is the starting point to rethink the implementation and maybe to implement something better.
Here are some thoughts I have in mind:
- It might be possible to use an arena to allocate all boxed validators, then navigating could be done with indexes (maybe);
- An interesting article on using indexes in graphs;
- It might be nice to have some tool to display the validators graph.
- More ideas
Later I'll draw the current implementation, so all the overhead is more visible.
At the moment, I think that the following layout will be a good step in this direction:
- Store validators in a vector
- Store metadata (relative_path / absolute_path) in a separate vector
- Pass
&[BoxedValidator]to each validator inis_valid/validate/applyso each validator can call dependent validators - Use indexes to access dependent validators/metadata (as a single one or a range)
This should improve cache locality as all validators will be in the same vector + remove costs for following references between the validation nodes (it will be array access by index) + is_valid won't have extra costs for loading larger SchemaNode as it will be just boxed validator
As the next step, it will be easier to evaluate $ref during the compilation phase