refactor: Use index vec to store the schema nodes
This is yet another attempt to improve the schema tree layout without going fully into building a compiler & interpreter, as noted in #641 (which will require way more effort).
The core goal is to prevent excessive memory consumption by recursive references, which are evaluated lazily and keep expanding on each deeper recursion cycle (#675).
Solving the above issue should also remove the need to store Registry inside Validator, and, in principle, it should be possible to pass Registry by reference. It will unblock a better API for jsonschema where the user won't need to pass the Registry by value (likely cloning it).
Inspired by changes I made in the css-inline crate a few years ago.
Codecov Report
:x: Patch coverage is 93.38731% with 49 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 86.14%. Comparing base (44520ad) to head (5e3a3bf).
:warning: Report is 93 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #749 +/- ##
==========================================
+ Coverage 85.71% 86.14% +0.43%
==========================================
Files 95 100 +5
Lines 15531 16362 +831
==========================================
+ Hits 13312 14095 +783
- Misses 2219 2267 +48
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
CodSpeed Performance Report
Merging #749 will not alter performance
Comparing dd/flat (5e3a3bf) with master (958b430)
Summary
โ
47 untouched benchmarks
๐ 1 new benchmarks
Benchmarks breakdown
| Benchmark | BASE |
HEAD |
Change | |
|---|---|---|---|---|
| ๐ | new_is_valid |
N/A | 5.2 ยตs | N/A |
I was looking a bit more into the memory usage of this library, and I have an additional sidenote that I'll leave here.
Other than the well-known lazy-statics, it looks like there are some larger memory leak coming from the $ref validator here:
https://github.com/Stranger6667/jsonschema/blob/958b430eedc2a6c96810bba2a8b3bad08fa61217/crates/jsonschema/src/keywords/ref_.rs#L124-L142
I think that the current recursive-validators manage to build a cyclic family of Arc (or Rc?) via the intertwined Context, Registry, Resolver and the validators.
I naively tried to break this by demoting LazyRefValidator.registry to a Weak, but that didn't work at runtime.
@lucab
Thank you for taking a deep look into this! The presence of registry, ctx, etc, is surely an awful hack. In this PR, the static references are resolved at the compilation step, so there is no need to keep those structures.
I am not sure about $dynamicRef 100% yet, but it should also be possible to avoid this "compilation at runtime" approach too. I.e., store all possible jump locations and decide on the exact one at runtime.
Right now, static refs work in the new compiler, so adding $dynamicRef is the next step on my TODO list.
As a side note, another factor contributing to memory usage could be the presence of many separate allocations (every node is boxed). I believe that, depending on the used allocator, memory fragmentation may also impact the final result.
Gradually, I want to pack things more (e.g. intern String for properties, etc), and I think it will also improve the situation with runtime performance & memory usage.
Reworked the recursive references recently, so the memory consumption issue is solved. I'll continue to make more improvements to the compiler later on, separately.