jsonschema-rs icon indicating copy to clipboard operation
jsonschema-rs copied to clipboard

refactor: Use index vec to store the schema nodes

Open Stranger6667 opened this issue 7 months ago โ€ข 2 comments

This is yet another attempt to improve the schema tree layout without going fully into building a compiler & interpreter, as noted in #641 (which will require way more effort).

The core goal is to prevent excessive memory consumption by recursive references, which are evaluated lazily and keep expanding on each deeper recursion cycle (#675).

Solving the above issue should also remove the need to store Registry inside Validator, and, in principle, it should be possible to pass Registry by reference. It will unblock a better API for jsonschema where the user won't need to pass the Registry by value (likely cloning it).

Inspired by changes I made in the css-inline crate a few years ago.

Stranger6667 avatar May 22 '25 11:05 Stranger6667

Codecov Report

:x: Patch coverage is 93.38731% with 49 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 86.14%. Comparing base (44520ad) to head (5e3a3bf). :warning: Report is 93 commits behind head on master.

Files with missing lines Patch % Lines
crates/jsonschema/src/compiler_v2.rs 91.45% 24 Missing :warning:
crates/jsonschema/src/ir/number.rs 47.22% 19 Missing :warning:
crates/jsonschema/src/ir/mod.rs 98.87% 4 Missing :warning:
crates/jsonschema-referencing/src/registry.rs 0.00% 1 Missing :warning:
crates/jsonschema/src/ir/display.rs 98.18% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #749      +/-   ##
==========================================
+ Coverage   85.71%   86.14%   +0.43%     
==========================================
  Files          95      100       +5     
  Lines       15531    16362     +831     
==========================================
+ Hits        13312    14095     +783     
- Misses       2219     2267      +48     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar May 22 '25 11:05 codecov[bot]

CodSpeed Performance Report

Merging #749 will not alter performance

Comparing dd/flat (5e3a3bf) with master (958b430)

Summary

โœ… 47 untouched benchmarks
๐Ÿ†• 1 new benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
๐Ÿ†• new_is_valid N/A 5.2 ยตs N/A

codspeed-hq[bot] avatar May 22 '25 11:05 codspeed-hq[bot]

I was looking a bit more into the memory usage of this library, and I have an additional sidenote that I'll leave here. Other than the well-known lazy-statics, it looks like there are some larger memory leak coming from the $ref validator here: https://github.com/Stranger6667/jsonschema/blob/958b430eedc2a6c96810bba2a8b3bad08fa61217/crates/jsonschema/src/keywords/ref_.rs#L124-L142

I think that the current recursive-validators manage to build a cyclic family of Arc (or Rc?) via the intertwined Context, Registry, Resolver and the validators. I naively tried to break this by demoting LazyRefValidator.registry to a Weak, but that didn't work at runtime.

lucab avatar Jun 14 '25 11:06 lucab

@lucab

Thank you for taking a deep look into this! The presence of registry, ctx, etc, is surely an awful hack. In this PR, the static references are resolved at the compilation step, so there is no need to keep those structures.

I am not sure about $dynamicRef 100% yet, but it should also be possible to avoid this "compilation at runtime" approach too. I.e., store all possible jump locations and decide on the exact one at runtime.

Right now, static refs work in the new compiler, so adding $dynamicRef is the next step on my TODO list.

As a side note, another factor contributing to memory usage could be the presence of many separate allocations (every node is boxed). I believe that, depending on the used allocator, memory fragmentation may also impact the final result.

Gradually, I want to pack things more (e.g. intern String for properties, etc), and I think it will also improve the situation with runtime performance & memory usage.

Stranger6667 avatar Jun 14 '25 12:06 Stranger6667

Reworked the recursive references recently, so the memory consumption issue is solved. I'll continue to make more improvements to the compiler later on, separately.

Stranger6667 avatar Nov 16 '25 12:11 Stranger6667