schema2typebox icon indicating copy to clipboard operation
schema2typebox copied to clipboard

support recursive & mutually recursive types

Open SebastienGllmt opened this issue 7 months ago • 10 comments

Previously, if you have any schema that was recursively defined (either simple recursion or mutually recursive), then the code would get stuck in an infinite loop until a Maximum call stack size exceeded error got thrown

Summary

To fix this, I used the (relatively new) Modules feature of Typebox. The benefits of using this feature are:

  1. The code is a lot cleaner than the Type.Recursive typebox feature (hence why things like Type.Recursive are somewhat deprecated and meant to move to the module system). See test/fixture/recursive.ts
  2. It also supports mutually recursive types instead of simple recursion
  3. not a breaking change. The end-result API of people who consume schema2typebox code is the same (same exports)
  4. Sets the ground work for a separate feature allowing to generate typebox for multiple json schemas at once without duplicating code (i.e. they could all just be put into one Typebox module)
  5. It makes it much easier to access inner schemas if you need to refer to their type in typescript land (see test/fixture/multi.ts

SebastienGllmt avatar Apr 06 '25 13:04 SebastienGllmt

Hi @SebastienGllmt, thanks for another PR : ]

Could you please provide me with a sample schema and output that fails in the current setup? If possible, mention your use case/scenario when you stumbled over this for me to better understand this.

Initial thoughts after skimming through the PR:

  1. looks like we should specify typebox as peer dependency instead for all following releases (since only higher versions have that feature available or rather, can make use of it). Which version is the minimal version we can specify for the module syntax? Which would you want to set as minimal peer dependency version and why?

  2. not a breaking change. The end-result API of people who consume schema2typebox code is the same (same exports)

I would argue that it is a "breaking change" since people with older typebox versions will simply not be able to use the generated code anymore. Good chance I might be wrong here, feel free to correct me. However, bumping the "feature" should be fine since we did not have any peer dependency set before anyway..? Thoughts?

  1. Other than that, looking good initially! Please note that I will probably need more time to review this properly and will probably only do this within the next weeks, not days. Feel free to ping me if I forgot this or did not get the time to look at it after appropriate time.

Edit: please also take a look at ci issues (I think first will get solved with correct peer dependency and dev dependency version, second is just formatting missing)

xddq avatar Apr 07 '25 16:04 xddq

Could you please provide me with a sample schema and output that fails in the current setup

Yes, I added one as a test (recursive.json) which you can see here: https://github.com/xddq/schema2typebox/pull/54/files#diff-8180a2313b13c125e09565d35c5e1278867661c88640740c5528107554e85526

Which version is the minimal version we can specify for the module syntax?

You need ^0.34.0 (the latest version of Typebox since this feature was only added last year)

we should specify typebox as peer dependency

makes sense. I've been using schema2typebox via npx, but I agree it makes sense to make the dependency setup work in cases where people want to add schema2typebox in their package.json. I'll look into the CI error in a bit

SebastienGllmt avatar Apr 07 '25 16:04 SebastienGllmt

So this PR will make every exported typebox code (regardless if it contains recursive refs or not) into Module types, correct? What are the drawbacks of this? Will the performance using the generated code for validating be similar/same? Does this somehow use way more memory?

Checking the docu it just mentions Module types as "Module types are containers for a set of referential types" src which to me sounds like this should be used for schemas containing references, but perhaos not for every typebox schema we generate..? Relying heavily on this could end up badly if there will be yet another replacement for Module syntax in typebox later on.

If I understand this correctly, one drawback/change would be that if user try to generate the json schema for the generated typebox code they suddenly end up with a different schema containing #def blocks and #ref blocks instead of "just the schema". I don't know how people use this library, but that could be annoying.

A few options I can think of: a) implement/fix recursion without module syntax b) only use modules if required c) keep as is, release as 2.0 with recursion support and only generating module syntax schemas d) don't add recursion support e) ..?

Honestly I am leaning towards a) or d). Given we really have no major drawbacks when using the module syntax everywhere I would be somewhat down to go with c) as well and merge this into 2.0 after reviewing.

What are you thoughts on this?


just a sample schema for me to better understand this when reading again, dummy schema


{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "CommentWithReplies",
  "title": "Comment",
  "type": "object",
  "properties": {
    "author": {
      "type": "string"
    },
    "message": {
      "type": "string"
    },
    "replies": {
      "type": "array",
      "items": {
        "$ref": "#"
      }
    }
  },
  "required": ["author", "message"]
}

xddq avatar Apr 19 '25 17:04 xddq

So this PR will make every exported typebox code (regardless if it contains recursive refs or not) into Module types, correct?

Internally they are constructed as modules, but the exported type is the same as before (Module.Import extracts the type from the module)

Checking the docs it just mentions Module types as "Module types are containers for a set of referential types" src which to me sounds like this should be used for schemas containing references, but perhaps not for every typebox schema we generate..?

It's useful even for cases with no references to be able to aggregate many json schema files into one module (see #55)

Relying heavily on this could end up badly if there will be yet another replacement for Module syntax in typebox later on.

The code difference in this PR is not that big, so I don't think switching to an alternative would be a lot of work either

If I understand this correctly, one drawback/change would be that if user try to generate the json schema for the generated typebox code they suddenly end up with a different schema containing #def blocks and #ref blocks instead of "just the schema". I don't know how people use this library, but that could be annoying.

I tried to keep it the same by exporting Module.Import('...') (so end-users of the library don't have to know we used modules under-the-hood)

However, it turns out (see later in this post) that this still has some performance impact, so we can unwrap it further with Module.Import("<name>").$defs.<name>

What are the drawbacks of this? Will the performance using the generated code for validating be similar/same? Does this somehow use way more memory?

Here are the result of benchmarks I did

Validator

  • Value.Parse with Type.Number(): 1x (baseline)
  • Value.Parse with Type.Module({ bar: Type.Number() }): 2.2x
  • Value.Parse with Type.Module({ bar: Type.Number() }).$defs.bar: 1x

Initialization

Assuming you have the following type

Type.Object({
    author: Type.String(),
    message: Type.String(),
    replies: Type.Array(Type.Object({}))
  })

Performance

  • Using it as-is: 1x (baseline)
  • Type.Module({ bar: ... }): 17x

Memory

  • Using it as-is: 1x (baseline)
  • Type.Module({ bar: ... }): 2x (considering approximate object size)
  • Type.Module({ bar: ... }): 1.7x (considering total runtime memory taken by program)

Conclusion

From the runtime performance, you can see that there is basically no change if we unwrap the import ourselves (so maybe it's better to add unwraping with $defs.<name> in this PR). The only thing that changes in the initialization time (you only have to run a single time when you load the program).

You could argue that maybe we could generate different code to optimize the initialization time when there are no recursive references, but keep in mind

  • even with the Type.Module syntax, you can still initialize 500,000 object types in one second, so the impact on app startup will be trivial until they have thousands of types
  • when initializing 10 million entries in a loop, it takes 20kbs of memory using using Module.Import vs 12kbs of memory for the regular option, so the memory difference is be unnoticeable even with millions of types - although I assume the garbage collector mangles these benchmarks

SebastienGllmt avatar Apr 19 '25 20:04 SebastienGllmt

Thanks for the summary and the initial performance check. I think we should add this as you suggested to this PR

(so maybe it's better to add unwraping with $defs. in this PR)

Regarding the rest, seems reasonable to use Module everywhere. If people ever have problems, this could be changed later anyway.

xddq avatar Apr 21 '25 10:04 xddq

First of all thank you @xddq for the base work and @SebastienGllmt for the PR. Very sad, that this PR never got merged - it is very useful and is holding me back from using schema2typebox in a production project. Recursion is something that I find over and over in the JSONSchema we use - and thus I would be happy to see any progress here. Is there any help needed?

bastiion avatar Sep 27 '25 21:09 bastiion

hey @bastiion

Is there any help needed?

You have the open PR and my comments. If you want to invest the time to get this shipped, feel free to open up another PR with the comments adressed. I won't put time into shipping this myself, however I am down to review in a timely manner. If you come up with a PR in the next (1-10) days, you can expect a review within a couple (1-5) of days.

It would be great if you could find an easier way to supporting recursive schemas (again, I might be totally wrong here) but it felt a little too complicated. If you put time into this and say this is the way to go, I am also fine. Would just be glad if this gets a closer look.

xddq avatar Sep 28 '25 09:09 xddq

@bastiion I won't have time to work on this, so you're free to yoink my code and modify it as required to get it merged

SebastienGllmt avatar Sep 28 '25 12:09 SebastienGllmt

FYI it looks like Type.Recursive will be removed in the upcoming version of Typebox (1.0.0), and replaced with Type.Cyclic (which works like modules under the hood: https://github.com/sinclairzx81/typebox/blob/main/changelog/1.0.0-migration.md#typerecursive)

SebastienGllmt avatar Oct 17 '25 01:10 SebastienGllmt