
Explore recreating definitions from nodes

Open ssalbdivad opened this issue 1 year ago • 4 comments

Using the new metadata attached to type nodes, we could convert a type node back into a minimal definition for that node in a given scope.

This could be extremely useful for both generating ArkType definitions from other formats as well as testing.

Using a library like fast-check, we could generate thousands of random valid definitions in the minimal format and ensure that each is parsed successfully as a type node that can then be converted back to its original input.
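The roundtrip described above might be sketched as follows. This is a toy illustration only: `parseType`, `unparseNode`, and the definition grammar are invented placeholders, not ArkType's actual API, and the hand-rolled random generator stands in for what fast-check would automate.

```typescript
// Toy "definition" grammar: a keyword string or an object of definitions.
type Def = string | { [k: string]: Def }

// Toy parser: normalizes a definition into a "node" (here just a cleaned clone).
const parseType = (def: Def): Def =>
  typeof def === "string"
    ? def.trim()
    : Object.fromEntries(Object.entries(def).map(([k, v]) => [k, parseType(v)]))

// Toy unparser: maps a node back to a minimal definition.
const unparseNode = (node: Def): Def => node

// Random definition generator, in the spirit of what fast-check provides.
const keywords = ["string", "number", "boolean"]
const randomDef = (depth = 2): Def =>
  depth === 0 || Math.random() < 0.5
    ? keywords[Math.floor(Math.random() * keywords.length)]
    : { a: randomDef(depth - 1), b: randomDef(depth - 1) }

// The universal property: parsing then unparsing reproduces the input.
for (let i = 0; i < 1000; i++) {
  const def = randomDef()
  const roundtripped = unparseNode(parseType(def))
  if (JSON.stringify(roundtripped) !== JSON.stringify(def))
    throw new Error(`roundtrip failed for ${JSON.stringify(def)}`)
}
```

With a real property-testing library, the generator and the loop would be replaced by its combinators, and failures would be automatically shrunk to a minimal counterexample.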

ssalbdivad avatar Jul 05 '23 19:07 ssalbdivad

If I understand this correctly, this task involves taking a node and, from its metadata, deriving (or "recovering") the larger entity that it belongs to?

This is something I've just finished working on in a project of my own. I'd be happy to hop on a call sometime today (Saturday) and check my understanding. Depending on what metadata you've been collecting (which I would assume is pretty good), I should have time to work on this feature.

ahrjarrett avatar Jul 08 '23 11:07 ahrjarrett

@ahrjarrett I do think it is a very interesting problem, and the work you've done could definitely be relevant.

which I would assume is pretty good

Unfortunately, this is not the case. The current type system is totally isolated from any kind of metadata. Once a definition is parsed and converted to its type representation, there is no remaining association between the type node and its original definition.

This was partly because, in general, decoupling the definition syntax from the type is a good thing. Most validators have to rely on traversing their AST to try and derive types; it's much better to maintain a fully reduced representation.

However, the new node structure designed to accommodate metadata like custom error messages also allows for other attached information, including potentially the original definition.

Not every type node will have a real definition that was directly provided by the user, e.g. for a chained expression like type({a: "string"}).or({b: "number"}), we would create a synthetic definition using a tuple expression like [{a: "string"}, "|", {b: "number"}].
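Assembling a synthetic tuple-expression definition from such a union could look roughly like this. The node shape (`UnionNode`, `synthesizeUnionDef`) is invented for illustration and does not reflect ArkType's internal representation.

```typescript
// Sketch: turning an internal union node back into a tuple-expression definition.
type Branch = object | string
type UnionNode = { kind: "union"; branches: Branch[] }

// Interleave branches with "|" operators:
// [{a: "string"}, "|", {b: "number"}]
const synthesizeUnionDef = (node: UnionNode): unknown[] =>
  node.branches.flatMap((branch, i) => (i === 0 ? [branch] : ["|", branch]))

const node: UnionNode = {
  kind: "union",
  branches: [{ a: "string" }, { b: "number" }]
}
```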

This will be more complicated when the type node is the result of an intersection at some nested path, as there won't be a direct way to represent it like this at all. When props are intersected, if we always want to maintain some validation representation of the definition, we'll need to create some kind of tuple expression representation at the nested path, though I'd need to think more about exactly what this would entail.

So the goal of this would not be so much to recreate the original definition exactly, but more to create a certain "isomorphic" form of the definition that would have a set of rules according to which various sets of constraints were minimally represented.

Any definition could be parsed, then converted to a definition of this form. Any subsequent parse => unparse sequences would not change that representation. Having this capability for testing in particular would be incredibly powerful, as it would establish a universal property that could be checked to guarantee a definition (as long as our input is of the right form) is parsed correctly without us having to specify the expected result, which opens the door for us to rely on fast-check to generate many test cases based on those rules and to remove a lot of the manually specified edge cases that currently exist.
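The stability described above amounts to idempotence: after one parse => unparse cycle, further cycles are fixpoints. A toy sketch, where `canonicalize` stands in for the whole parse => unparse pipeline (sorting keys and trimming strings are invented stand-ins for the real reduction rules):

```typescript
// Toy canonicalizer: sorts object keys and trims strings, illustrating a
// "minimal representation" whose parse => unparse cycle is idempotent.
type Definition = string | { [k: string]: Definition }

const canonicalize = (def: Definition): Definition =>
  typeof def === "string"
    ? def.trim()
    : Object.fromEntries(
        Object.entries(def)
          .sort(([a], [b]) => a.localeCompare(b))
          .map(([k, v]) => [k, canonicalize(v)])
      )

const original: Definition = { b: " number ", a: "string" }
const once = canonicalize(original)
const twice = canonicalize(once)
// idempotence: a second cycle changes nothing
```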

I'm happy to talk about this in more detail, though work on the issue will have to wait until I at least finish up a stable version of the metadata representation within the type system so it would be clear where that information could begin to be integrated.

ssalbdivad avatar Jul 08 '23 12:07 ssalbdivad

Got it -- let me know when you're ready to have this conversation.

So the goal of this would not be so much to recreate the original definition exactly, but more to create a certain "isomorphic" form of the definition that would have a set of rules according to which various sets of constraints were minimally represented.

Having this capability for testing in particular would be incredibly powerful, as it would establish a universal property that could be checked to guarantee a definition (as long as our input is of the right form) is parsed correctly without us having to specify the expected result, which opens the door for us to rely on fast-check to generate many test cases based on those rules and to remove a lot of the manually specified edge cases that currently exist.

Precisely the use case I had in mind. As long as you have an isomorphic transformation, using a library like fast-check makes testing a breeze -- in the past I used a test runner called ava because of its support for macros, which enables some pretty nice patterns when used with fast-check.

But assuming a switch to ava is not one of the goals of this change, we'd still be able to generate a pretty robust test suite for any isomorphism by testing:

```
a -> b -> a === a
b -> a -> b === b
a -> b !== a
b -> a !== b
```

(the last two protect against accidental "identity" transformations / false positives)
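The four checks above can be demonstrated with a tiny isomorphism between a string union definition and its tuple-expression form. The helpers `defToTuple` and `tupleToDef` are invented for this sketch:

```typescript
// "string|number"  <->  ["string", "|", "number"]
const defToTuple = (s: string): unknown[] =>
  s.split("|").flatMap((part, i) => (i === 0 ? [part] : ["|", part]))

const tupleToDef = (t: unknown[]): string =>
  t.filter((_, i) => i % 2 === 0).join("|")

const a = "string|number"
const b = ["string", "|", "number"]

const eq = (x: unknown, y: unknown) => JSON.stringify(x) === JSON.stringify(y)

// a -> b -> a === a
console.assert(eq(tupleToDef(defToTuple(a)), a))
// b -> a -> b === b
console.assert(eq(defToTuple(tupleToDef(b)), b))
// a -> b !== a  (guards against an accidental identity transformation)
console.assert(!eq(defToTuple(a), a))
// b -> a !== b
console.assert(!eq(tupleToDef(b), b))
```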

It sounds like this is what you have in mind, which is exactly what I just finished working on. It took a substantial amount of work because of the number of edge cases it uncovered.

ahrjarrett avatar Jul 08 '23 12:07 ahrjarrett

@ahrjarrett Amazing! So you have a lot of context on this stuff then. I'll tentatively just assign this to you for now so I don't accidentally forget about this conversation somehow and implement it next time I'm trying to test the million edge cases for range intersections 😆

Just for a bit of additional background, the core piece of information we'd need from the metadata to map back to a canonical-ish definition is the scope. Without it, we'd have no way to associate type nodes with aliases, so nothing could be defined in a string; even keywords like `number` exist in the same way a custom alias in a scope does. The original definition itself would probably just lead us astray, because if we ever have to rely on it, we'd be missing a core part of the mapping that would matter if the type were specified in some other format.
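Why the scope is essential can be sketched like this: node identities map back to alias names, and without that mapping, a built-in keyword and a custom alias are indistinguishable plain nodes. All names here (`Node`, `nodeToAlias`, the scope `Map`) are invented for illustration:

```typescript
// Invented node shape for the sketch.
type Node = { id: string }

const numberNode: Node = { id: "node:number" }
const userNode: Node = { id: "node:user-object" }

// The scope associates nodes with the alias names that resolve to them.
// Note the keyword "number" is stored exactly like a custom alias.
const scope = new Map<Node, string>([
  [numberNode, "number"],
  [userNode, "user"]
])

const nodeToAlias = (node: Node): string => {
  const alias = scope.get(node)
  if (alias === undefined)
    throw new Error(`no alias for ${node.id} in this scope`)
  return alias
}
```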

I will let you know as soon as we're at a point where this is stable enough to follow up on, which should be quite soon! In the meantime, if you're anxious to get started and haven't looked at the structure of the type system yet, that might be a useful area to explore so you have an idea of how various constraints are internally represented and it's not all foreign when you go to map it back. If you're interested, `TypeNode` on beta is probably a good balance between being up to date and working 🤣

ssalbdivad avatar Jul 08 '23 12:07 ssalbdivad