zod Metadata take 2


Open jedwards1211 opened this issue 1 year ago • 12 comments

My approach to metadata in #2471 has major TS performance problems because of the way it uses intersection types.

This is an alternative approach where z.string().withMetadata({ openapi: { refId: 'user' } }) creates a ZodMetadata<ZodString, { openapi: { refId: 'user' } }> type.

Metadata nested within ZodOptional, ZodNullable, ZodPromise etc can be extracted with z.extractMetadata; z.extractDeepMetadata would also extract out of ZodArray element and ZodSet value schemas.
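Here is a rough, self-contained sketch of the unwrapping that z.extractMetadata would have to do. The classes (ToySchema, ToyOptional, ToyNullable, ToyMetadata) are hypothetical stand-ins so the example runs without Zod. They are not Zod's real classes; the actual proposal would walk ZodOptional, ZodNullable, ZodPromise and the new ZodMetadata instead.

```typescript
// Hypothetical toy stand-ins for Zod's schema classes; not the real Zod API.
abstract class ToySchema {}

class ToyString extends ToySchema {}

// Wrapper types that just hold an inner schema, like ZodOptional / ZodNullable.
class ToyOptional extends ToySchema {
  constructor(readonly inner: ToySchema) {
    super();
  }
}

class ToyNullable extends ToySchema {
  constructor(readonly inner: ToySchema) {
    super();
  }
}

// Analogue of the proposed ZodMetadata<T, M>: wraps a schema and carries metadata.
class ToyMetadata<T extends ToySchema, M> extends ToySchema {
  constructor(readonly inner: T, readonly metadata: M) {
    super();
  }
}

// Peel off wrapper layers until metadata is found or a non-wrapper is reached.
function extractMetadata(schema: ToySchema): unknown {
  let current: ToySchema = schema;
  for (;;) {
    if (current instanceof ToyMetadata) return current.metadata;
    if (current instanceof ToyOptional || current instanceof ToyNullable) {
      current = current.inner;
      continue;
    }
    return undefined;
  }
}

const user = new ToyMetadata(new ToyString(), { openapi: { refId: "user" } });
const maybeUser = new ToyNullable(new ToyOptional(user));
console.log(JSON.stringify(extractMetadata(maybeUser))); // {"openapi":{"refId":"user"}}
```

A "deep" variant would add more cases to the loop (array elements, set values), which is exactly the growing per-subclass switch that the objections further down are about.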

This would provide an alternative to monkeypatching for libraries like zod-to-openapi. Wrapping schemas wouldn't work for their use case of making sure $refs get generated in the OpenAPI schema for references to a given Zod schema.

Jun 08 '23 21:06 jedwards1211

Deploy Preview for guileless-rolypoly-866f8a ready!

Built without sensitive environment variables

Name	Link
Latest commit	bed0b94aaf9ac08cc560b90bcfdd8e5201e373a4
Latest deploy log	https://app.netlify.com/sites/guileless-rolypoly-866f8a/deploys/648243aa1d0ac200086abef2
Deploy Preview	https://deploy-preview-2496--guileless-rolypoly-866f8a.netlify.app

To edit notification comments on pull requests, go to your Netlify site settings.

Jun 08 '23 21:06 netlify[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sep 07 '23 00:09 stale[bot]

Here are some things I don't like:

ZodMetadata existing as a no-op class just to hold data
People getting confused and mad because they can't do stuff like z.number().metadata({}).min(1)
Code that looks like extractMetadata. This is the same concern I've expressed with deepPartial and discriminatedUnion - if you need to do a huge switch statement over every Zod subclass, you should probably try to think of a better API

I haven't done a good job of expressing all of this. I've also been horribly busy, but now I have a window to work on Zod 4 full time! So I'm going to do my darnedest to explain why I've been so against metadata, and hopefully propose a solution that will make people happy.

I have some ideas around strongly-typed metadata in Zod 4 that I'll write up as an RFC soon. I agree this is becoming more and more important as @rattrayalex and others have mentioned. People clearly want & need to attach metadata to their schemas. But letting them attach any untyped object with a .metadata() property doesn't help anyone and leads to bad/messy code.

98% of people want metadata so they can use their Zod schemas as a source of truth, and generate other artifacts (JSON Schemas, etc) from them. I may be over-indexing on JSON schema, but generally there are only two places you want to attach metadata: on objects and on properties. (Right? RIGHT!?)

It would be easy to let users specify metadata on a particular subclass (say, ZodObject) in a typesafe way. You use declaration merging to let users specify an interface, then z.object() can detect that and accept metadata in a typesafe way. This is the same principle zod-to-openapi uses to add methods to ZodObject: https://github.com/asteasolutions/zod-to-openapi/commit/dc01991ec8e8e8fbe1e0707ab9607d4fead29ec9#diff-10348b603df97619f3ed4b79057f539ca90ef8ea17f228a11212c9150605b4b1L37-L58

Here's how it would look to an end user:

declare module "zod" {
  interface ZodObjectMetadata {
    $id: string
  }
}

z.object({ name: z.string() }, {
  metadata: { $id: "User" } // fully typed
})

Keep in mind, with this approach you'd have to specify compliant metadata on all invocations of z.object() throughout your project. Probably not desirable.
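To make the declaration-merging idea concrete without depending on the real "zod" module, here is a minimal single-file sketch. ToyObjectMetadata and toyObject are hypothetical names. Two interface declarations with the same name merge into one type, which is the same mechanism that `declare module "zod"` augmentation relies on. It also shows the downside just mentioned: once $id is required, every call has to provide it.

```typescript
// The "library" declares an empty interface for users to extend.
// (ToyObjectMetadata / toyObject are hypothetical, not Zod APIs.)
interface ToyObjectMetadata {}

// The "user" augments it. In the real proposal this would sit inside
// `declare module "zod" { ... }`; here both declarations share one scope.
interface ToyObjectMetadata {
  $id: string;
}

type Shape = Record<string, string>;

// The merged interface now has a required $id, so every call must supply it.
function toyObject<S extends Shape>(shape: S, params: { metadata: ToyObjectMetadata }) {
  return { shape, metadata: params.metadata };
}

const User = toyObject({ name: "string" }, { metadata: { $id: "User" } });
console.log(User.metadata.$id); // User

// toyObject({ name: "string" }); // compile error: metadata is now mandatory
```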

There's already a lot of issues there, and we haven't even discussed property metadata :( Here's a naive solution:

declare module "zod" {
  interface ZodObjectMetadata {
    description: string
  }
}

z.object({
  name: z.string().metadata({ 
    description: "First name" // typesafe!
  })
})

We can provide some typesafety in the .metadata() call, but we can't prevent users from simply not calling .metadata().

z.object({
  name: z.string()
})

Or rather, we probably could but it would require one of two things: adding a ZodMetadata class (no) or adding a statically tracked generic to every ZodType subclass that would track the inferred metadata type of each schema (also no).

To explain that second point a little deeper. Right now when you call z.string(), you get out a ZodString. If you want the ability to infer metadata and have it be strongly typed, you would need to add a generic onto that class:

class ZodString<Meta = null> {
  _metadata: Meta
  metadata<T>(data: T): ZodString<T>
}

Now when you call z.string(), you see this in Intellisense:

z.string(); // ZodString<null>

It seems like a little thing but this would have to be added to every subclass, muddy up every Intellisense autocompletion, and make life a little bit worse for all users of Zod.
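The following toy version of the generic-tracking idea uses a hypothetical ToyString class rather than Zod's real ZodString. It shows both the upside, since a consumer can demand a metadata shape at compile time, and the cost, since every type now carries an extra parameter.

```typescript
// Hypothetical stand-in for `ZodString<Meta = null>`; not Zod's real class.
class ToyString<Meta = null> {
  constructor(readonly _metadata: Meta) {}

  // Returns a new instance whose type parameter records the metadata's type.
  metadata<T>(data: T): ToyString<T> {
    return new ToyString<T>(data);
  }
}

function toyString(): ToyString<null> {
  return new ToyString<null>(null);
}

const plain = toyString(); // type: ToyString<null>
const described = plain.metadata({ description: "First name" });
// type: ToyString<{ description: string }>

// A consumer can require a particular metadata shape statically.
function requireDescription(schema: ToyString<{ description: string }>): string {
  return schema._metadata.description;
}

console.log(requireDescription(described)); // First name
// requireDescription(plain); // error under strictNullChecks: null is not { description: string }
```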

To add complexity to this, Zod is compositional. That means when you call z.string().optional(), you get back a ZodOptional instance wrapping a ZodString instance. So what happens if you have this?

z.string().metadata({ description: "Sup" }).optional()

That's right, you have a ZodOptional instance with no metadata, because the metadata was attached to the inner ZodString. That means Zod would have to bubble up the metadata through all these method calls, copying it each time to the new outer layer.
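Here is a small illustration of the composition problem using hypothetical toy classes, not Zod's. For brevity, .metadata() mutates and returns the same instance, whereas Zod methods return new instances. optionalWithBubbling is a made-up helper showing what "bubbling up" would mean for a single wrapper method.

```typescript
// Hypothetical toy classes; not Zod's implementation.
class ToyType {
  meta: Record<string, unknown> | undefined = undefined;

  // Simplification: mutates in place instead of returning a new schema.
  metadata(data: Record<string, unknown>): this {
    this.meta = data;
    return this;
  }

  optional(): ToyOptional {
    return new ToyOptional(this);
  }
}

class ToyOptional extends ToyType {
  constructor(readonly inner: ToyType) {
    super();
  }
}

const s = new ToyType().metadata({ description: "Sup" }).optional();
console.log(s.meta); // undefined: the metadata lives on s.inner
console.log(s.inner.meta?.description); // Sup

// "Bubbling up" means every wrapper method copies the inner metadata outward.
function optionalWithBubbling(schema: ToyType): ToyOptional {
  const wrapped = schema.optional();
  wrapped.meta = schema.meta ? { ...schema.meta } : undefined;
  return wrapped;
}
```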

But then there are also edge cases. Should the metadata survive across calls to .transform()? The data structure could have been dramatically changed inside the transform to something totally unrecognizable, making the metadata inapplicable. How about .and()? Should we merge the metadata of the two intersected types? Who wins in the case of a conflict?

To top it all off, it would still be annoying for a third-party library to write a function that can statically enforce the metadata type. The only way this works is if places constraints

I suppose this is the kind of stuff that is only apparent when you start thinking through the internals of the implementation. It's complicated to explain, but people don't tend to be satisfied when I say "trust me, it's complicated and probably a really bad idea".

In case it isn't obvious, every idea so far is really really bad. The only solution, as far as I can see, is to introduce a generalized ZodObject-like "schema collection" type with first-class support for configurable metadata. It would look something like this:

// "User" is a collection that contains properties
// each property is associated with a FieldMetadata object
// and a schema that conforms to z.ZodTypeAny (any Zod schema)
type FieldMetadata = { description: string; };
const User = z.collection<FieldMetadata, z.ZodTypeAny>()
  .add("firstName", z.string(), { description: "First name" })
  .add("lastName", z.string(), { description: "Last name" });

In JSON Schema, you often want collections of schemas. Each schema should have some known, well-typed metadata, as well as each field of each schema.

The z.collection API can compose to accommodate this.

// "User" is a collection that contains properties
// each property is associated with a FieldMetadata object
// and a schema that conforms to z.ZodTypeAny (any Zod schema)
type FieldMetadata = { description: string; };
const User = z.collection<FieldMetadata, z.ZodTypeAny>(); 
const BlogPost = z.collection<FieldMetadata, z.ZodTypeAny>(); 

// "SchemaCollection" is a collection that contains collections
// each property is associated with an ObjectMetadata object
// and a schema that conforms to z.ZodCollection<FieldMetadata, z.ZodTypeAny>
type ObjectMetadata = { $id: string };
const SchemaCollection = z
  .collection<ObjectMetadata, z.ZodCollection<FieldMetadata, z.ZodTypeAny>>()
  .add("User", User, { $id: "User" })
  .add("BlogPost", BlogPost, { $id: "BlogPost" });

Collections require each element to have a label. You can access this with .items.

User.items.firstName; // { schema: ZodString, metadata: FieldMetadata }
SchemaCollection.items.User; // { schema: User; metadata: ObjectMetadata }

When you call .parse(), it acts like a ZodObject, using the labels as keys. (Admittedly it's a little weird that SchemaCollection above has a .parse() method but whatever.)

User.parse({
  firstName: "Colin", 
  lastName: "Sleepy"
})
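Here is a minimal sketch of how the chaining API could infer literal labels and still parse like an object. Everything in it is hypothetical: collection and ToyCollection are illustrative names, and schemas are modeled as plain validator functions instead of Zod schemas so the code runs on its own.

```typescript
// Schemas modeled as plain validator functions (an assumption, not Zod).
type Validator<T> = (input: unknown) => T;
type Item<M, T> = { schema: Validator<T>; metadata: M };
type ParsedOutput<Items> = {
  [K in keyof Items]: Items[K] extends { schema: Validator<infer T> } ? T : never;
};

class ToyCollection<M, Items extends Record<string, Item<M, unknown>>> {
  constructor(readonly items: Items) {}

  // Each .add() widens the Items type, so labels become literal keys.
  add<K extends string, T>(
    label: K,
    schema: Validator<T>,
    metadata: M,
  ): ToyCollection<M, Items & Record<K, Item<M, T>>> {
    const added = {} as Record<K, Item<M, T>>;
    added[label] = { schema, metadata };
    return new ToyCollection<M, Items & Record<K, Item<M, T>>>({ ...this.items, ...added });
  }

  // Parses like an object schema, using the labels as keys.
  parse(input: Record<string, unknown>): ParsedOutput<Items> {
    const items: Record<string, Item<M, unknown>> = this.items;
    const out: Record<string, unknown> = {};
    for (const label of Object.keys(items)) {
      out[label] = items[label].schema(input[label]);
    }
    return out as ParsedOutput<Items>;
  }
}

function collection<M>(): ToyCollection<M, {}> {
  return new ToyCollection<M, {}>({});
}

const str: Validator<string> = (input) => {
  if (typeof input !== "string") throw new Error("expected string");
  return input;
};

type FieldMetadata = { description: string };
const User = collection<FieldMetadata>()
  .add("firstName", str, { description: "First name" })
  .add("lastName", str, { description: "Last name" });

console.log(User.items.firstName.metadata.description); // First name
```

The chained form matters here because each call returns a new type that records the label, which is what lets the parsed output type be computed from the collection.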

I've been saying since at least 2021 that the right way to attach metadata to Zod schemas is not to do it at all. Instead, you should be using composition. Let your users pass in a Zod schema to specify types. If you need any additional metadata, it should be specified alongside the Zod schema.

If you are building a library on top of Zod that, say, generates JSON Schema specs, just let people pass in an object that looks like this:

interface SchemaWithMetadata {
  schema: z.ZodType,
  metadata: WhateverYouWant
}

This ZodCollection concept is just a general-purpose (and I suppose "first party") way to do the compositional thing I've been recommending for years.
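The following sketches the compositional pattern from a library author's side. It assumes a hypothetical registerSchema function and a minimal Schema interface standing in for z.ZodType. The library states the metadata shape it requires, and callers pass that metadata next to the schema rather than on it.

```typescript
// Minimal stand-in for z.ZodType so the example runs without Zod (assumption).
interface Schema<T> {
  parse(input: unknown): T;
}

interface SchemaWithMetadata<T, M> {
  schema: Schema<T>;
  metadata: M;
}

type JsonSchemaMetadata = { $id: string; description?: string };

// Hypothetical library entry point: the required metadata shape is enforced statically.
function registerSchema<T>(entry: SchemaWithMetadata<T, JsonSchemaMetadata>): string {
  return `${entry.metadata.$id}: ${entry.metadata.description ?? "(no description)"}`;
}

const userSchema: Schema<{ name: string }> = {
  parse(input) {
    const name = (input as { name?: unknown }).name;
    if (typeof name !== "string") throw new Error("name must be a string");
    return { name };
  },
};

console.log(registerSchema({ schema: userSchema, metadata: { $id: "User" } }));
// User: (no description)
```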

There's also this approach for specifying metadata to fields using something that's closer to the existing z.object() API, but it's a bit verbose/obvious. I'm not a fan, but perhaps this will spark ideas in others.

z.meta<{label: string}>().object({
  firstName: { 
    $type: z.string(),
    label: "First name"
  },
  age: {
    $type: z.number(),
    label: "Age"
  }
})
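For the z.meta<M>().object() idea, here is a hypothetical sketch of how such a helper might separate each field's $type from the rest of its metadata. meta, Field and Validator are illustrative assumptions, not existing Zod APIs, and validators are plain functions so the code runs standalone.

```typescript
// Hypothetical types; not part of Zod.
type Validator<T> = (input: unknown) => T;
type Field<M> = M & { $type: Validator<unknown> };

function meta<M extends object>() {
  return {
    object<F extends Record<string, Field<M>>>(fields: F) {
      const schemas: Record<string, Validator<unknown>> = {};
      const metadata: Record<string, M> = {};
      for (const key of Object.keys(fields)) {
        // Widen to a plain record so $type can be split off from the metadata keys.
        const field = fields[key] as unknown as { $type: Validator<unknown> } & Record<string, unknown>;
        const { $type, ...rest } = field;
        schemas[key] = $type;
        metadata[key] = rest as unknown as M;
      }
      return { schemas, metadata };
    },
  };
}

const str: Validator<string> = (input) => {
  if (typeof input !== "string") throw new Error("expected string");
  return input;
};
const num: Validator<number> = (input) => {
  if (typeof input !== "number") throw new Error("expected number");
  return input;
};

const person = meta<{ label: string }>().object({
  firstName: { $type: str, label: "First name" },
  age: { $type: num, label: "Age" },
});

console.log(person.metadata.age.label); // Age
```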

Apr 09 '24 07:04 colinhacks

I’ve been using zod for a few years, and a few days ago I needed to add metadata to schemas for the first time, for a GraphQL query builder. The approach I ended up with is almost completely independent of zod (inspired by this comment) and uses a WeakMap:

import { z } from "zod";

type MyMetadata = { foo: string };
const myMetadataRegistry = new WeakMap<z.ZodTypeAny, MyMetadata>();

function addMetadata<Schema extends z.ZodTypeAny>(schema: Schema, metadata: MyMetadata): z.ZodLazy<Schema> {
  // wrap in z.lazy() so each call yields a distinct schema instance to key on
  const wrappedSchema = z.lazy(() => schema);
  myMetadataRegistry.set(wrappedSchema, metadata);
  return wrappedSchema;
}

const mySchema = z.object({
  someProperty: addMetadata(z.string(), { foo: 'bar' }),
});
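Reading the metadata back out is the other half of the WeakMap approach above. This sketch assumes a simplified ToySchema shape instead of Zod so it runs standalone. A shallow copy plays the role that z.lazy() plays in the snippet, giving each call a distinct object to key on, because WeakMap lookups are by object identity.

```typescript
// Simplified schema shape (assumption, not Zod's classes).
type MyMetadata = { foo: string };

interface ToySchema {
  kind: string;
  shape?: Record<string, ToySchema>;
}

const registry = new WeakMap<ToySchema, MyMetadata>();

function addMetadata<S extends ToySchema>(schema: S, metadata: MyMetadata): S {
  // Copy so the same base schema can carry different metadata in different places.
  const wrapped: S = { ...schema };
  registry.set(wrapped, metadata);
  return wrapped;
}

// Walk one object level and collect metadata for properties that have some.
function collectMetadata(schema: ToySchema): Record<string, MyMetadata> {
  const out: Record<string, MyMetadata> = {};
  const shape: Record<string, ToySchema> = schema.shape ?? {};
  for (const key of Object.keys(shape)) {
    const meta = registry.get(shape[key]);
    if (meta !== undefined) out[key] = meta;
  }
  return out;
}

const mySchema: ToySchema = {
  kind: "object",
  shape: {
    someProperty: addMetadata({ kind: "string" }, { foo: "bar" }),
    other: { kind: "number" },
  },
};

console.log(JSON.stringify(collectMetadata(mySchema))); // {"someProperty":{"foo":"bar"}}
```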

So far it works quite well for my use case. However, I would also be happy about an easier way to handle metadata. Relying on declaration merging etc. doesn’t sound easier, though. Regarding ZodCollection: so far I like the idea. I’m not quite sure why the chaining API is needed and why it can’t look similar to ZodObject, where we just define the shape as an object literal. I’m also curious how inference would work via z.infer: do we need some kind of build() function when using the chaining API?

I may be over-indexing on JSON schema, but generally there are only two places you want to attach metadata: on objects and on properties. (Right? RIGHT!?)

So far I also only had the use-case of object properties. A while ago I had the need to add a custom property to a union type so I can create better error messages, but I guess this is another topic because we would need to pass through those custom properties to ZodIssue.

I think it is also worth having a look at how @effect/schema handles annotations.

Apr 09 '24 09:04 lo1tuma

Thanks for the thoughtful response.

I’m not quite sure why the chaining API is needed and why it can’t look similar to ZodObject where we just define the shape as an object literal

Good point, the chaining API is needed if you want to infer the labels as literal strings. For a collection where this isn't important, you can use an unchained API.

const SchemaCollection = z.collection<undefined, z.ZodTypeAny>();

// unchained: labels aren't inferred as literals, and metadata is omitted
SchemaCollection.add("User", User);
SchemaCollection.add("BlogPost", BlogPost);

SchemaCollection.items;
// {[k: string]: { schema: ZodTypeAny, metadata: undefined }}

and why it can’t look similar to ZodObject where we just define the shape as an object literal

I proposed something like this at the end. Do you have something else in mind?

I think it is also worth having a look at how @effect/schema handles annotations.

This seems to be totally untyped and not reflected in the type signature of the resulting schema. So there would be no way for tooling authors to statically enforce something like "give me a schema with metadata that matches SomeInterface".

This is true for your solution as well (though it does get the job done).

Apr 09 '24 22:04 colinhacks

Really appreciate all the thought put into this @colinhacks ! I'm sadly at the end of a busy day and not able to do a thorough writeup, but in case it helps, a few gut reactions:

It's a hard requirement to expose a fluent API to my library's users, so mylib.addMetadata(z.string(), {…}) is a no-go for me. I want it to look and feel like zod, because that's what people are familiar with. If you make me do that, I'll just extend/monkeypatch/fork zod (as we've done so far, but hope not to do).
The interface ZodObjectMetadata { proposal you have is something I wouldn't love, but could probably live with. a. In practice, we'd again probably extend/monkeypatch/fork zod to add things like z.object({}).myMetaField(123) as sugar for z.object({}, { metadata: { $myLib: { myMetaField: 123 } } }) but at least the internals would be much cleaner / more peaceful.
just objects & properties sounds like an OK limitation to me at first blush
"we can't prevent users from simply not calling .metadata()" - this sounds fine to me fwiw. In general I think it's reasonable to not expect tremendous typesafety of metadata.
"adding a ZodMetadata class (no)" – FWIW it wasn't clear to me on a first skim of this document why this is a hard no (in a sea of bad options). I'm not saying I think it's good, nor that you failed to justify it (maybe I missed it), just that I think a strong justification is needed.

I hope this is helpful!

I'm quite curious what @jedwards1211 thinks, if he's around to take a look.

Apr 10 '24 02:04 rattrayalex

@colinhacks

I've been saying since https://github.com/colinhacks/zod/issues/507#issuecomment-873473128 that the right way to attach metadata to Zod schemas is not to do it at all. Instead, you should be using composition.

I think this keeps coming up despite what you've said because composition doesn't work for what some people are trying to accomplish.

I may be over-indexing on JSON schema, but generally there are only two places you want to attach metadata: on objects and on properties. (Right? RIGHT!?)

I'm afraid not, at least when it comes to generating OpenAPI specs from zod schemas. One simple real-world example is a shared enum type:

const PetType = z.enum(['dog', 'cat']).metadata({ openapi: { ref: 'PetType' } })

const AddUserRequest = z.object({
  name: z.string(),
  pets: z.array(z.object({
    name: z.string(),
    type: PetType,
  })).optional(),
})

const GetUserPetsRequest = z.object({
  userId: z.string(),
  petType: PetType.optional(),
})

const GetUserPetsResponse = z.array(z.object({
  name: z.string(),
  type: PetType,
}))

...

This way of doing things allows PetType to be used in many different places, while still being able to output { "$ref": "#/components/schemas/PetType" } in all the corresponding places in the generated OpenAPI spec, instead of inlining a copy of the whole type in each place.

Without being able to attach the metadata to PetType itself, you'd have to attach it to every property that uses the enum type, which would be a lot of repeating yourself.
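Here is a hypothetical sketch of the kind of generator this use case needs. When a schema carries openapi.ref metadata, it emits a $ref and registers the full definition once under components. ToySchema is a simplified stand-in for Zod's schema tree, and toOpenApi is an illustrative name, not zod-to-openapi's API. It also shows that an optional wrapper needs no metadata of its own, since optionality is expressed by the parent object's required list.

```typescript
// Simplified stand-in for a Zod schema tree (assumption).
type Meta = { openapi?: { ref?: string } };
type ToySchema = { meta?: Meta } & (
  | { kind: "string" }
  | { kind: "enum"; values: string[] }
  | { kind: "array"; element: ToySchema }
  | { kind: "optional"; inner: ToySchema }
  | { kind: "object"; shape: Record<string, ToySchema> }
);
type JsonSchema = Record<string, unknown>;

// Emit a $ref for schemas with openapi.ref metadata; define each ref once.
function toOpenApi(schema: ToySchema, components: Record<string, JsonSchema>): JsonSchema {
  const ref = schema.meta?.openapi?.ref;
  if (ref !== undefined) {
    if (!(ref in components)) components[ref] = inline(schema, components);
    return { $ref: `#/components/schemas/${ref}` };
  }
  return inline(schema, components);
}

function inline(schema: ToySchema, components: Record<string, JsonSchema>): JsonSchema {
  switch (schema.kind) {
    case "string":
      return { type: "string" };
    case "enum":
      return { type: "string", enum: schema.values };
    case "array":
      return { type: "array", items: toOpenApi(schema.element, components) };
    case "optional":
      // The wrapper carries no metadata; the parent marks the property optional.
      return toOpenApi(schema.inner, components);
    case "object": {
      const properties: Record<string, JsonSchema> = {};
      const required: string[] = [];
      for (const key of Object.keys(schema.shape)) {
        const prop = schema.shape[key];
        properties[key] = toOpenApi(prop, components);
        if (prop.kind !== "optional") required.push(key);
      }
      return { type: "object", properties, required };
    }
  }
}

const PetType: ToySchema = { kind: "enum", values: ["dog", "cat"], meta: { openapi: { ref: "PetType" } } };
const GetUserPetsRequest: ToySchema = {
  kind: "object",
  shape: { userId: { kind: "string" }, petType: { kind: "optional", inner: PetType } },
};

const components: Record<string, JsonSchema> = {};
console.log(JSON.stringify(toOpenApi(GetUserPetsRequest, components)));
console.log(JSON.stringify(components)); // {"PetType":{"type":"string","enum":["dog","cat"]}}
```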

Can you illustrate how you would solve a problem like this with composition? I can't imagine a useful way to do it...

const PetTypeWithMetadata = {
  schema: z.enum(['dog', 'cat']),
  metadata: { openapi: { ref: 'PetType' }},
}

const GetUserPetsRequest = z.object({
  userId: z.string(),
  // when you use the schema here, the metadata is lost,
  // and inaccessible to tools that traverse the Zod schema to generate OpenAPI.
  petType: PetTypeWithMetadata.schema.optional(),
})

You could make some weird system like

import { mapValues } from "lodash";
import { z, ZodType } from "zod";

function objectWithMetadata(props: Record<string, ZodType | { schema: ZodType; metadata: unknown }>) {
  return {
    schema: z.object(mapValues(props, prop => prop instanceof ZodType ? prop : prop.schema)),
    metadata: mapValues(props, prop => prop instanceof ZodType ? undefined : prop.metadata),
  }
}
const GetUserPetsRequest = objectWithMetadata({
  userId: z.string(),
  // here the metadata survives, but only because objectWithMetadata
  // handles this one level of nesting explicitly.
  petType: PetTypeWithMetadata,
})

But this would be no good because you'd end up having to implement arrayWithMetadata, optionalWithMetadata, nullableWithMetadata, etc to get anywhere -- you're just re-implementing the same tree structure Zod schemas provide, and making the code less ergonomic.

Apr 10 '24 03:04 jedwards1211

That's right, you have a ZodOptional instance with no metadata, because the metadata was attached to the inner ZodString. That means Zod would have to bubble up the metadata through all these method calls, copying it each time to the new outer-layer.

It's okay that the ZodOptional has no metadata; it often makes the most sense not to bubble anything up. To take my last example,

const PetType = z.enum(['dog', 'cat']).metadata({ openapi: { ref: 'PetType' } })

If you declare a property like type: PetType.optional(), you don't want that ZodOptional to inherit the metadata because you don't want the generated #/components/schemas/PetType in the OpenAPI spec to be optional itself. You just want to generate an optional property that refers to it.

The use cases I'm aware of involve traversing the Zod schema top-down and looking for metadata at each level. I know you've said we shouldn't be traversing Zod schemas, but for those of us who want to use them as a source of truth for other things like an OpenAPI spec or JSON schema, traversal is the best way to accomplish that. Zod is a tree structure...trees are gonna get traversed 🌲

Apr 10 '24 03:04 jedwards1211

On further reflection, perhaps there should just be new subclasses for ZodJsonSchema and ZodJsonSchemaProperty. These could be implemented in Zod core, in a new package (zod-json-schema), or as some kind of plugin. This would make it possible to design the API for exactly this use case. They'll extend ZodType, so they'll be compatible with all existing Zod schemas. Maybe something like:

import * as z from "zod";
import "zod/plugins/json-schema";

const User = z.jsonschema.schema({ 
  $id: "User",
  properties: {
    firstName: z.jsonschema.property({
      description: "First name",
      type: z.string()
    })
  }
})

Apr 10 '24 07:04 colinhacks

@colinhacks Yeah, I think something like that could work well! (Assuming that third parties can add other things like z.mylib.property({}) and those can bear typesafe metadata.)

Apr 10 '24 11:04 rattrayalex

@colinhacks want to make sure it's clear that objects and properties aren't the only places that need metadata. Even primitives need it.

In the JSON schema docs there is this example:

{
  "$id": "https://example.com/schemas/customer",

  "type": "object",
  "properties": {
    "first_name": { "$ref": "#/$defs/name" },
    "last_name": { "$ref": "#/$defs/name" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],

  "$defs": {
    "name": { "type": "string" }
  }
}

Although it looks unnecessary to use $defs for this, it would be very useful if there are pattern, length, etc constraints on name.

This would translate to something like

const name = z.jsonschema.string({ $def: 'name' })

const address = z.jsonschema.object(...)

const customer = z.jsonschema.object({
  $id: 'https://example.com/schemas/customer',
  properties: {
    first_name: z.jsonschema.property({ type: name }),
    last_name: z.jsonschema.property({ type: name }),
    shipping_address: z.jsonschema.property({ type: address }),
    billing_address: z.jsonschema.property({ type: address }),
  },
})

Apr 10 '24 15:04 jedwards1211

In case it's useful, I just came across a similar project to ours, which wraps zod with a .openapi() helper: https://hono.dev/snippets/zod-openapi

I haven't yet dug into the source code to see how it's implemented, but @colinhacks it may be interesting to you.

Apr 17 '24 03:04 rattrayalex

I'm going to close this. Some combination of a new ZodJsonSchema class and plugins will be the recommended way to handle metadata in Zod 4. A proper RFC for this will be published soon. I appreciate all the great work and discussion on this.

May 03 '24 01:05 colinhacks
