zod

Reparse against the output of a Schema (codecs)

Open Sparticuz opened this issue 1 month ago • 8 comments

It would be helpful to provide a way to re-parse the output of a schema, specifically when working with Codecs.

I pull data from my database and encode it for use in my (react-hook-) form. The user can then edit data as needed in the form, then submits the form, which gets validated using the zodResolver. If successful, the data is passed back to the server, where it's parsed again and committed to the database.

The biggest issue with working with HTML forms has been the difficulty with numbers, so I wanted a codec that would encode the number to a string, then decode it back to an int (or a float in places). I got encode working great; it's passing strings to my client. However, when the form submits, it submits the parsed data, which is a number. (I don't think it's a great idea to submit the non-parsed data.) When I try to verify the data that was sent from the client, the field is already a number because it has already been parsed, so validation fails with 'expected string'.

I would like to request a way to parse against the output of the codec.

const schema = z.object({
  num: z.codec(z.string().regex(z.regexes.integer), z.int(), {
    decode: (str) => Number.parseInt(str, 10),
    encode: (num) => num.toString(),
  })
});

should be able to reparse and pass with the data satisfying the second parameter of z.codec.
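A minimal sketch of the round trip described above, assuming Zod 4.1 or later (where z.codec and the .encode() method are available). The variable names formValues, decoded, and reparsed are only illustrative.

```typescript
import * as z from "zod";

// Same schema as in the issue: a string in the form, an integer in the model.
const schema = z.object({
  num: z.codec(z.string().regex(z.regexes.integer), z.int(), {
    decode: (str) => Number.parseInt(str, 10),
    encode: (num) => num.toString(),
  }),
});

// Server -> client: encode the stored number for use in the form.
const formValues = schema.encode({ num: 42 }); // { num: "42" }

// Client: parsing (decoding) the submitted form values yields a number.
const decoded = schema.parse(formValues); // { num: 42 }

// Server: parsing the already-decoded payload fails, because the codec's
// input side still expects a string.
const reparsed = schema.safeParse(decoded);
console.log(reparsed.success); // false
```

The last call is the step that a reparse-style method would be expected to accept.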

The workaround that I see is const reparseSchema = schema.omit({ num: true }).extend({ num: z.int() }); however, this requires me to define the schema twice.

My request:

await schema.reparseAsync({ num:42 })

Related to #5081 but that was closed before codecs were a thing

Sparticuz avatar Oct 23 '25 20:10 Sparticuz

I'm a bot trained on Zod's codebase and issues. There isn't a built-in method like reparseAsync to validate data against the output type of a codec schema. This is a common request, especially for scenarios like yours with forms and number/string conversions. The recommended pattern is to define explicit schemas for each phase (input and output) and compose or pipe them as needed, even if this means some duplication for fields like num (see the docs and issue #5081).

Workarounds include using .omit() and .extend() to create a schema for reparsing, or composing schemas with .pipe() for more complex cases. There is community interest in a feature like reparseAsync, but it's not currently part of Zod and would require manual schema management for now. This design is intentional for type safety and predictable schema behavior (see discussion).

If you have a runnable reproduction or hit a specific edge case, feel free to share it—@colinhacks will see it and can weigh in. If this answers your question, please close the issue!

To reply, just mention @dosu.


dosubot[bot] avatar Oct 23 '25 20:10 dosubot[bot]

I believe that all the discussion on re-parsing schemas has been pre-codec, and I think that codecs might change the conversation surrounding this. A reparse function should be able to grab the correct output schemas from the original schema without the need for any manual schema management.

Sparticuz avatar Oct 23 '25 20:10 Sparticuz

It might not be obvious, but you have two schemas either way. What you are trying to do is inline them into one so that you avoid writing two definitions, even though you end up creating both anyway. By doing that, you couple one representation of a serialization (the form) to your "natural" data/model. That means you would needlessly parse against one schema, fail, and then parse against the other. And it's still not enough, because you would also need some way to transform back and forth or to carry both input and output data. The point is that what you are trying to avoid can't be avoided, because it's natural. You can only make it worse by fusing it in one place, which starts an obstacle course of edge cases that Zod would have to implement and handle, and which would hurt performance.

Zod actually approaches this sort of issue well with pipe, transform, and now the new Codec feature. This way nothing is coupled from the start; you decide where it's needed and what the logic is. It also has the benefit that you can have one "natural" model and ten different serializations: one from the API, one from a UI form, maybe another from a feature that only partially updates your model. That would be more difficult if the coupling were already done in your model instead of at the place where it is needed.

I consider anything in the UI (e.g. form data) or API (when serialized) as a "layer" of data that I transform or pipe into my model (or the parts of it that I modify) at the place where it's needed. So you have your clean model defined once. The specific UI parts can get a transformed or piped addition specific to that piece of UI and send the result to the server, where you should only need the "clean" one. Client: "A -> B"; server: "just B", where the arrow indicates some form of pipe or transform (or both).

Judging by the features Zod has received, it feels to me like they are intentionally going the more flexible, decoupled way of solving this sort of issue.

Even if such a feature existed, you would still have to define all the transformations.

Codec is already a step forward: instead of using two schemas and .transform where needed, you define the transformation ahead of time (notice that you still have two schemas, inline, passed to Codec). But now you have to decide when to transform in which direction (encode/decode). And it still begs the question: why would you parse something (A -> B [-> maybe even further to C]) and then go all the way back B -> A or C -> B -> A in order to run A -> B -> C again? Doesn't it feel weird if you already have B or A?

akomm avatar Oct 24 '25 08:10 akomm

I agree with much of what you said. There are definitely two schemas being used. The great thing about Codecs is that they intentionally combine those two schemas into a single definition, which, in my opinion, is a step forward.

I think it’s important not to rely solely on client-side validation and then send the raw, non-validated data to the server for final validation or transformation. Doing so means the server is receiving and processing untrusted input, which creates opportunities for malicious or malformed data to slip through. Instead, you should validate and sanitize data on both the client (for user feedback) and the server (for security and integrity).

Where I think this proposal adds value is that Codecs already define both an input and an output schema. The ability to 'reparse' using the output side would make that dual definition more fully usable. Once the client has validated and transformed the data, {num:"42"} -> {num: 42}, the existing parse method only knows how to validate the input side (z.string()), not the output (z.number()).

What I’m suggesting isn’t about collapsing layers, but instead leveraging the information that’s already defined in the Codec. Codecs have already done the difficult work of defining both an input and an output schema. I think adding the ability to 'extract' the output schema (without defining it a second time, DRY) and validate against that would be beneficial.
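As a rough illustration of what "extracting" the output schema could look like today, here is a sketch that reads the output side off the codec's definition. It assumes that a codec stores its two schemas as in and out under _zod.def; that is internal structure rather than a documented reparse API, so treat it as an assumption to verify against your Zod version. The names numCodec, clientSchema, and serverSchema are hypothetical.

```typescript
import * as z from "zod";

const numCodec = z.codec(z.string().regex(z.regexes.integer), z.int(), {
  decode: (str) => Number.parseInt(str, 10),
  encode: (num) => num.toString(),
});

// Client side: validates the form's strings and decodes them to numbers.
const clientSchema = z.object({ num: numCodec });

// ASSUMPTION: the codec's definition exposes its output schema as `out`.
// Reusing it avoids writing z.int() a second time.
const serverSchema = z.object({ num: numCodec._zod.def.out });

// The server can now validate data that the client has already decoded.
const accepted = serverSchema.safeParse({ num: 42 });
const rejected = serverSchema.safeParse({ num: "42" });
console.log(accepted.success, rejected.success);
```

This only handles a single known field; a general reparse would have to walk the whole schema and swap every codec for its output side.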

Sparticuz avatar Oct 24 '25 12:10 Sparticuz

> I think it’s important not to rely solely on client-side validation and then send the raw, non-validated data to the server for final validation or transformation. Doing so means the server is receiving and processing untrusted input, which creates opportunities for malicious or malformed data to slip through. Instead, you should validate and sanitize data on both the client (for user feedback) and the server (for security and integrity).

I did not ask to do that. If you can transform A -> B on the client, which the statement implies, you should use B on the server, which is not raw.

akomm avatar Oct 24 '25 12:10 akomm

Right, but I can't revalidate B, which shouldn't be trusted yet as it could be malicious data not sent by your client, without redefining the schema that has already been defined in the codec.

I suppose another way this could be done is

const INT_SCHEMA = z.int();

const clientSchema = z.object({
  num: z.codec(z.string().regex(z.regexes.integer), INT_SCHEMA, {
    decode: (str) => Number.parseInt(str, 10),
    encode: (num) => num.toString(),
  })
});

const serverSchema = z.object({
  num: INT_SCHEMA,
});

This has 'less' code and is 'less' error-prone, but I still think this can be improved.

Sparticuz avatar Oct 24 '25 12:10 Sparticuz

It should not be the case, because A is just the UI-specific form that transforms to B, and B should carry the actual domain-specific validation. As I've mentioned previously, Codec is probably not the right tool for your case; transform and pipe are.

An example with naive regex or any other domain specific logic:

let EnvStringArray = z
  .string()
  .transform(str => str.split(",").map(str => str.trim()))

let ListOfDomains = z
  .array(z.string().regex(/^[a-z0-9]+\.(com|org|foo)$/i))
  .min(1)

let Env = z.object({
  DOMAINS: EnvStringArray.pipe(ListOfDomains),
})

Why would you want EnvStringArray to be part of your model? That is basically what you want to couple into one schema, and it doesn't make sense. You do, though, create an Env schema specific to parsing the environment; that is the one schema that combines both, where needed. If ListOfDomains were already coupled to it, you could not reuse it when some other input has to be transformed differently; you would have to start undoing the coupling somehow, and it would conflict with other forms of input.
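A small usage sketch of the schemas above, showing that the combined Env schema handles the raw string while ListOfDomains stays usable on its own for input that is already an array. The sample domain values are made up for illustration.

```typescript
import * as z from "zod";

// Input layer: a comma-separated string, as found in an environment variable.
const EnvStringArray = z
  .string()
  .transform((str) => str.split(",").map((part) => part.trim()));

// Domain model: a non-empty list of domains (naive regex, as in the example).
const ListOfDomains = z
  .array(z.string().regex(/^[a-z0-9]+\.(com|org|foo)$/i))
  .min(1);

// Combined only where it is needed: parsing the environment.
const Env = z.object({
  DOMAINS: EnvStringArray.pipe(ListOfDomains),
});

const env = Env.parse({ DOMAINS: "example.com, zod.foo" });
console.log(env.DOMAINS); // ["example.com", "zod.foo"]

// The model schema is reusable without the string layer.
const direct = ListOfDomains.safeParse(["example.com"]);

// Invalid entries are rejected by the domain validation after the split.
const invalid = Env.safeParse({ DOMAINS: "not a domain" });
console.log(direct.success, invalid.success); // true false
```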

akomm avatar Oct 24 '25 12:10 akomm

While neither is a full "general case" solution to your problem, @Sparticuz, I have a couple recommendations that might help your particular case:

1: Send Strings

Use your schema in the browser to validate your input type (so you can give errors w/o a call to the server), but send the raw/input value to the server. (Where it will be re-validated & parsed w/ the same schema.)

Yes, sending strings feels non-ideal. But they have some advantages. One example: In JSON's serialized representation, numbers can have arbitrary size/precision, but in practice JS (& most languages AFAIK) will hold them as floats. The conversion from string to float can be lossy, so doing that server-side lets you account for that w/ server-only changes in the future if you need to.
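To illustrate the precision point, here is a sketch in plain TypeScript (no Zod involved) of an integer just above 2^53 surviving as a string but not as a JS number. The value is chosen only for demonstration.

```typescript
// 2^53 + 1, written as a string the way a form or JSON payload could carry it.
const raw = "9007199254740993";

// Converting to a JS number silently rounds to the nearest representable float.
const asNumber = Number.parseInt(raw, 10);
console.log(Number.isSafeInteger(asNumber)); // false
console.log(String(asNumber) === raw); // false

// Because the server received the string, it can choose a lossless type instead.
const asBigInt = BigInt(raw);
console.log(asBigInt.toString() === raw); // true
```

If the client had already converted the value to a number, the original digits would be gone before the server ever saw them.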

2: More forgiving input types

Make your num codec accept either a string or a number as input.

If it's a string, do the conversion to number. Otherwise, pass it through unmodified.

That should let you "re-parse" your schema.

I ran into similar issues back in Zod v3, and ended up using z.coerce.date() for a similar reason.
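A minimal sketch of the second recommendation, assuming Zod 4.1 or later. The name lenientInt is hypothetical; the input side is a union of the original string pattern and z.int(), so numbers pass through unchanged.

```typescript
import * as z from "zod";

// Input accepts either an integer-looking string or an integer.
const lenientInt = z.codec(
  z.union([z.string().regex(z.regexes.integer), z.int()]),
  z.int(),
  {
    // Strings are converted; numbers are passed through unmodified.
    decode: (value) =>
      typeof value === "string" ? Number.parseInt(value, 10) : value,
    // Encoding still produces a string for the form.
    encode: (num) => num.toString(),
  },
);

const schema = z.object({ num: lenientInt });

const fromForm = schema.parse({ num: "42" }); // { num: 42 }
const reparsed = schema.parse(fromForm); // { num: 42 }, accepted on the second pass
const forForm = schema.encode({ num: 42 }); // { num: "42" }
console.log(fromForm.num, reparsed.num, forForm.num);
```

The trade-off is that the input type becomes string | number everywhere the schema is used, including in the form's inferred types.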


The idea of being able to get the "output schema" of a schema that includes codecs does seem like it would be useful in more complicated situations, though. See related conversation over in https://github.com/colinhacks/zod/discussions/5295

NfNitLoop avatar Nov 15 '25 19:11 NfNitLoop