Codecs: cross-container impedance mismatch issue

Open racinette opened this issue 3 months ago • 7 comments

I tried implementing a codec for transforming a record<S, T> into a map<K, T>, where S is a subtype of string, K is arbitrary, and T may be either a codec or a type. This would be useful for defining arbitrary key-value containers decoded from JSON inputs.

What I could get so far:

const noCodecValue = z.object({
  a: bigint(),
});
const yesCodecKey = z.codec(z.string(), z.enum(["a", "b"]), { encode: (input) => input, decode: (input) => {
  if (input === "a") {
    return "a";
  }
  if (input === "b") {
    return "b";
  }
  return z.NEVER;
}});

// basically, we want
// input: record<string, { a: bigint | number | string | BN }>
// output: map<"a" | "b", { a: bigint }>

// what we get:
const yn = z.codec(
  z.record(yesCodecKey._zod.def.in, noCodecValue),
  z.map(
    yesCodecKey._zod.def.out, 
    z.any()  // <-- it should somehow reference the output of noCodecValue
    // putting noCodecValue here will get us z.input<typeof noCodecValue>, which is wrong
  ),
  {
    decode: (input) => new Map(Object.entries(input).map(([k, v]) => [
      yesCodecKey.decode(k), 
      v  // <-- output is already there, decoding is done implicitly, since z.object is not a codec
    ])),
    encode: (input) => Object.fromEntries(
      Array.from(input.entries()).map(([k, v]) => [
        yesCodecKey.encode(k), 
        noCodecValue.encode(v)  // <-- this expects to get the output of noCodecValue, however, we'd like that to be the input
      ]
    )),
  }
);

bigint implementation for reference:

const bigint = () => {
  return z.codec(
    z.union([
      z.string().regex(z.regexes.integer),
      z.int(),
      z.bigint(),
      z.instanceof(BN),
    ]),
    z.bigint(),
    {
      decode: (input, ctx) => {
        if (typeof input === 'bigint') {
          return input;
        }
        try {
          if (typeof input === "string" && input === "") {
            // special case: for some reason BigInt constructor accepts empty string and returns a 0n
            throw new Error("empty string is not a valid integer");
          }
          if (input instanceof BN) {
            return BigInt(input.toString());
          }
          return BigInt(input);
        } catch (err: any) {
          ctx.issues.push({
            code: "invalid_format",
            format: "json",
            input: input.toString(),
            message: err.message ?? "not a valid integer",
          });
          return z.NEVER;
        }
      },
      encode: (input) => input.toString(),
    }
  );
};

There seems to be no plausible way to do it currently.

racinette, Sep 11 '25 13:09

Zod (v4) codecs are designed for bidirectional transformations between two schemas, and you can compose them for many scenarios(1)(2)(3)(4). However, Zod's current API does not provide a built-in way to automatically "lift" codecs across container types like z.record or z.map so that both keys and values are recursively decoded/encoded according to their codecs. When you use z.codec for a value type, the container (e.g., z.record or z.map) expects plain schemas, not codecs, and there's no generic API to automatically propagate the encoding/decoding logic at multiple levels(3)(4)(5).

The challenge you describe—where you want keys and values both to be decoded/encoded using their respective codecs within a container—cannot be fully solved with the existing public Zod (v4) API. Specifically:

  • z.record and z.map expect ordinary schemas for their key/value types and do not recursively invoke .decode/.encode of child codecs(3)(6).
  • There is no mechanism in Zod (v4) to automatically "compose" codecs so that their transformations are applied recursively inside container schemas(3)(5).
  • If you use a codec as the value type in z.map, you only get type-level inference for the output, but the runtime encode/decode methods of the codec are not called automatically by z.map's own encode/decode(3)(4).

Currently, the only way to achieve true cross-container encode/decode is to write custom logic for your container, as you did in your example. This means you must manually call the appropriate codec's encode/decode functions on both keys and values(3)(5).

If you need full recursive codec application (automatic cross-container encode/decode), you would need to implement a utility that traverses your data structure and applies the correct codec transformations at each level, which is not natively supported by Zod (v4)(3)(5).

📚 Sources:

  1. Defining schemas
  2. Zod v4 LLMs-Full
  3. Codecs
  4. Zod v4 LLMs-Full
  5. V4: suggestion z.codec
  6. Zod v4 LLMs-Full

Have another question? Just tag @inkeep.

@inkeep I call BS, since codecs do work recursively out of the box. This is pure hallucination:

> const c = z.codec(z.string(), z.bigint(), { decode: (input) => BigInt(input), encode: (input) => input.toString() });
undefined
> const o = z.object({ b: c });
undefined
> o.decode({ b: "1" })
{ b: 1n }
> o.encode(o.decode({ b: "1" }))
{ b: '1' }
> const oo = z.object({ o: o });
undefined
> oo.encode(oo.decode({o: { b: "1" }}))
{ o: { b: '1' } }
> oo.decode({o: { b: "1" }})
{ o: { b: 1n } }

racinette, Sep 11 '25 13:09

let's see how @Dosu does here

colinhacks, Sep 11 '25 22:09

Great question! You're absolutely right that codecs work recursively in z.object—when you call .encode() or .decode() on an object schema, Zod will delegate those calls to any child properties that are codecs, so deeply nested objects transform as expected. Your REPL example is spot-on.

However, this recursive behavior does not generalize to all container types. For z.record and z.map, Zod does not automatically propagate encode/decode to nested codecs for values (and for z.map, not for keys either). These container types expect schemas, not codecs, and do not recursively call codec logic on their child types. If you want full recursive codec application in records or maps, you need to manually invoke the child codec's encode/decode methods in your custom logic—just like you did in your initial example. This distinction is reflected in both the codebase and recent updates to the codec system in Zod v4[^1][^2].

So: recursive codec application works out-of-the-box for z.object, but not for z.record or z.map.

Tagging @colinhacks for visibility. If this answers your question, feel free to close the issue!

[^1]: PR #5113 - ZodCodec implementation
[^2]: PR #5130 - Codec tests and docs

To reply, just mention @dosu.



dosubot[bot], Sep 11 '25 22:09

Yeah, I've been playing with the idea of z.in() and z.out() utilities that recursively traverse the schema and give you the "input" or "output" side of all pipes/codecs/etc. This is what you're looking for I think.

Below is a simple, working implementation. Note that codecs (in regular Zod, not Zod Mini) have .in and .out properties that simplify this somewhat (no need to access ._zod). Until those utilities are implemented, this still requires creating a new object schema (z.object({ a: bigint().out })).

const yn = z.codec(
  z.record(yesCodecKey.in, noCodecValue),
  z.map(
    yesCodecKey.out,
    z.object({ a: bigint().out }) // <-- output side of noCodecValue, rebuilt by hand
    // putting noCodecValue here will get us z.input<typeof noCodecValue>, which is wrong
  ),
  {
    decode: (input) =>
      new Map(
        Object.entries(input).map(([k, v]) => [
          yesCodecKey.decode(k),
          v, // <-- output is already there, decoding is done implicitly, since z.object is not a codec
        ])
      ),
    encode: (input) => {
      const out: Record<string, { a: bigint }> = {};
      for (const [k, v] of input.entries()) {
        out[k] = { a: v.a };
      }
      return out;
    },
  }
);
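
To make the z.in() / z.out() idea concrete, here is a minimal sketch of the recursive traversal in plain TypeScript. It deliberately does not use Zod: the `Schema` type and the `pickSide` function are hypothetical names invented for this illustration, and the toy schema tree only imitates the in/out structure of a codec. It shows the shape of the algorithm, not how Zod implements or would implement it.

```typescript
// Toy schema model (NOT Zod's real types): just enough structure to show
// how a z.in()/z.out()-style utility could walk a schema tree.
type Schema =
  | { kind: "string" }
  | { kind: "bigint" }
  | { kind: "object"; shape: Record<string, Schema> }
  | { kind: "record"; key: Schema; value: Schema }
  | { kind: "map"; key: Schema; value: Schema }
  | { kind: "codec"; in: Schema; out: Schema };

// Hypothetical helper: returns a copy of the schema in which every codec
// has been replaced by its "in" side or its "out" side.
function pickSide(schema: Schema, side: "in" | "out"): Schema {
  switch (schema.kind) {
    case "codec":
      // The chosen side may itself contain codecs, so keep recursing.
      return pickSide(schema[side], side);
    case "object": {
      const shape: Record<string, Schema> = {};
      for (const [name, child] of Object.entries(schema.shape)) {
        shape[name] = pickSide(child, side);
      }
      return { kind: "object", shape };
    }
    case "record":
    case "map":
      return {
        ...schema,
        key: pickSide(schema.key, side),
        value: pickSide(schema.value, side),
      };
    default:
      // Leaf schemas contain no codecs.
      return schema;
  }
}

// Toy counterpart of noCodecValue: an object whose field is a codec.
const bigintCodec: Schema = {
  kind: "codec",
  in: { kind: "string" },
  out: { kind: "bigint" },
};
const toyValue: Schema = { kind: "object", shape: { a: bigintCodec } };

console.log(JSON.stringify(pickSide(toyValue, "out")));
// {"kind":"object","shape":{"a":{"kind":"bigint"}}}
```

With a utility along these lines, the hand-written z.object({ a: bigint().out }) above would be derived from noCodecValue instead of being rebuilt manually.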

colinhacks, Sep 11 '25 22:09

@colinhacks thank you, that's exactly the kind of API I ended up looking for when trying out different approaches to implement this! I'm gonna leave the issue open, in case the generic utilities are implemented, if that's ok with you.

racinette, Sep 12 '25 10:09

This doesn't work if the schemas are nested inside an object, since the intermediate object doesn't have the in and out properties. For example, take codecs like

export function JSNumberSetCodec(maxSize: number) {
  return z.codec(
    z.array(z.number()).max(maxSize).readonly(),
    z.set(z.number()).readonly(),
    {
      decode: (arr) => {
        return new Set(arr);
      },
      encode: (set) => Array.from(set),
    }
  );
}

export function JSStringMapCodec<V extends z.ZodType>(
  valSchema: V,
  maxSize: number
) {
  return z.codec(
    z
      .record(z.string(), valSchema) // needs to be valSchema.in
      .refine((r) => Object.keys(r).length <= maxSize, {
        message: `Exceeds maximum entries of ${maxSize}`,
      })
      .readonly(),
    z.map(z.string(), valSchema).readonly(),
    {
      decode: (r) => {
        // type error here because r is z.output<V>
        return new Map(Object.entries(r) as [string, z.input<V>][]);
      },
      encode: (m) => Object.fromEntries(m) as Record<string, z.infer<V>>,
    }
  );
}

and then you have a schema like

const Foo = z.strictObject({
  F: JSNumberSetCodec(20)
});

const Bar = z.strictObject({
  B: JSStringMapCodec(Foo, 20)
});

Note that Foo does not have an in property, and without the in property the codecs fail. If you put a breakpoint inside the string map decode function, you can see that the number set has already been converted by the time the map decode runs, but Zod then tries to decode the set a second time. The resulting error is "Invalid input: expected array, received Set", because Zod attempts to convert the set inside Foo twice.
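
The failure mode can be reproduced in miniature without Zod. The sketch below is only a model of the symptom: `decodeNumberSet` is a hypothetical stand-in for the array-to-Set codec, and the error text is copied from the message quoted above. It does not show where Zod performs the second pass.

```typescript
// Plain-TypeScript stand-in for the array -> Set codec (no Zod involved).
function decodeNumberSet(input: unknown): Set<number> {
  if (!Array.isArray(input)) {
    const received = input instanceof Set ? "Set" : typeof input;
    throw new Error(`Invalid input: expected array, received ${received}`);
  }
  return new Set(input as number[]);
}

// First pass: the array is converted to a Set, as intended.
const once = decodeNumberSet([1, 2, 2, 3]);

// Second pass: the already-decoded Set is fed to the same decoder again.
let secondPassError = "";
try {
  decodeNumberSet(once);
} catch (err) {
  secondPassError = (err as Error).message;
}

console.log(secondPassError);
// Invalid input: expected array, received Set
```

Any decode step that validates its input shape is not idempotent in this way, which is why the outer codec's input schema needs the input side of the nested schema (the valSchema.in mentioned in the comment above) rather than the schema itself.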

wuzzeb, Dec 04 '25 17:12