Bi-directional encoding/decoding
I have been asked to write up an issue regarding a bi-directional encoding/decoding process which I mentioned in an X thread.
In the past I have used io-ts extensively. Setting aside the typeclasses (Option, Either) of the fp-ts ecosystem, one feature I found extremely useful was the idea of codecs as isomorphic transformations: a decode and an encode process, if you will. Such a transformation turns one type A into another type B while preserving all properties, so that we can transform it back to A.
To quote the io-ts website:
A value of type Type<A, O, I> (called "codec") is the runtime representation of the static type A. A codec can:
- decode inputs of type I (through decode)
- encode outputs of type O (through encode)
- be used as a custom type guard (through is)
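To make that concrete, a DateFromString codec built with io-ts's Type constructor might look roughly like this (a sketch, not copied from the io-ts docs):
import * as t from 'io-ts';

// Sketch: A = Date (static type), O = string (encoded output),
// I = unknown (decode input).
const DateFromString = new t.Type<Date, string, unknown>(
  'DateFromString',
  (u): u is Date => u instanceof Date, // custom type guard (is)
  (u, c) =>
    typeof u === 'string' && !isNaN(new Date(u).getTime())
      ? t.success(new Date(u)) // decode: string -> Date
      : t.failure(u, c),
  (date) => date.toISOString() // encode: Date -> string
);

DateFromString.decode('2024-01-01'); // Right(Date)
DateFromString.encode(new Date('2024-01-01')); // '2024-01-01T00:00:00.000Z'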
I believe effect/schema implements a similar idea.
Correct me if I'm wrong, but I think valibot implements the "decoder" part of a codec (using the aforementioned terminology). I could imagine folks being interested in the encoding process as well.
Let's consider a schema to decode a string, which we expect to be a valid ISO date string.
import * as v from 'valibot';

const DateSchema = v.pipe(
v.string(),
v.isoDate('The date is badly formatted.'),
v.transform(i => new Date(i))
);
We can use this schema to decode some input
const result = v.parse(DateSchema, "2024-01-01");
console.log(result); // `Date` type
Now let's say that this date is stored in some storage (like Postgres) which doesn't support Date, so we need to serialize it. If the transform above is the deserialization process, we could imagine providing a serialization function as well, so that we could take a value of type Date, co-locate its serialization logic inside the schema, and encode it back before storing the value again.
Using some hypothetical API
const DateSchema = v.pipe(
v.string(),
v.isoDate('The date is badly formatted.'),
- v.transform(i => new Date(i))
+ v.transform(i => new Date(i), date => date.toISOString())
);
const input = Storage.get("date");
const decoded = v.parse(DateSchema, input);
const encoded = v.encode(DateSchema, decoded);
Storage.set("date", encoded);
We would be able to transform from string -> Date -> string, preserving isomorphism.
input === v.encode(DateSchema, v.parse(DateSchema, input))
We can imagine this also being useful for other custom data structures which don't benefit from auto-serialization the way Date does in JS, or if your serialization process outputs something different from what the typical toString would.
I believe that to achieve this in valibot today, you'd have to write the functions separately, but there are advantages to co-locating them inside the schema: you can start using higher-level APIs that operate on schemas directly and decode/encode automatically, as in this hypothetical use case:
const storage = {
save: (schema) => (input) => Db.save(v.encode(schema, input))
}
storage.save(UserSchema)({...})
This process becomes more useful as codecs are being composed together and your decoding/encoding process becomes more complex.
Let me know if this makes sense. I can try to add more details should you want more context.
To be honest, I never thought about this use case. That is why Valibot's dataflow is optimized for a single direction.
Technically, we could e.g. add an isomorphicTransform action that accepts a decode and an encode function. The encode method would then recursively search for 'isomorphic_transform' actions in the pipelines of the schema and execute their encode functions. This seems a bit hacky at first, and I am not sure if this is the right approach, as bugs can easily occur when mixing transform or other transformation actions with isomorphicTransform.
const DateSchema = v.pipe(
v.string(),
v.isoDate('The date is badly formatted.'),
v.isomorphicTransform(i => new Date(i), (date) => date.toISOString())
);
const encoded = v.encode(DateSchema, data);
Another and perhaps less buggy approach would be to add an encoder method that adds an encoding function to a schema. This way, encode could take only the type SchemaWithEncoder<...> as its first argument to directly execute its encoding function.
const DateSchema = v.encoder(
v.pipe(
v.string(),
v.isoDate('The date is badly formatted.'),
v.transform((i) => new Date(i))
),
(date) => date.toISOString()
);
const encoded = v.encode(DateSchema, data);
Neither approach will work for complex nested data without writing a very complex and non-modular encode method. How do io-ts and effect/schema handle complex nested data? I think to natively support encoding a single entry within a complex and nested schema, we need to rethink the entire implementation.
A workaround for now might be to write two schemas. One that decodes and one that encodes.
import * as v from 'valibot';
const DateDecodeSchema = v.pipe(
v.string(),
v.isoDate(),
v.transform((input) => new Date(input))
);
const DateEncodeSchema = v.pipe(
v.date(),
v.transform((date) => date.toISOString())
);
const decoded = v.parse(DateDecodeSchema, '2024-01-01');
const encoded = v.parse(DateEncodeSchema, decoded);
How do io-ts and effect/schema handle complex nested data?
I'd have to dig into the details of their source code; I'm not entirely sure at the moment. I would imagine there is some way of recursively calling the encode/decode function as it goes through each schema/property.
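For reference, effect/schema co-locates both directions in each schema, so encode can recurse through containers the same way decode does. A small sketch using its documented transform API (assuming the Schema module bundled with the effect package):
import { Schema } from 'effect';

// Both directions live in one schema: decode turns the encoded side
// (string) into the type side (Date), encode goes back.
const DateFromString = Schema.transform(Schema.String, Schema.DateFromSelf, {
  strict: true,
  decode: (isoString) => new Date(isoString),
  encode: (date) => date.toISOString(),
});

// Containers recurse automatically in both directions.
const User = Schema.Struct({ name: Schema.String, createdAt: DateFromString });

const decoded = Schema.decodeUnknownSync(User)({ name: 'Jane', createdAt: '2024-01-01' });
const encoded = Schema.encodeSync(User)(decoded); // { name: 'Jane', createdAt: '2024-01-01T00:00:00.000Z' }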
Neither approach will work for complex nested data without writing a very complex and non-modular encode method
I think the second approach (v.encoder) would break co-location of the encode/decode functions. It would be entirely possible to write the transform in some module, import that schema, compose it with others, and only later write out the encoding function on a larger type, which can lead to complex encoding functions that easily get out of sync. I think keeping the encoding function co-located with the schema might lead to smaller, more composable encoders.
Perhaps the first approach would be a bit simpler 🤔, I'm still thinking it through. I wrote a more complex example using io-ts here. It might be a bit far-fetched, but it could represent a realistic scenario or a good test bed.
We have a few levels of codecs (DateFromString, Value/ValueFromString, Values/ValuesFromString). Using the first approach we could possibly imagine something like this
const DateFromString = v.pipe(
v.string(),
v.isoDate('The date is badly formatted.'),
v.isomorphicTransform(i => new Date(i), (date) => date.toISOString())
);
const Value = v.tuple([v.string(), DateFromString]);
const ValueFromString = v.pipe(
v.string(),
// How do we handle potential failures?
v.isomorphicTransform((x) => JSON.parse(x), JSON.stringify),
Value
);
const Values = v.record(v.string(), Value);
const ValuesFromString = v.pipe(
v.string(),
v.isomorphicTransform(() => ..., () => ...)
)
v.parse(ValuesFromString, 'name:["key","2020-01-01"];name2:["key2","2021-01-01"]') // Should parse
v.encode(ValuesFromString, {
name: ["key", new Date("2020-01-01")]
}) // Should encode back to string
I don't know the details of valibot yet, but during encode, would there be a way to walk up the chain of schemas and call their isomorphic transform actions as you encounter them? In a pseudo stack (roughly sketched after the list):
- `DateFromString#isomorphicTransform`
- `ValueFromString#isomorphicTransform`
- `ValuesFromString#isomorphicTransform`
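A rough sketch of what that walk could look like. None of these internals exist in Valibot today; the pipe property, the action type name, and the recursion strategy are all hypothetical:
// Hypothetical: encode by running pipe items in reverse, so the
// outermost transform is undone first.
function encodePipe(schema: { pipe?: any[] }, value: unknown): unknown {
  let result = value;
  for (const item of [...(schema.pipe ?? [])].reverse()) {
    if (item.type === 'isomorphic_transform') {
      result = item.encode(result); // e.g. Date -> string
    } else if (item.pipe) {
      // A nested piped schema (like Value inside ValueFromString):
      // recurse so its inner transforms are undone too. Container
      // schemas (tuple, object, record) would need their own handling.
      result = encodePipe(item, result);
    }
  }
  return result;
}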
Let's keep brainstorming if that's something you'd like to see in the library. That said, the two-schema approach works too and might be simple enough not to introduce additional complexity. With the two-schema approach, would we be able to differentiate the encoder and the decoder at the type level? If so, that might be fine.
const doStuff = (decoder: Schema<...>, encoder: Schema<...>) => {}
Valibot is currently implemented in such a way that each schema has its own ._run method. This method usually contains only a few lines with the complete validation logic of the schema. This makes Valibot modular and small in terms of bundle size. When calling parse or safeParse, the internal ._run method of the schema is called. For nested schemas, the parent schema simply calls the ._run method of its children. To support encode without putting a lot of code into the encode function, each schema and action would have to support an internal ._encode method, but this would increase the bundle size for all schemas, even if this functionality is not used.
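To illustrate the dispatch described above, here is a loose pseudo-implementation, not Valibot's actual source, of how a parent schema delegates to its children:
// Illustrative sketch only: a parent schema validates by calling the
// internal ._run method of each child and merging the results.
const objectSketch = (entries: Record<string, { _run: (dataset: any, config: any) => any }>) => ({
  kind: 'schema',
  _run(dataset: { value: any; issues?: unknown[] }, config: unknown) {
    for (const [key, child] of Object.entries(entries)) {
      const result = child._run({ value: dataset.value?.[key] }, config);
      // ...merge result.value and result.issues into dataset...
    }
    return dataset;
  },
});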
but this would increase the bundle size for all schemas
I see. It's definitely a tradeoff. I understand valibot is optimized for bundle size, so if you feel that adding such functionality would undo some of the effort to reduce its footprint as much as possible, then perhaps this capability isn't worth it and we can close this issue. We can still use this issue as a future reference should users of valibot feel they would benefit from isomorphic schemas.
Perhaps my last question is still relevant though?
With the two-schema approach, would we be able to differentiate the encoder and the decoder at the type level?
Thank you for your feedback! We can leave this open for now so developers can use this issue to discuss it. I will probably not investigate this further before v1, but it may be a feature we think about again for v2.
With the two-schema approach, would we be able to differentiate the encoder and the decoder at the type level?
I am not quite sure what you mean. It is possible to infer both types. You can also determine the input and output type of the encoder based on the decoder. Here is an example:
import * as v from 'valibot';
const DateDecodeSchema = v.pipe(
v.string(),
v.isoDate(),
v.transform((input) => new Date(input))
);
type DateDecodeInput = v.InferInput<typeof DateDecodeSchema>;
type DateDecodeOutput = v.InferOutput<typeof DateDecodeSchema>;
const DateEncodeSchema = v.pipe(
v.date(),
v.transform((date) => date.toISOString())
) satisfies v.GenericSchema<DateDecodeOutput, DateDecodeInput>; // <--
function doStuff<
TDecoder extends v.GenericSchema,
TEncoder extends v.GenericSchema<
v.InferOutput<TDecoder>, // <--
v.InferInput<TDecoder> // <--
>,
>(decoder: TDecoder, encoder: TEncoder) {
// More code here
}
const result = doStuff(DateDecodeSchema, DateEncodeSchema);
I needed something like this for a recent project and was able to put together something that follows a similar API to Effect Schema's transform function.
function transformer<
TEncoded extends v.GenericSchema,
TDecoded extends v.GenericSchema,
>(
encoded: TEncoded,
decoded: TDecoded,
encoders: {
decode: (input: v.InferOutput<TEncoded>) => v.InferOutput<TDecoded>;
encode: (input: v.InferOutput<TDecoded>) => v.InferOutput<TEncoded>;
},
) {
return {
encoded,
decoded,
decode: v.pipe(encoded, v.transform(encoders.decode)),
encode: v.pipe(decoded, v.transform(encoders.encode)),
};
}
The transformer function takes the encoded and decoded schemas as the first two arguments, and then an object containing the decode and encode transformation functions as the third argument, e.g.
const DateSchema = transformer(
v.pipe(v.string(), v.isoDateTime()),
v.date(),
{
decode: (isoString) => new Date(isoString),
encode: (date) => date.toISOString(),
}
);
You can then use the decode and encode properties returned by the function when parsing:
const decodedDate = v.parse(DateSchema.decode, '2025-05-14T06:26');
These properties along with the encoded and decoded properties can be used within other transformers as well e.g.
const SchemaEncoded = v.object({
title: v.string(),
createdAt: DateSchema.encoded,
});
const SchemaDecoded = v.object({
title: v.string(),
createdAt: DateSchema.decoded,
});
const Schema = transformer(
SchemaEncoded,
SchemaDecoded,
{
decode: ({ title, createdAt }) => {
return {
title,
createdAt: v.parse(DateSchema.decode, createdAt),
};
},
encode: ({ title, createdAt }) => {
return {
title,
createdAt: v.parse(DateSchema.encode, createdAt),
};
},
}
);
Interested to hear any thoughts on this. It would be nice to not have to manually parse nested bi-directional schemas.
P.S. I think transformer is a terrible name that causes confusion with transform, but I couldn't think of anything else at the time.
oooh this actually looks pretty nice & simple 😄
how about transcoder or schemaSerDe 😄
Maybe this could be abstracted away by putting the encode/decode methods into v.metadata and having another helper that recursively checks all fields and uses them if available 🤔 Not sure if something like this would bloat the core.
What I like about this, apart from going both directions for storage/JSON etc., is that I could have "policies" encoded in the schema, e.g. for internal fields that should not be sent to the client 🤔
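A rough sketch of that metadata idea, assuming valibot's metadata action and getMetadata method; the encode key and the encodeWith helper are made up for illustration:
import * as v from 'valibot';

// Hypothetical convention: store an encode function in the metadata.
const DateSchema = v.pipe(
  v.string(),
  v.isoDate(),
  v.transform((s) => new Date(s)),
  v.metadata({ encode: (date: Date) => date.toISOString() })
);

// Hypothetical helper: use the metadata-provided encode function if
// one exists, otherwise pass the value through unchanged.
// (Typing is deliberately loose in this sketch.)
function encodeWith(schema: any, value: unknown): unknown {
  const meta = (v.getMetadata(schema) ?? {}) as { encode?: (value: unknown) => unknown };
  return meta.encode ? meta.encode(value) : value;
}

encodeWith(DateSchema, new Date('2024-01-01')); // '2024-01-01T00:00:00.000Z'
A recursive variant could walk object entries the same way, and for the "policies" idea, a flag like internal: true in the same metadata object could tell such a helper to drop the field when encoding for clients.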
We can get deep encode/decode working pretty easily if we add data to config:
import type { BaseIssue, BaseSchema, InferInput, InferIssue, InferOutput } from 'valibot'
import { _getStandardProps, _joinExpects } from 'valibot'
export interface BidirectionalSchema<
Encode extends BaseSchema<unknown, unknown, BaseIssue<unknown>>,
Decode extends BaseSchema<unknown, unknown, BaseIssue<unknown>>,
> extends BaseSchema<InferInput<Encode> | InferInput<Decode>, InferOutput<Encode | Decode>, InferIssue<Encode | Decode>> {
type: 'bidir'
reference: typeof bidir
readonly encode: Encode
readonly decode: Decode
}
export function bidir<
Encode extends BaseSchema<unknown, unknown, BaseIssue<unknown>>,
Decode extends BaseSchema<unknown, unknown, BaseIssue<unknown>>,
>(
decode: Decode, encode: Encode,
): BidirectionalSchema<Encode, Decode> {
return {
type: 'bidir',
kind: 'schema',
reference: bidir,
expects: _joinExpects([decode.expects, encode.expects], '|'),
async: false,
encode,
decode,
get '~standard'() {
return _getStandardProps(this)
},
'~run'(dataset, config) {
const mode = ('mode' in config && config.mode === 'encode') ? 'encode' : 'decode'
return this[mode]['~run'](dataset, config)
},
};
}
This doesn't work with the existing parse implementation since that doesn't pass through custom config fields, but we can work around that by defining our own encode/decode functions:
import type { BaseSchema, Config, InferIssue, InferOutput, BaseIssue } from 'valibot'
import { getGlobalConfig, ValiError } from "valibot";
export function encode<
const TSchema extends BaseSchema<unknown, unknown, BaseIssue<unknown>>
>(
schema: TSchema,
input: unknown,
config?: Config<InferIssue<TSchema>>
): InferOutput<TSchema> {
const c: Config<InferIssue<TSchema>> & { mode?: 'encode' | 'decode'; } = getGlobalConfig(config);
c.mode = 'encode';
const dataset = schema['~run']({ value: input }, c);
if (dataset.issues) {
throw new ValiError(dataset.issues);
}
return dataset.value;
}
export function decode<
const TSchema extends BaseSchema<unknown, unknown, BaseIssue<unknown>>,
>(
schema: TSchema,
input: unknown,
config?: Config<InferIssue<TSchema>>,
): InferOutput<TSchema> {
const c: Config<InferIssue<TSchema>> & { mode?: 'encode' | 'decode' } = getGlobalConfig(config)
c.mode = 'decode'
const dataset = schema['~run']({ value: input }, c)
if (dataset.issues) {
throw new ValiError(dataset.issues)
}
return dataset.value
}
Then, we can do
import * as v from 'valibot'
import { assertEquals } from 'jsr:@std/assert'
// plus the bidir, encode, and decode helpers defined above

const schema = v.object({
foo: bidir(
v.pipe(v.string(), v.transform((s) => parseInt(s, 10))),
v.pipe(v.number(), v.transform((n) => n.toString())),
),
})
Deno.test(function testEncode() {
const value = { foo: 1 }
const result = encode(schema, value);
assertEquals(result, { foo: '1' });
})
Deno.test(function testDecode() {
const value = { foo: '1' }
const result = decode(schema, value);
assertEquals(result, { foo: 1 });
})
Although if we do this, the output type isn't very useful since it's a union of the encoded/decoded types, and I guess error messages might be weird sometimes since expects doesn't know whether we're encoding or decoding.
I don't think there's a good way to get better types without adding a significant amount of code/types, but it might be good enough to assume that the 'decoded' type is the only output type that TypeScript needs to know about. (If the goal is to serialize data, the other type will just be sent somewhere else, so its type doesn't matter.)
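A hedged sketch of that "decoded-only" typing, reusing the BidirectionalSchema interface from the snippet above (assumed here to live in ./bidir); the helper name and the cast are illustrative, not Valibot API:
import type { BaseIssue, BaseSchema, InferInput, InferOutput } from 'valibot'
import type { BidirectionalSchema } from './bidir' // the interface defined above

// Hypothetical: keep bidir's runtime behavior but tell TypeScript that
// parsing always yields the decoded type.
function withDecodedTypes<
  Encode extends BaseSchema<unknown, unknown, BaseIssue<unknown>>,
  Decode extends BaseSchema<unknown, unknown, BaseIssue<unknown>>,
>(schema: BidirectionalSchema<Encode, Decode>) {
  return schema as unknown as BaseSchema<
    InferInput<Decode>,
    InferOutput<Decode>,
    BaseIssue<unknown>
  >
}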
This could also let us generalize to >2 transformations without too much effort, e.g.,
multipleTransforms({
base: v.instance(ZonedDateTime),
iso: {
to: (zdt) => zdt.toString(),
from: { type: v.string(), transform: parseZonedDateTime },
},
js: {
to: (zdt) => zdt.toDate(),
from: { type: v.date(), transform: (date) => fromDate(date, getLocalTimeZone()) },
}
})
Please give a 👍 to this comment if you think we should consider first-class encode/decode support for Valibot v2. Thanks @javangriff for your very simple and clean implementation!
FYI this feature was recently added in Zod 4.1, so it would be great to have an equivalent in Valibot
https://zod.dev/codecs
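For comparison, Zod 4.1 pairs both directions in a codec and runs them with top-level decode/encode functions (example adapted from the linked docs):
import * as z from 'zod';

// A codec bundles both directions.
const StringToDate = z.codec(z.iso.datetime(), z.date(), {
  decode: (isoString) => new Date(isoString),
  encode: (date) => date.toISOString(),
});

z.decode(StringToDate, '2024-01-15T10:30:00.000Z'); // Date
z.encode(StringToDate, new Date('2024-01-15')); // ISO string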
Regarding the terminology: basically every library I know that implements this pattern calls the two directions Encoder/Decoder and the union of the two a Codec. Zod does, and there's a lot of prior art like io-ts, as well as libraries in other languages like scodec and Circe in Scala.
I would recommend sticking to the same names since they seem quite ubiquitous.
I will investigate if we can ship a similar feature as a minor release or if it requires a major version bump.