valibot
fix(to-json-schema): translate `v.file()` schema
A v.file() can/should be translated to the JSON schema:
type: string
format: binary
See https://swagger.io/docs/specification/v3_0/describing-request-body/file-upload/ for OpenAPI support.
I initially thought this was an issue with hono-openapi's translation, but tracked it to here instead.
Thanks for creating this PR! I am not sure if we should convert the file schema to a binary string in JSON Schema because the underlying data type is different. Maybe we should provide a binary action for strings to enable this conversion. Looking forward to your feedback!
const Schema = v.pipe(v.string(), v.binary());
The OpenAPI spec says that files need to be string/binary. Because that’s what they are. https://swagger.io/docs/specification/v3_0/describing-request-body/file-upload/
The translation is going from one representation to another. The rest of the ecosystem on top of valibot is already using v.file() for formdata validation, which works after req.formData() parsing…but that needs to be recorded and described in OpenAPI format too.
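For reference, the file-upload pattern from the linked Swagger page can be sketched as a plain object. The constant name `uploadRequestBody` and the field name `avatar` are made up for this example; only the `type: string` / `format: binary` pairing comes from the discussion.

```typescript
// Hypothetical OpenAPI 3.0 request body for a multipart upload.
// The field name `avatar` is an assumption made for this example.
const uploadRequestBody = {
  required: true,
  content: {
    'multipart/form-data': {
      schema: {
        type: 'object',
        properties: {
          // A file part is described by its transport form: a binary string.
          avatar: { type: 'string', format: 'binary' },
        },
        required: ['avatar'],
      },
    },
  },
};
```

This is the shape a `v.file()` field would need to produce for an OpenAPI client to know that a file can be sent.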
Would you also convert blob to a binary string and date to a date string? In the past, I have decided that people should use v.pipe(v.string(), v.isoTimestamp()) or one of our other iso... date actions. But I am happy to discuss it and improve it in the long run.
Out of curiosity, what are the disadvantages of defining a binary string as v.pipe(v.string(), v.binary())?
When forms are uploaded, they are encoded as binary strings. Thru HTTP headers, those strings are known to be 1) sliced by a boundary 2) some are indicated as being files 3) of a certain mime type
On the JS server, req.formData() will take all this information and parse it accordingly, turning files into File instances. This is where your v.file() will pass/validate the value(s).
But before getting to the server, OpenAPI clients need to know whether or not file(s) can be sent. In OpenAPI definitions (thru JSON schema) this is represented as { type: "string", format: "binary" } because, again, this is describing the transport. AFAICT, format: "binary" isn't part of the spec itself, but it's been the pattern/consensus.
However, there is now a draft-9 spec change that incorporates the (still optional) contentMediaType and contentEncoding properties to denote what kind of content the uploaded string is.
If I were you, I would do this:
v.file();
//-> {
//->   "type": "string",
//->   "format": "binary"
//-> }
// ImageSchema
v.pipe(
v.file('Please select an image file.'),
v.mimeType(['image/jpeg', 'image/png'], 'Please select a JPEG or PNG file.'),
v.maxSize(1024 * 1024 * 10, 'Please select a file smaller than 10 MB.')
);
//-> {
//-> "oneOf": [{
//-> "type": "string",
//-> "format": "binary",
//-> "contentMediaType": "image/jpeg",
//-> "maxLength": 1024 * 1024 * 10
//-> }, {
//-> "type": "string",
//-> "format": "binary",
//-> "contentMediaType": "image/png",
//-> "maxLength": 1024 * 1024 * 10
//-> }]
//-> }
You can't force/assume a contentEncoding until the developer has specified one. AFAICT you don't have an API to capture this information yet, so I didn't include an example for it.
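The translation rule suggested above can be sketched as a small standalone function. Everything here is hypothetical: `FileConstraints`, `BinaryStringSchema` and `fileToJsonSchema` are names invented for this example and are not valibot's internals, and the byte-size-to-`maxLength` mapping simply mirrors the snippet above.

```typescript
// Hypothetical description of the constraints a file schema might carry.
// This is NOT valibot's internal representation.
interface FileConstraints {
  mimeTypes?: string[];
  maxSize?: number; // size limit in bytes
}

interface BinaryStringSchema {
  type: 'string';
  format: 'binary';
  contentMediaType?: string;
  maxLength?: number;
}

type FileJsonSchema = BinaryStringSchema | { oneOf: BinaryStringSchema[] };

function fileToJsonSchema(constraints: FileConstraints = {}): FileJsonSchema {
  // A bare file schema becomes a plain binary string.
  const base: BinaryStringSchema = { type: 'string', format: 'binary' };
  if (constraints.maxSize !== undefined) {
    base.maxLength = constraints.maxSize;
  }
  const mimeTypes = constraints.mimeTypes ?? [];
  if (mimeTypes.length === 0) {
    return base;
  }
  // One variant per allowed MIME type, as in the snippet above.
  return {
    oneOf: mimeTypes.map(
      (mimeType): BinaryStringSchema => ({ ...base, contentMediaType: mimeType })
    ),
  };
}
```

No `contentEncoding` is emitted, since nothing in the input specifies one.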
what are the disadvantages of defining a binary string as v.pipe(v.string(), v.binary())
In the example above, once the user calls req.formData(), the valibot schema will always fail, since it's no longer checking that the value is an instanceof File. It's information loss.
Would you also convert blob to a binary string
Blobs can't be uploaded. It's a JS-specific API and isn't transferable over HTTP. At best, JSON schema could use a string/binary/content* combination for this, but there's no API that would automatically convert the uploaded bytes into a Blob. The constructor is lost (and meaningless) over the wire, so there's no reason to try to preserve the conversion. This is not true of File, which is why it should be different/fixed.
Would you also convert ... date to a date string
This could perhaps be an option, but I wouldn't. Like Blob, there's nothing in JS that would auto-convert the date's string/number representation back into a Date instance. The value would just stay as a string/number until the dev (or some abstraction layer) converted it.
Thank you for taking the time to explain it so clearly! I really appreciate it! I don't have a lot of time right now, but I will try to review and merge within the next few days and publish a new version.
JSON Schema does not have a binary format; this is unique to OpenAPI. Perhaps there should be some toggle/flag to indicate that the transformation is for an OpenAPI declaration :?
Implementations MAY support custom format attributes. https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-validation-00#rfc.section.7.2.3
You can use anything in a format. There are common formats that have been added thru general agreement, but the contents of format attribute are not validated by the spec itself.
Does File have a specific encoding when represented as a string that should be specified via .contentEncoding in JSON Schema?
@fabian-hiller sorry I don’t understand the question
Sorry, my bad. I fixed my previous comment.
Ah. No, you can’t add any more than what the developer specifies. I’ll point to the above example snippet again. On its own, you can only translate v.file() to string/format=binary.
I am still not sure what to do. Both Zod v4 and ArkType also throw an error when converting a File schema to JSON Schema, and it feels wrong to switch the type when converting to JSON Schema just because of OpenAPI and how files are transported over HTTP. But I understand your arguments and I don't want people to be blocked by the current behavior if this is a common case.
I just added 3 new configs with overrideSchema, overrideAction and overrideRef. overrideSchema can be used to change the output for v.file() in the meantime.
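A minimal sketch of what such an override callback could look like. The `OverrideContext` interface is a local stand-in written for this example, not the library's type. It assumes the callback receives a context exposing the valibot schema (named `valibotSchema` here), that a file schema reports `type: 'file'`, and that returning `undefined` keeps the default conversion. Check the @valibot/to-json-schema docs for the actual signature.

```typescript
// Local stand-in for the context passed to `overrideSchema`.
// Assumption: the real context exposes the valibot schema under a
// property like `valibotSchema`; the library's actual type may differ.
interface OverrideContext {
  valibotSchema: { type: string };
}

// Returns a binary-string schema for file schemas and `undefined`
// otherwise (assumed here to mean "keep the default conversion").
function overrideFileSchema(
  context: OverrideContext
): { type: 'string'; format: 'binary' } | undefined {
  if (context.valibotSchema.type === 'file') {
    return { type: 'string', format: 'binary' };
  }
  return undefined;
}

// Assumed wiring, not verified against the library:
// toJsonSchema(Schema, { overrideSchema: overrideFileSchema });
```

Keeping the callback separate from the `toJsonSchema` call means tools such as OpenAPI generators could opt in without changing the default behavior for everyone else.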
I may merge this pull request with a note about the type mismatch in the docs. However, before doing so, I need to investigate whether we can also support minSize and maxSize, as well as mimeType, when converting to a binary string in JSON Schema.
Have you reached a consensus about this issue? I'm currently facing the same problem with openapi spec generation using the file() schema. I was starting to propose the same change. Could this also be implemented in hono-openapi valibot using the overrideSchema option, perhaps?
https://github.com/rhinobase/hono-openapi/blob/main/packages/core/src/valibot.ts#L87
Not yet. I see the benefit but it also seems somewhat OpenAPI specific. I would probably recommend Hono and others to override it for now if they are generating it for OpenAPI.
I don't use this anymore, but posting a zod docs link here for reference. They do exactly what I described (because that is the standard).
I think whether this is correct or not depends on the context. In your use case, I agree. This would be the expected behavior because JSON does not support File and this is the best way to translate it somehow. However, in other cases where you expect a file, the right thing to do is show an error because JSON Schema cannot represent File.
I was about to follow Zod's implementation, but I'm not happy with how they translate .min() and .max() because the string length differs from the byte size, and their translation is probably error-prone.
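The length/size mismatch can be shown with two small helpers. Both `utf8ByteLength` and `base64Length` are written for this example and are not part of valibot or Zod. They only illustrate that a string's character count and the number of bytes it represents can diverge, so a byte limit does not map cleanly onto `maxLength`.

```typescript
// Counts the bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      // A surrogate pair is one code point that takes 4 bytes in UTF-8.
      bytes += 4;
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Length of the padded base64 string that encodes `byteCount` bytes.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

const accented = '\u00e9'; // "é"
const accentedChars = accented.length; // 1 UTF-16 code unit
const accentedBytes = utf8ByteLength(accented); // 2 bytes in UTF-8
const tenBytesAsBase64 = base64Length(10); // 16 characters
```

Depending on the encoding, a 10-byte payload may be 10, 16 or some other number of characters long, which is why a direct `maxSize` to `maxLength` translation is fragile.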
I plan to release a bigger update to our toJsonSchema function soon. I may come back to this issue in a few weeks.