fix(to-json-schema): translate `v.file()` schema

Open lukeed opened this issue 8 months ago • 15 comments

A v.file() can/should be translated to the JSON schema:

type: string
format: binary

See https://swagger.io/docs/specification/v3_0/describing-request-body/file-upload/ for OpenAPI support.

I initially thought this was an issue with hono-openapi's translation, but tracked it to here instead.

lukeed avatar Mar 28 '25 16:03 lukeed

Thanks for creating this PR! I am not sure if we should convert the file schema to a binary string in JSON Schema because the underlying data type is different. Maybe we should provide a binary action for strings to enable this conversion. Looking forward to your feedback!

const Schema = v.pipe(v.string(), v.binary()); 

fabian-hiller avatar Mar 29 '25 02:03 fabian-hiller

npm i https://pkg.pr.new/valibot@1119

commit: b3065df

pkg-pr-new[bot] avatar Mar 29 '25 02:03 pkg-pr-new[bot]

The OpenAPI spec says that files need to be string/binary. Because that’s what they are. https://swagger.io/docs/specification/v3_0/describing-request-body/file-upload/

The translation is going from one representation to another. The rest of the ecosystem on top of valibot is already using v.file() for formdata validation, which works after req.formData() parsing…but that needs to be recorded and described in OpenAPI format too.

lukeed avatar Mar 29 '25 05:03 lukeed

Would you also convert blob to a binary string and date to a date string? In the past, I have decided that people should use v.pipe(v.string(), v.isoTimestamp()) or one of our other iso... date actions. But I am happy to discuss it and improve it in the long run.

fabian-hiller avatar Mar 29 '25 22:03 fabian-hiller

Out of curiosity, what are the disadvantages of defining a binary string as v.pipe(v.string(), v.binary())?

fabian-hiller avatar Mar 29 '25 22:03 fabian-hiller

When forms are uploaded, they are encoded as binary strings. Thru HTTP headers, those strings are known to be 1) sliced by a boundary 2) some are indicated as being files 3) of a certain mime type

On the JS server, req.formData() will take all this information and parse it accordingly, turning files into File instances. This is where your v.file() will pass/validate the value(s).

But before getting to the server, OpenAPI clients need to know whether or not file(s) can be sent. In OpenAPI definitions (thru JSON schema) this is represented as { type: "string", format: "binary" } because, again, this is describing the transport. AFAICT, format: "binary" isn't part of the spec itself, but it's been the pattern/consensus.

However, there is now a draft-9 spec change that incorporates the (still optional) contentMediaType and contentEncoding properties to denote what kind of content the uploaded string is.

If I were you, I would do this:

import * as v from 'valibot';

v.file();
//-> {
//->   "type": "string",
//->   "format": "binary"
//-> }

// ImageSchema
v.pipe(
  v.file('Please select an image file.'),
  v.mimeType(['image/jpeg', 'image/png'], 'Please select a JPEG or PNG file.'),
  v.maxSize(1024 * 1024 * 10, 'Please select a file smaller than 10 MB.')
);
//-> {
//->   "oneOf": [{
//->     "type": "string",
//->     "format": "binary",
//->     "contentMediaType": "image/jpeg",
//->     "maxLength": 1024 * 1024 * 10
//->   }, {
//->     "type": "string",
//->     "format": "binary",
//->     "contentMediaType": "image/png",
//->     "maxLength": 1024 * 1024 * 10
//->   }]
//-> }

You can't force/assume a contentEncoding until the developer has specified one. AFAICT you don't have an API to capture this information yet, so I didn't include an example for it.
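
The mapping proposed above could be sketched as a small standalone function. This is an illustration only, not valibot's implementation: `FileConstraints`, `BinaryStringSchema` and `fileToJsonSchema` are hypothetical names, and copying `maxSize` into `maxLength` simply mirrors the example above. No `contentEncoding` is emitted, since nothing in the schema specifies one.

```typescript
// Hypothetical summary of what a v.file() pipe captured; not a valibot type.
interface FileConstraints {
  mimeTypes?: string[]; // e.g. taken from v.mimeType([...])
  maxSize?: number; // e.g. taken from v.maxSize(...), in bytes
}

// The "binary string" shape used in the example above.
interface BinaryStringSchema {
  type: 'string';
  format: 'binary';
  contentMediaType?: string;
  maxLength?: number;
}

type FileJsonSchema = BinaryStringSchema | { oneOf: BinaryStringSchema[] };

function fileToJsonSchema(constraints: FileConstraints = {}): FileJsonSchema {
  const base: BinaryStringSchema = { type: 'string', format: 'binary' };

  // Assumption: the byte limit is copied as-is, as in the example above.
  if (constraints.maxSize !== undefined) {
    base.maxLength = constraints.maxSize;
  }

  const mimeTypes = constraints.mimeTypes ?? [];
  if (mimeTypes.length === 0) {
    return base; // plain v.file(): nothing more can be said
  }
  if (mimeTypes.length === 1) {
    return { ...base, contentMediaType: mimeTypes[0] };
  }
  // Several allowed MIME types: one alternative per type.
  return {
    oneOf: mimeTypes.map((mimeType) => ({ ...base, contentMediaType: mimeType })),
  };
}
```

Calling `fileToJsonSchema()` with no constraints yields only the string/binary pair, while passing two MIME types yields a `oneOf` with one entry per type.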


> what are the disadvantages of defining a binary string as v.pipe(v.string(), v.binary())

In the example above, once the user calls req.formData(), the valibot schema will always fail since it's no longer checking that the value is instanceof File. It's information loss.

> Would you also convert blob to a binary string

Blobs can't be uploaded. It's a JS specific API and isn't HTTP transferrable. At best, JSON schema could use a string/binary/content* combination for this, but there's no API that would automatically convert the uploaded bytes into a Blob. The constructor is lost (and meaningless) over the wire, so there's no reason to try to preserve the conversion. This is not true of File and why it should be different/fixed.

> Would you also convert ... date to a date string

This could perhaps be an option, but I wouldn't. Like Blob, there's nothing in JS that would auto-convert the date's string/number representation back into a Date instance. The value would just stay as a string/number until the dev (or some abstraction layer) converted it.

lukeed avatar Mar 29 '25 23:03 lukeed

Thank you for taking the time to explain it so clearly! I really appreciate it! I don't have a lot of time right now, but I will try to review and merge within the next few days and publish a new version.

fabian-hiller avatar Mar 31 '25 01:03 fabian-hiller

JSON Schema does not have a binary format; this is unique to OpenAPI. Perhaps there should be some toggle/flag to indicate that the transformation is for an OpenAPI declaration?

muningis avatar Apr 07 '25 21:04 muningis

Implementations MAY support custom format attributes. https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-validation-00#rfc.section.7.2.3

You can use anything in a format. There are common formats that have been added thru general agreement, but the contents of format attribute are not validated by the spec itself.

lukeed avatar Apr 07 '25 23:04 lukeed

Does File have a specific encoding when represented as a string that should be specified via .contentEncoding in JSON Schema?

fabian-hiller avatar Apr 08 '25 03:04 fabian-hiller

@fabian-hiller sorry I don’t understand the question

lukeed avatar Apr 08 '25 03:04 lukeed

Sorry, my bad. I fixed my previous comment.

fabian-hiller avatar Apr 09 '25 03:04 fabian-hiller

Ah. No, you can’t add any more than what the developer specifies. I’ll point to the above example snippet again. On its own, you can only translate v.file() to string/format=binary.

lukeed avatar Apr 09 '25 07:04 lukeed

I am still not sure what to do. Both Zod v4 and ArkType also throw an error when converting a File schema to JSON Schema, and it feels wrong to switch the type when converting to JSON Schema just because of OpenAPI and how files are transported over HTTP. But I understand your arguments and I don't want people to be blocked by the current behavior if this is a common case.

fabian-hiller avatar Apr 13 '25 03:04 fabian-hiller

I just added 3 new configs with overrideSchema, overrideAction and overrideRef. overrideSchema can be used to change the output for v.file() in the meantime.
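
As a rough sketch of how that override might look for `v.file()`: the callback below only inspects the schema's `type` and returns the string/binary pair for files. The `OverrideContextLike` interface is an assumed, minimal stand-in for whatever context `overrideSchema` actually receives, and treating `undefined` as "keep the default conversion" is also an assumption; check the `toJsonSchema` docs before relying on either.

```typescript
// Assumed minimal shape of the context passed to overrideSchema.
// Only the field this sketch reads is declared; the real type may differ.
interface OverrideContextLike {
  valibotSchema: { type: string };
}

// Replaces the output for file schemas with the OpenAPI-style binary string.
// Returning undefined is assumed to leave every other schema untouched.
function overrideFileSchema(context: OverrideContextLike) {
  if (context.valibotSchema.type === 'file') {
    return { type: 'string', format: 'binary' } as const;
  }
  return undefined;
}

// Intended usage (not executed here; needs valibot and @valibot/to-json-schema):
// toJsonSchema(v.object({ avatar: v.file() }), { overrideSchema: overrideFileSchema });
```

Keeping the callback separate from the `toJsonSchema` call makes it easy for a wrapper such as an OpenAPI generator to apply it only when the target is OpenAPI.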

fabian-hiller avatar May 17 '25 05:05 fabian-hiller

I may merge this pull request with a note about the type mismatch in the docs. However, before doing so, I need to investigate whether we can also support minSize and maxSize, as well as mimeType, when converting to a binary string in JSON Schema.

fabian-hiller avatar Jun 01 '25 15:06 fabian-hiller

Have you reached a consensus about this issue? I'm currently facing the same problem with openapi spec generation using the file() schema. I was starting to propose the same change. Could this also be implemented in hono-openapi valibot using the overrideSchema option, perhaps?

https://github.com/rhinobase/hono-openapi/blob/main/packages/core/src/valibot.ts#L87

mainqueg avatar Jul 23 '25 21:07 mainqueg

Not yet. I see the benefit but it also seems somewhat OpenAPI specific. I would probably recommend Hono and others to override it for now if they are generating it for OpenAPI.

fabian-hiller avatar Jul 27 '25 01:07 fabian-hiller

I don't use this anymore, but posting a zod docs link here for reference. They do exactly what I described (because that is the standard).

lukeed avatar Aug 22 '25 06:08 lukeed

I think whether this is correct or not depends on the context. In your use case, I agree. This would be the expected behavior because JSON does not support File and this is the best way to translate it somehow. However, in other cases where you expect a file, the right thing to do is show an error because JSON Schema cannot represent File.

I was about to follow Zod's implementation, but I'm not happy with how they translate .min() and .max() because the string length differs from the byte size, and their translation is probably error-prone.
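
To illustrate the length mismatch, here is a small sketch comparing a string's length with the number of bytes behind it. The thread does not say which encoding a "binary string" would use, so UTF-8 and base64 are picked purely as examples; `utf8ByteLength` and `base64Length` are hypothetical helpers, not part of valibot or Zod.

```typescript
// Number of bytes a JavaScript string occupies when encoded as UTF-8.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair: one code point, four bytes
        i++; // skip the low surrogate
      } else {
        bytes += 3; // lone surrogate, encoded as a replacement character
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Length of the padded base64 string that encodes `byteSize` bytes.
function base64Length(byteSize: number): number {
  return 4 * Math.ceil(byteSize / 3);
}

const accented = '\u00e9'; // "é": string length 1, but 2 bytes in UTF-8
const emoji = '\u{1F600}'; // string length 2, but 4 bytes in UTF-8
const tenMegabytes = 1024 * 1024 * 10;
const encodedLength = base64Length(tenMegabytes); // longer than the byte size
```

Under these assumptions a 10 MB file would need a base64 string of roughly 14 million characters, so copying a byte limit straight into `maxLength` would not describe the same constraint.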

fabian-hiller avatar Aug 24 '25 15:08 fabian-hiller

I plan to release a bigger update to our toJsonSchema function soon. I may come back to this issue in a few weeks.

fabian-hiller avatar Aug 24 '25 15:08 fabian-hiller