undici
Implement multipart parsing
Our extractBody method is missing support for parsing multipart/form-data responses.
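To make the request concrete, here is a small sketch of the desired API: a FormData round trip through Response's formData() method. This assumes a runtime where the feature is implemented (e.g. Node 18+, where undici's Response/FormData/Blob are global); it is an illustration of the behaviour this issue asks for, not undici's actual implementation.

```javascript
// Sketch of the requested behaviour: encode a FormData as a
// multipart/form-data body, then parse it back with formData().
// Assumes Node 18+, where undici's fetch classes are global.
async function roundTrip() {
  const fd = new FormData()
  fd.set('field', 'value')
  fd.set('upload', new Blob(['hello'], { type: 'text/plain' }), 'hello.txt')

  // new Response(formData) serializes the body and sets a
  // Content-Type header with the multipart boundary automatically.
  const res = new Response(fd)
  return res.formData() // <- the method this issue asks for
}
```

The same round trip works for Request bodies, which is the server-side use case discussed below.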
I'm not sure who would send multipart/form-data from a server, or why, so I would appreciate some example use cases for this. It doesn't seem to be a very useful feature.
Hey, it seems like node-fetch is going to have a multipart parser soon, if this PR gets merged. Maybe this would be helpful for you somehow?
I think it seems useful. I mean, https://www.npmjs.com/package/form-data has 4.5M downloads/week, plus it's a spec'ed thing.
I have long been wanting to have a formData method in node-fetch
Hello. Telegram bot api accept only FormData for files https://core.telegram.org/bots/api#inputfile
I'm not sure who would send multipart/form-data from a server, or why, so I would appreciate some example use cases for this. It doesn't seem to be a very useful feature.
Correct me if I'm wrong, but isn't the bigger use case the server parsing multipart form data from the client? The example from the node-fetch PR is:
import http from 'node:http'
import { Response } from 'node-fetch'

http.createServer(async function (req, res) {
  const formData = await new Response(req, { headers: req.headers }).formData()
  const file = formData.get('avatar')
  const arrayBuffer = await file.arrayBuffer()
  const text = await file.text()
  const whatwgReadableStream = file.stream()
});
It would be great if undici had the same support for multipart form data.
the example should actually be something like this:

http.createServer(async function (req, res) {
  const url = `${req.protocol}://${req.get('host')}${req.originalUrl}`
  const request = new Request(url, { body: req, method: req.method, headers: req.headers })
  const formData = await request.formData()
  const file = formData.get('avatar')
  const arrayBuffer = await file.arrayBuffer()
  const text = await file.text()
  const whatwgReadableStream = file.stream()
});

But it's easier/quicker to construct a Response, since that doesn't need a URL or any extra options.
The idea of having formData parsing, and shipping body.formData(), comes from service workers. A service worker is meant to intercept a request and parse the payload when needed, inserting any missing fields, like an authentication key or something.
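That service-worker pattern can be sketched in plain Node terms: parse the form payload, inject the missing field, and rebuild the request. The injectField helper below is hypothetical; it assumes Node 18+ fetch globals with a working formData() parser.

```javascript
// Sketch of the interception idea: parse a request's form payload,
// inject a field it's missing (e.g. an auth key), and rebuild it.
// injectField is a hypothetical helper, not part of any library.
async function injectField(request, name, value) {
  const fd = await request.formData() // parse the multipart body
  fd.set(name, value)                 // add the missing field
  // Rebuild the request with the patched body; the multipart
  // boundary header is regenerated automatically.
  return new Request(request.url, { method: 'POST', body: fd })
}
```

A service worker would do the same thing inside a fetch event handler before forwarding the request upstream.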
I'm not sure who would send multipart/form-data from a server, or why, so I would appreciate some example use cases for this. It doesn't seem to be a very useful feature.
It's not about responding with a multipart/form-data response from a server, as in: fetch(url).then(res => res.formData())
because, again, it would be weird if a server responded with multipart/form-data. It's more about parsing requests incoming to the HTTP server, which a lot of folks use Node.js for.
I have long been wishing for a body.formData(), but I never wanted that kind of functionality for getting such responses back from the server. It was because I wished for the server's IncomingMessage to mimic a service worker request and behave more like the Request class... where parsing the incoming request with body.formData() is essential, so you wouldn't need busboy, formidable, or any other library that you also have to learn.
But again, for a server to respond with multipart/form-data isn't all that bad. It's a great thing to use for sending multiple files back to the client, now that fetch also has .formData(). And I have used it once on the web for transferring files as well.
// server.js
http.createServer(async function (req, res) {
  const fd = new FormData()
  fd.set('/uploads/logo.png', new File(...))
  fd.set('/uploads/movie-introduction.mp4', new File(...))
  const { body } = new Response(fd)
  stream.Readable.from(body).pipe(res)
})

// client.js
const res = await fetch('/get-files')
const fd = await res.formData()
const files = [...fd]
// use the files for something
This is much easier than bringing in a zip library for packing and unpacking on both the server and the client if you for any reason need to get multiple files from the server. I have done this once before myself. But it's unheard of, because never once have we had xhr.responseType = 'FormData', and not many are using service workers to intercept the request payload. This truly is missing functionality that we haven't tapped into yet: parsing formdata in the other direction as well (from server to client).
I also added it because it's part of the spec, which might be the biggest reason to implement it, so folks get what they expect.
@jimmywarting: I wouldn't mind adding support for it. Would you be OK if someone ported https://github.com/node-fetch/node-fetch/blob/main/src/utils/multipart-parser.js
& https://github.com/node-fetch/node-fetch/pull/1314/files over to undici? @mcollina wdyt?
Sure, be my guest, take what you want. The source is based on a quite old version of formidable that I patched quite a bit in 2017 (to get rid of some core Node.js parts like Buffer and Streams) so that it would make a friendlier browser version for the github/fetch polyfill, but it never got merged. It was patched even more when folks reviewed my PR to node-fetch for some performance improvements.
Also take a look into busboy to see if their new parser is any faster; I haven't tested it against node-fetch's new parser.
I think the owner of busboy said that it does some extra stuff that doesn't cohere with the FormData parser spec, like supporting non-UTF-8 encodings.
I also have another question... would this formData() parser make it into stream consumers as well somehow?

import {formData} from 'node:stream/consumers';

const fd = await formData(iterable, {boundary: req.boundary})
fd // is an instance of FormData
Think it would also be cool to have something more low level like:

import {formData, blob} from 'node:stream/consumers'

// returns another asyncIterable (maybe the stream consumer name `formData` is the wrong name for it)
const asyncIterable = formData(req, {boundary: req.boundary})

for await (const [name, entry, meta] of asyncIterable) {
  if (typeof entry === 'string') {
    // it's a normal regular string; should this also be streamable? what if the payload is really large?
  } else {
    // it's either a File class or something that is streamable/iterable as well.
    const b = await blob(entry, { type: meta.contentType || meta.type }) // wish for a type option to exist here (it's missing)
    new File([b], meta.fileName, { type: b.type })
  }
}
Another solution could be to treat every entry the same, with the meta only changing depending on whether it's a file vs a normal field:

import {multipart, blob, text} from 'node:stream/consumers'

// returns another asyncIterable
const asyncIterable = multipart(req, {boundary: req.boundary})

for await (const [entry, meta] of asyncIterable) {
  const headers = meta.headers // an instance of the fetch `Headers` class
  const type = headers.get('content-type')
  const name = meta.fieldName
  // the filename (and its UTF-8 variant) is parsed out of the content-disposition header
  const fileName = meta.fileName // undefined || string
  const item = await (fileName ? blob(entry) : text(entry))
}
I think I like this last solution more than the previous two examples.
We'd like to have multipart support for Shopify Hydrogen, where our API route handlers are just functions that take undici Request objects. See https://github.com/Shopify/hydrogen/discussions/450
I have a working parser that just needs tests written before a PR (no code used from other parsers, which should help with license issues), although the tests keep stalling tap (same issue as client-fetch.js, which causes the current failures in CI).
import { createServer } from 'node:http'
import { readFileSync } from 'node:fs'
import assert from 'node:assert'

const server = createServer(async (req, res) => {
  const chunks = []
  for await (const chunk of req) chunks.push(chunk)
  const resp = new Response(Buffer.concat(chunks), { headers: req.headers })
  const form = await resp.formData()
  const same = form.get('c')
  const hope = readFileSync('./index.html', 'utf-8')
  assert(same.name === 'index.html')
  assert(hope === await same.text())
  assert(form.get('a') === 'b')
  res.end('lol')
  server.close()
}).listen(0)

const form = new FormData()
form.set('a', 'b')
form.set('c', new Blob([readFileSync('./index.html')]), 'index.html')
// send POST req with form body here...
I'm just worried, as it's less than 100 lines, that there may be edge cases I've missed, although I've hopefully made it fool-proof.
It needs to be a streaming client - the code in busboy shows more or less how complex that is. Might be interesting to try implementing it using async iterators
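As a rough sketch of that async-iterator approach, here is a toy boundary splitter. It is nowhere near spec-compliant (no quoted-string parsing, no charset handling, and it assumes well-formed CRLF input), and all names are illustrative:

```javascript
// Toy streaming multipart splitter built on async iterators.
// Yields one { headers, body } object per part as soon as that
// part's closing boundary has arrived; incomplete data stays buffered.
function parsePartSection(section) {
  // section looks like: \r\n<header lines>\r\n\r\n<body>
  const headerEnd = section.indexOf('\r\n\r\n')
  const headers = Object.fromEntries(
    section.subarray(2, headerEnd).toString().split('\r\n').map((line) => {
      const i = line.indexOf(':')
      return [line.slice(0, i).toLowerCase(), line.slice(i + 1).trim()]
    })
  )
  return { headers, body: section.subarray(headerEnd + 4) }
}

async function* parseMultipart(source, boundary) {
  const delim = Buffer.from(`\r\n--${boundary}`)
  // Seed with CRLF so the first boundary matches the same delimiter.
  let buf = Buffer.from('\r\n')
  for await (const chunk of source) {
    buf = Buffer.concat([buf, chunk])
    let idx
    // Yield every part whose closing delimiter has already arrived.
    while ((idx = buf.indexOf(delim, delim.length)) !== -1) {
      yield parsePartSection(buf.subarray(delim.length, idx))
      buf = buf.subarray(idx)
    }
  }
  // What remains should be the closing "--<boundary>--\r\n" marker.
}
```

Each part's body is still buffered until its closing boundary arrives; a real streaming parser would yield part bodies as their own async iterables so large files never sit in memory at once.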
Another thing I have been thinking about doing in node-fetch, when parsing FormData and encountering very large files, would be to somehow offload the data into a temporary file instead of allocating everything in memory... that might be something that's next on my agenda... that's when File-like objects would come into play... I really wish that this got fixed in Node.js:
- https://github.com/nodejs/node/issues/37340
- https://github.com/nodejs/node/issues/37338 (I think this is already done?)
- https://github.com/nodejs/node/issues/39015
- ...and that blob.stream() was a bit more stream friendly, and didn't make use of .arrayBuffer() in ReadableStream.start()
It needs to be a streaming client - the code in busboy shows more or less how complex that is. Might be interesting to try implementing it using async iterators
I added the ability to use async iterators, which each yield an object close to:
{
headers: [Object: null prototype] {
'Content-Disposition': 'form-data',
name: 'c',
filename: 'index.html',
'Content-Type': 'application/octet-stream'
},
content: '... file contents...'
}
although I loop over the entire buffer body before parsing, so I need to combine that logic. I'm also unsure which of those values are headers, so I just lumped them together.
Buffering the full body is not going to work well for Node.js unfortunately, it can end up using quite a lot of memory.
I agree with @mcollina
I think the content: '... file contents...' should be streamable as well... and it should most certainly not be a string in the case where you upload binary data. An async iterator that yields Uint8Array would be preferred.
I think I can do that, although using formidable/busboy/dicer, etc., is probably a better choice in this situation. The fetch spec does require parsing formdata responses, though.
I'll add SvelteKit to the list of frameworks that would definitely benefit from this. Our best workaround is to monkey-patch the undici Request object's formData method, and create a node-fetch Request with the raw data in order to use its formData method: https://github.com/sveltejs/kit/pull/5292
Obviously this is sub-optimal since it doesn't permit streaming parsing, and it adds bloat to apps, but it'll do for the time being!
Essentially we need to support await response.formData(). Is that correct?
@KhafraDev you did a lot of work for the spec compatibility, would you like to take this on?
I'm ok to implement this here, even if it would have some major caveats for big files.
I'm ok to implement this here, even if it would have some major caveats for big files.
Would it be OK to write files to a tmp directory with finalizers, and then remove them once they're no longer referenced? I kind of did that in this PR: https://github.com/node-fetch/fetch-blob/pull/137
This feature would kind of be blocked by Getting Files backed up by the filesystem.
Is it safe to assume availability of tmp directory in cloud environments? Immutable containers are not widely used? Wonder if it could have a fallback or an override option.
Would it be OK to write files to a tmp directory with finalizers, and then remove them once they're no longer referenced? I kind of did that in this PR: https://github.com/node-fetch/fetch-blob/pull/137
I would prefer to not implement that in here but rather provide some hook that some external module can plug in to implement that behavior. There are likely so many different ways this can be done that I do not want to support them all in this module.
One note: you want the files to be removed anyway on process end, take a look at https://github.com/mcollina/on-exit-leak-free as it allows you to have a finalizer that is run on process exit.
This feature would kind of be blocked by Getting Files backed up by the filesystem.
Indeed, that should be implemented first.
Question: could we depend on the busboy package, or do you wish to write something of your own that you have more control over, without having to worry about another license?
busboy really does seem to be the fastest parser out there in NPM userland for parsing multipart data...? It does seem to also support more (non-UTF-8) encoded formats as well.
busboy is pretty much the gold standard, and it was resurrected recently, so it is a great choice again. Alternatively you may go for the @fastify/busboy fork if you want more control.
Question: could we depend on the busboy package, or do you wish to write something of your own that you have more control over, without having to worry about another license?
I'm OK to use busboy for this. I would prefer we keep being dependency-less, but re-implementing multipart parsing from scratch would slow us down too much.
I'm not sure who would send multipart/form-data from a server, or why, so I would appreciate some example use cases for this. It doesn't seem to be a very useful feature.
Sveltekit needs it...
Cloudflare CLI Wrangler needs it too.
PR would be very welcome!
I'm happy to do a few design calls about this if an individual or a company is serious in implementing it.
Shopify Hydrogen is also a +1 on this.
I think we can establish that quite many people want a formData parser right now, so I don't think any more "+1 / me too" comments are necessary; there is already compelling evidence that it's needed/wanted and that we should definitely implement it.
If anything, +1s are only becoming more annoying and spam everyone who is subscribed to this topic. It also becomes harder to scan through this issue for the comments that really matter and help towards solving it. So I think unnecessary +1 comments should be deleted or flagged as off-topic and hidden...