undici icon indicating copy to clipboard operation
undici copied to clipboard

Implement multipart parsing

Open ronag opened this issue 3 years ago β€’ 32 comments

Our extractBody method is missing support for parsing multipart/form-data responses.

ronag avatar Aug 17 '21 13:08 ronag

I'm not sure who/why someone would send multipart/form-data from a server so would appreciate some requests on this. Doesn't seem to be a very useful feature.

ronag avatar Aug 17 '21 13:08 ronag

Hey, it seems like node-fetch is going to have multipart parser soon, if this PR gets merged. Maybe this would be helpful for you somehow?

octet-stream avatar Oct 14 '21 20:10 octet-stream

I think it seems useful, i mean https://www.npmjs.com/package/form-data has 4.5M download / week, + it's a spec'ed thing

I have long been wanting to have a formData method in node-fetch

jimmywarting avatar Oct 15 '21 09:10 jimmywarting

Hello. Telegram bot api accept only FormData for files https://core.telegram.org/bots/api#inputfile

MrWaip avatar Dec 22 '21 13:12 MrWaip

I'm not sure who/why someone would send multipart/form-data from a server so would appreciate some requests on this. Doesn't seem to be a very useful feature.

Correct me if I'm wrong, but isn't the bigger use case the server parsing multipart form data from the client? The example from the node-fetch PR iml is:

import { Response } from 'node-fetch'

http.createServer(async function (req, res) {
  const formData = await new Response(req, { headers: req.headers }).formData()
  const file = formData.get('avatar')
  
  var arrayBuffer = await file.arrayBuffer()
  var text = await file.text()
  var whatwgReadableStream = file.stream()
});

It would be great if undici had the same support for multipart form data.

blittle avatar Jan 19 '22 01:01 blittle

the example should actually be something like this:

http.createServer(async function (req, res) {
  const url = `${req.protocol}://${req.get('host')}${req.originalUrl}`
  const request new Request(url, { body: req, method: req.method, headers: req.headers })
  const formdata = request.formData()
  const file = formData.get('avatar')
  
  var arrayBuffer = await file.arrayBuffer()
  var text = await file.text()
  var whatwgReadableStream = file.stream()
});

But it's easier/quicker to construct a Response that don't need any url or extra options the idea of having a formdata parsing and shipping the body.formData() comes from service workers a service worker is meant to intercept the request and be able to parse the payload when needed and insert any missing fields like a authentication key or something

I'm not sure who/why someone would send multipart/form-data from a server so would appreciate some requests on this. Doesn't seem to be a very useful feature.

It's not about responding with a multipart/form-data response from a server like in: fetch(url).then(res => {res.formData()}) cuz again that would be weird if a server responded with a multipart/form-data. It's rather more about parsing incoming request to the http server which alot of folk use nodejs for.

I have long been wishing for a body.formData() but I never wanted to have that kind of functionality for getting such responses back from the server. it was b/c I wished for the servers IncommingRequest to mimic more like a service worker request and behaving more like the Request class... Where parsing IncommingRequest with body.formData() is essential so you wouldn't need busboy, formidable or any other library that you also need to learn how it works.

But again for a server to respond with a multipart/form-data isn't all that bad. It's a grate thing to use for sending multiple files back to the client now that fetch also has .formData(). And I have used it once in the web for transferering files as well.

// server.js
http.createServer(async function (req, res) {
  const formData = new FormData()
  fd.set('/uploads/logo.png', new File(...))
  fd.set('/uploads/movie-introduction.mp4', new File(...))
  const {body} = new Response(formData)
  stream.Readable.from(body).pipe(res)
})

// client.js
const res = await fetch('/get-files')
const fd = await res.formData()
const files = [...fd]
// use the files for something

This is much easier than bringing in a zip for packing and unpacking on both the server and on the client if you for any reason needs to get multipe files from the server. I have done this once before myself. But this is unherd of cuz never once have we had xhr.responseType = 'FormData' and not many are using service worker to intercept the request payload. this trully is a missing functionallity that we yet haven't taped into yet if there is such a thing to parse formdata in the other direction as well (from server to client).


I also added it cuz it's also part of the spec which might be the biggest reason to impl it so folk get what they expect

jimmywarting avatar Jan 19 '22 12:01 jimmywarting

@jimmywarting: I wouldn't mind adding support for it. Would you be ok if someone port https://github.com/node-fetch/node-fetch/blob/main/src/utils/multipart-parser.js & https://github.com/node-fetch/node-fetch/pull/1314/files over to undici? @mcollina wdyt?

ronag avatar Jan 19 '22 12:01 ronag

Sure, Be my guest take what u want. The source is based on a quite old version of formidable that i patched quite a bit in 2017 (to get rid of some core nodejs parts like Buffer and Streams) so that it would make a friendlier browser version for github/fetch polyfill but never got merged. it was patched even more when folks reviewed my PR to node-fetch for some performences improvements

also take a look into busboy if their new parser is any faster. i haven't tested it against node-fetch new parser

I think the owner of busboy said that it dose some more extra stuff that don't co-here with the FormData parser spec like supporting non-UTF8 encodings


also have a other question... would this formData() parser make it into stream consumers as well somehow?

import {formData} from 'node:stream/consumers';
const fd = await formData(iterable, {boundary: req.boundary})
fd // is instancesof FormData

Think it would also be cool to have something more low level like

import {formData, blob} from 'node:stream/consumers'

// returns another asyncIterable (maybe the stream consumer name `formData` is the wrong name for it)
const asyncIterable = formData(req, {boundary: req.boundary})

for await (const [name, entry, meta] of asyncIterable) {
  if (typeof entry === 'string') {
    // it's normal regular string, should this also be streamable, what if the payload is really large?
  } else {
    // it's either a File class or something that is streamable/iterable as well.
    const blob = await blob(entry, { type: meta.contentType || meta.type }) // wish for type option to exist here (it's missing)
    new File([blob], meta.fileName, { type: blob.type })
  }
}

another solution could be to treat every entry the same and the meta only changes depending on the type of file vs normal field:

import {multipart, blob, text} from 'node:stream/consumers'

// returns another asyncIterable
const asyncIterable = multipart(req, {boundary: req.boundary})

for await (const [entry, meta] of asyncIterable) {
  const headers = meta.headers // Headers is a instance of fetch `Headers` class
  const type = headers.get('content-type')
  const name = meta.fieldName 

  // parses the filename and the utf8 filename variant as well out of content-disposition header
  const fileName = meta.fileName // undefined || string
  
  const item = await (fileName ? blob(entry) : string(entry))
}

think i like this☝️ solution more then the previous 2 examples

jimmywarting avatar Jan 19 '22 13:01 jimmywarting

We'd like to have multipart support for Shopify Hydrogen, where our API route handlers are just functions that take undici Request objects. See https://github.com/Shopify/hydrogen/discussions/450

blittle avatar Jan 19 '22 14:01 blittle

I have a working parser that just needs tests written before a PR (no code used from other parsers which should help with license issues), although tests keep stalling tap (same issue as client-fetch.js which causes the current failures in the CI).

const server = createServer(async (req, res) => {
    const chunks = []
    for await (const chunk of req) chunks.push(chunk)

    const resp = new Response(Buffer.concat(chunks), { headers: req.headers })
    const form = await resp.formData()
    
    const same = form.get('c')
    const hope = readFileSync('./index.html', 'utf-8')

    assert(same.name, 'index.html')
    assert(hope === await same.text())
    assert(form.get('a') === 'b')
    
    res.end('lol')
    server.close()
}).listen(0)

const form = new FormData()
form.set('a', 'b')
form.set('c', new Blob([readFileSync('./index.html')]), 'index.html')

// send POST req with form body here...

I'm just worried as it's less than 100 lines that there may be edge cases I've missed, although I've hopefully made it fool-proof πŸ˜….

KhafraDev avatar Jan 21 '22 18:01 KhafraDev

It needs to be a streaming client - the code in busboy shows more or less how complex that is. Might be interesting to try implementing it using async iterators

mcollina avatar Jan 21 '22 19:01 mcollina

Another thing i have been thinking about doing in node-fetch when parsing FormData and encountering very large files would somehow be to offload the data into some temporary file instead of allocating everything into memory... that might be something that's next on my agenda... that's when FileLike object's would come into play... I really wish that this got fixed in NodeJS

  • https://github.com/nodejs/node/issues/37340
  • https://github.com/nodejs/node/issues/37338 (I think this is already done?)
  • https://github.com/nodejs/node/issues/39015
  • ...and that blob.stream() was a bit more stream friendly - and didn't make use of .arrayBuffer() in ReadableStream.start()

jimmywarting avatar Jan 21 '22 20:01 jimmywarting

It needs to be a streaming client - the code in busboy shows more or less how complex that is. Might be interesting to try implementing it using async iterators

I added the ability to use async iterators which each yield an object close to

{
    headers: [Object: null prototype] {
      'Content-Disposition': 'form-data',
      name: 'c',
      filename: 'index.html',
      'Content-Type': 'application/octet-stream'
    },
    content: '... file contents...'
  }

although I loop over the entire buffer body before parsing so I need to combine that logic πŸ€” & I'm unsure which of those values are headers so I just lumped them together

KhafraDev avatar Jan 21 '22 20:01 KhafraDev

Buffering the full body is not going to work well for Node.js unfortunately, it can end up using quite a lot of memory.

mcollina avatar Jan 21 '22 21:01 mcollina

I agree with @mcollina I think the content: '... file contents...' should be streamable as well... and it should most certainly not be a string in the case where you upload binary data, A async iterator that yields Uint8Array would be preferred

jimmywarting avatar Jan 21 '22 21:01 jimmywarting

I think I can do that, although using formidable/busboy/dicer, etc, is probably a better choice in this situation. The fetch spec does require parsing formdata responses though

KhafraDev avatar Jan 21 '22 21:01 KhafraDev

I'll add SvelteKit to the list of frameworks that would definitely benefit from this β€” our best workaround is to monkey-patch the undici Request object's formData method, and create a node-fetch Request with the raw data in order to use its formData method: https://github.com/sveltejs/kit/pull/5292

Obviously this is sub-optimal since it doesn't permit streaming parsing, and it adds bloat to apps, but it'll do for the time being!

Rich-Harris avatar Jun 28 '22 01:06 Rich-Harris

Essentially we need to support await response.formData(). Is that correct? @KhafraDev you did a lot of work for the spec compatiblity, would you like to take this on?

I'm ok to implement this here, even if it would have some major caveats for big files.

mcollina avatar Jun 28 '22 09:06 mcollina

I'm ok to implement this here, even if it would have some major caveats for big files.

Would it be OK to write files to a tmp directory and using finalizers and then remove them once it's no longer referenced? I kind of did that in this PR: https://github.com/node-fetch/fetch-blob/pull/137

this feature would kindof be blocked by Getting Files backed up by the filesystem

jimmywarting avatar Jun 28 '22 09:06 jimmywarting

Is it safe to assume availability of tmp directory in cloud environments? Immutable containers are not widely used? Wonder if it could have a fallback or an override option.

kibertoad avatar Jun 28 '22 09:06 kibertoad

Would it be OK to write files to a tmp directory and using finalizers and then remove them once it's no longer referenced? I kind of did that in this PR: https://github.com/node-fetch/fetch-blob/pull/137

I would prefer to not implement that in here but rather provide some hook that some external module can plug in to implement that behavior. There are likely so many different ways this can be done that I do not want to support them all in this module.

One note: you want the files to be removed anyway on process end, take a look at https://github.com/mcollina/on-exit-leak-free as it allows you to have a finalizer that is run on process exit.

this feature would kindof be blocked by Getting Files backed up by the filesystem

Indeed, that should be implemented first.

mcollina avatar Jun 28 '22 09:06 mcollina

Question: Could we depend on the busboy package or do you wish to write something own that you have more control over and not having to worry much about any other license?

busboy do really seems to be the fastest parser out there in the NPM user land for parsing multipart data...? it dose seem to also support more (non utf8) encoded formats as well.

jimmywarting avatar Jul 14 '22 11:07 jimmywarting

busboy is pretty much a golden standard, and it was resurrected recently, so is a great choice again. alternatively you may go for @fastify/busboy fork if you want more control.

kibertoad avatar Jul 14 '22 11:07 kibertoad

Question: Could we depend on the busboy package or do you wish to write something own that you have more control over and not having to worry much about any other license?

I'm ok to use busboy for this. I would prefer we keep being dependency-less but re-implementing multipart parsing from scratch would slow us too much.

mcollina avatar Jul 14 '22 12:07 mcollina

I'm not sure who/why someone would send multipart/form-data from a server so would appreciate some requests on this. Doesn't seem to be a very useful feature.

Sveltekit needs it...

irishburlybear avatar Jul 26 '22 17:07 irishburlybear

Cloudflare CLI Wrangler needs it πŸ˜„

JacobMGEvans avatar Aug 08 '22 22:08 JacobMGEvans

PR would be very welcome!

kibertoad avatar Aug 08 '22 22:08 kibertoad

I'm happy to do a few design calls about this if an individual or a company is serious in implementing it.

mcollina avatar Aug 09 '22 06:08 mcollina

Shopify Hydrogen is also a +1 on this.

blittle avatar Aug 09 '22 16:08 blittle

I think we can establish that quite many ppl want a formData encoder right now, so i don't think any +1 me too is not necessary anymore, there is already compelling evidence that it's needed/wanted and that we should definitely implement it.

if anything +1 are only becoming more annoying and spams everyone who is subscribed to this topic it also becomes more annoying to scan trough this issue for comments that really matters and helps towards solving this issue. so i think unnecessary +1 comments should be deleted or flagged as off topic and become hidden...

jimmywarting avatar Aug 09 '22 16:08 jimmywarting