Disable accumulating files in memory while using attachFieldsToBody
Prerequisites
- [X] I have written a descriptive issue title
- [X] I have searched existing issues to ensure the feature has not already been requested
🚀 Feature Proposal
Allow a flag in options to disable accumulation of files in memory, and let the caller handle the files themselves
Currently, if you want to use attachFieldsToBody, you're forced to either let the library accumulate the files in memory or provide an onFile function, which also forces you to process each file (whether by streaming or accumulating).
I'd like to use the attachFieldsToBody feature, but I don't want any of my files accumulated or processed before the API handler, because I need to handle them separately later and do the streaming myself.
Motivation
attachFieldsToBody is a useful feature, and I'd rather not have to re-implement it so I can disable the accumulation
Example
```js
fastify.register(require('@fastify/multipart'), { attachFieldsToBody: 'keyValues', disableAccumulation: true })
```
How would you implement this feature?
I think something like this is simple enough.
```js
fastify.register(require('@fastify/multipart'), { attachFieldsToBody: 'keyValues', disableAccumulation: true })
```
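Purely as a sketch of the intent (disableAccumulation does not exist, and the handler shape below is hypothetical): the plain fields would already be on request.body, while the files stay untouched for the handler to stream itself.

```js
const fs = require('node:fs')
const { pipeline } = require('node:stream/promises')

fastify.post('/upload', async function (request, reply) {
  // hypothetical: non-file fields already attached by attachFieldsToBody
  const { title } = request.body // `title` is an assumed field name
  // hypothetical: file parts left for the handler to consume as streams
  for await (const part of request.parts()) {
    if (part.file) {
      await pipeline(part.file, fs.createWriteStream(part.filename))
    }
  }
  return { title }
})
```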
Thanks for reporting! Would you like to send a Pull Request to address this issue? Remember to add unit tests.
Since this usage does not accumulate:

```js
fastify.register(require('@fastify/multipart'), { attachFieldsToBody: true, onFile })
```

I would support this new usage instead of adding a new option:

```js
fastify.register(require('@fastify/multipart'), { attachFieldsToBody: 'keyValues', onFile })
```
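(A minimal sketch of such an onFile, assuming the single-argument onFile(part) form where setting part.value decides what 'keyValues' attaches to the body:)

```js
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { pipeline } = require('node:stream/promises')

async function onFile (part) {
  // consume the stream into a temp file so nothing accumulates in memory
  const filepath = path.join(os.tmpdir(), part.filename)
  await pipeline(part.file, fs.createWriteStream(filepath))
  part.value = filepath // with 'keyValues', the body gets the temp path
}
```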
We need to decide how to process files by stream inside the API method itself. From what I understand, this usage requires the files to be processed before the request even reaches the API method.
Isn't that the case? Did I miss something, and is it possible to let the API method handle the file as a stream even though an onFile callback is specified?
Well, you have to store the data somewhere. There is no rewind for these file streams, so you have to store them somewhere if you want to still be able to process them.
This was my motivation for creating the issue: the case where you can't use both attachFieldsToBody and streams in the API method at the same time. (This is why I went with a disableAccumulation option.)
Currently the only way to use streams in this way is with attachFieldsToBody off.
I am really confused. What do you want to achieve? You want to attach the fields to the body, but on the other hand you want to stream the file content. OK, so maybe you don't want the files attached to the body as strings? Is that what you want?
Yeah, that's right. I'd like to have the fields attached to the body, but I don't want the files handled at all. I want to stream the files myself directly in the API method (not before the request reaches it).
Because I can't do both, I've had to re-implement attachFieldsToBody, but it would be much cleaner to let the library do this (after all, the option exists in the first place). That way I can use Fastify schema validation on the API method, rather than having to validate manually inside the API method itself (which results in messier code).
Well, IMHO there is no way to do this. You cannot "stop" the stream; you have to store it somewhere. Imagine you have a form with three fields, one of them the file. Now you upload it: first you get the file, and after it you get the fields. As long as you don't handle the file stream, you will not get the fields. So you have to process the file stream.
The only solution I know is to store the file in memory, or to store it in redis.
https://github.com/luin/ioredis/pull/1376
With RedisWritable, you could store the uploaded file into redis in onFile, and then read it back in the handler with RedisReadable.
> Imagine you have a form with three fields, one of them the file. Now you upload it: first you get the file, and after it you get the fields. As long as you don't handle the file stream, you will not get the fields. So you have to process the file stream.

Yes, in that case you would have to handle the file first, but as long as you order the fields first and make the file field the very last one, this should be fine.
Well I think it will not be possible to implement it.
For the time being I disabled attachFieldsToBody and attach the fields in a loop like this:

```js
const fs = require('node:fs')
const util = require('node:util')
const { pipeline } = require('node:stream')
const pump = util.promisify(pipeline) // as in other examples from the docs

const body = {}
for await (const part of request.parts()) {
  if (part.file) {
    await pump(part.file, fs.createWriteStream(part.filename))
  } else {
    body[part.fieldname] = part // TODO: array handling
  }
}
```
What might go wrong if the async generator function .parts() assigned fields like this internally? (A user-land sketch follows below.)
Of course, request.body would only be properly available after request.parts() has been iterated over, but:
a) that seems intuitive to me
b) it could be mentioned in the docs along with the other quirks, like attachFieldsToBody: 'keyValues' converting all files with buffer.toString() by default
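(A user-land wrapper sketching roughly that behaviour; partsWithBody is a made-up name:)

```js
// Iterate the multipart parts, attach non-file fields to request.body,
// and yield only the file parts for the caller to stream.
async function * partsWithBody (request) {
  request.body = {}
  for await (const part of request.parts()) {
    if (part.file) {
      yield part
    } else {
      request.body[part.fieldname] = part.value // TODO: array handling
    }
  }
}
```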
Maybe we can customize the onFile option: process the file stream, store the files on the local disk, and then attach the local paths of the files to the body instead of the files themselves.
To be honest, it is the implementer's duty to handle the streams, and there are so many ready-to-use stream implementations; an implementer just needs to pipe the file streams correctly. I am strongly against storing data on the hard disk by default. I can already see people complaining that their hard disk is full because we did not implement logic to clean up the files after they were processed. Then the next person asks why we don't stream into redis, or mongo GridFS, or, or, or...
Yes, we can do this just by passing a custom onFile that saves to the local disk:
```js
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// toID() is assumed to be a user-supplied unique-ID generator
const onFile = (field, file, filename, encoding, mimetype, body) => {
  const fileData = []
  const lastFile = body[field][body[field].length - 1]
  // buffer chunks until the stream ends, unless the size limit was hit
  file.on('data', data => { if (!lastFile.limit) { fileData.push(data) } })
  file.on('limit', () => { lastFile.limit = true })
  file.on('end', () => {
    if (!lastFile.limit) {
      const buffer = Buffer.concat(fileData)
      lastFile.size = buffer.length
      const dir = path.join(os.tmpdir(), 'fastify-multipart')
      if (!fs.existsSync(dir)) { fs.mkdirSync(dir) }
      const filepath = path.join(dir, toID() + path.extname(filename))
      try {
        fs.writeFileSync(filepath, buffer)
        lastFile.filepath = filepath
      } catch (error) {
        lastFile.error = 'error when writing to disk'
      }
    }
    delete lastFile.data
  })
}
```
However, I suggest that the onFile configuration item could accept some extra parameters, such as the registration options, or an array parameter collecting all files (call it requestFiles, for example), through which we could easily get the files or clean them up (or maybe add the cleanup logic to a cleanRequestFiles function), to enhance the customization ability of the onFile option.
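(Side note: if I read the current docs right, the plugin already ships request.saveRequestFiles() and request.cleanRequestFiles(), which cover part of this:)

```js
fastify.post('/upload', async function (request, reply) {
  // saves every incoming file to the OS tmp dir and returns their metadata;
  // the temp files are removed automatically after the response is sent
  const files = await request.saveRequestFiles()
  return { stored: files.map((f) => f.filepath) }
})
```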
I have the exact same issue.
Here is what I would like to achieve:
fastify.post("/publish", function(request, reply) {
const body = request.body;
body.mainImageFile.file
.pipe(imageResizer(1000,1000))
.pipe(cloudStorage("some/path/" + body.mainImageFile.filename"));
body.otherImageFile.file
.pipe(imageResizer(500,500))
.pipe(cloudStorage("some/path/" + body.otherImageFile.filename"));
// use other fields as string
body.title // title as string
body.description // description as string
...
}
I even think it should be the default behavior. Fields are named in the request form data, so I expect to retrieve all fields (file and non-file) by their name in the body.
The attachFieldsToBody flag has probably been created for this purpose. It processes non-file fields as expected: JSON values are loaded and parsed, which is OK behavior in my opinion, as that is the most common usage of such input. Binary files, however, are loaded into memory (in a buffer) and set as the value (losing the contentType and filename metadata).
I'm not sure this fits the usual need. I think binary files should be usable as streams, and their metadata should be available in the value.
So I tried to do it with an onFile function:
```js
// does not work
onFile: function (part) {
  part.value = part
}
```
But this does not work, as onFile must consume the stream in order to reach the next field or the end of the request. The API method is only called once all files have been consumed.
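(One possible workaround, sketched under the assumption that setting part.value in onFile controls what 'keyValues' attaches to the body: buffer each file to a temp file, then re-open it as a fresh stream in the handler.)

```js
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { pipeline } = require('node:stream/promises')

async function onFile (part) {
  const filepath = path.join(os.tmpdir(), part.filename)
  await pipeline(part.file, fs.createWriteStream(filepath))
  part.value = { filepath, filename: part.filename, mimetype: part.mimetype }
}

fastify.post('/publish', async function (request, reply) {
  const { mainImageFile } = request.body
  // a fresh read stream for the handler to pipe wherever it wants
  const source = fs.createReadStream(mainImageFile.filepath)
  // ... pipe through imageResizer()/cloudStorage() as in the example above
})
```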
I'm in the same situation as you, @felicienfrancois. That said, I don't think attaching them to the body changes anything about the general need: retrieving the fields (other than the "file" type) in JSON format in a more "convenient" way.
I think a "helper" function would be a good thing, allowing us to do some variable destructuring. E.g.:
```js
const { fields, filename, mimetype } = await request.file() // actual behavior
const { myNonFileField, myOtherField } = fields.getFields() // would be nice (but may return files)
// const { myNonFileField, myOtherField } = fields.getFields({ includeFiles: false }) // would be even nicer
```
Since the actual fields value (from await request.file()) looks like:
```js
fields: <ref *2> {
  file: [Circular *1],
  myExampleField: {
    type: 'field',
    fieldname: 'myExampleField',
    mimetype: 'text/plain',
    encoding: '7bit',
    value: 'myValue',
    fieldnameTruncated: false,
    valueTruncated: false,
    fields: [Circular *2]
  }
}
```
The getFields() helper could just filter the fields by their mimetype, or even by whether the key is different from file.
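(A sketch of what such a helper might look like, given the shape above; the includeFiles option and the type === 'field' check are assumptions:)

```js
// Return a plain { name: value } map of the non-file fields,
// optionally keeping the raw file parts as well.
function getFields (fields, { includeFiles = false } = {}) {
  const result = {}
  for (const [name, part] of Object.entries(fields)) {
    if (part.type === 'field') {
      result[name] = part.value
    } else if (includeFiles) {
      result[name] = part // keep the raw file part when asked to
    }
  }
  return result
}
```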
I don't know what @mcollina and @Eomm think about that.