
archive.append(stream) does not call stream.pipe() synchronously

Open aalexgabi opened this issue 3 years ago • 1 comment

I use the same stream to do both a direct upload and a zip upload to avoid downloading the file twice:

// Create source stream
const sourceBucketStream = sourceBucket.file('image.png').createReadStream();

// Create long stream
const longStream = fs.createReadStream('big-file');

archive.append(longStream, { name: 'big-file' });
archive.append(sourceBucketStream, { name: 'image.png' });
archive.finalize();

await Promise.all([
  promisifiedPipeline(
    sourceBucketStream,
    destinationBucket.file('image.png').createWriteStream(),
  ),
  promisifiedPipeline(
    archive,
    destinationBucket.file('archive.zip').createWriteStream(),
  )
])

I added some debug statements on events of the source streams and the order of events is the following:

  • sourceBucketStream.pipe() is called by the direct bucket upload
  • data is passed for the direct upload
  • sourceBucketStream ends because all data was uploaded
  • sourceBucketStream.pipe() is called by archiver
  • the stream ends immediately because it has already been consumed
  • file is 0 bytes in the archive

Archiver calls sourceBucketStream.pipe() only later, so backpressure from its consumption does not start until then. Sometimes the image file has already been uploaded and the entire stream consumed before archiver starts piping and reading the file.
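For illustration, here is a minimal reproduction sketch of the same race outside archiver (file names are placeholders): once a readable has been drained by one consumer, a consumer that only attaches afterwards receives nothing.

const fs = require('fs');

const source = fs.createReadStream('image.png'); // placeholder file

// First consumer attaches immediately and drains the source.
source.pipe(fs.createWriteStream('copy-1.png'));

source.on('end', () => {
  // The source is fully consumed at this point. A consumer that only
  // attaches now (as archiver does when it processes the entry later)
  // gets no data: copy-2.png ends up 0 bytes.
  source.pipe(fs.createWriteStream('copy-2.png'));
});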

As a workaround, I now download the source file twice instead of piping the same stream twice.

The expected behavior is that archiver.append(stream) calls stream.pipe() synchronously to ensure that no data is lost when the stream is shared by multiple consumers.

Here is an example: https://stackoverflow.com/questions/19553837/node-js-piping-the-same-readable-stream-into-multiple-writable-targets
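For reference, a minimal sketch of the multi-consumer pattern the linked answer describes, assuming Node's built-in stream module and placeholder file names: give each consumer its own PassThrough and attach both to the source before any data flows.

const fs = require('fs');
const { PassThrough } = require('stream');
const archiver = require('archiver');

const archive = archiver('zip');
archive.pipe(fs.createWriteStream('archive.zip')); // placeholder destination

const source = fs.createReadStream('image.png'); // placeholder file

// One PassThrough per consumer, attached synchronously so neither misses data.
const forUpload = new PassThrough();
const forArchive = new PassThrough();
source.pipe(forUpload);
source.pipe(forArchive);

forUpload.pipe(fs.createWriteStream('direct-copy.png')); // stands in for the direct upload
archive.append(forArchive, { name: 'image.png' });
archive.finalize();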

aalexgabi avatar Mar 22 '21 12:03 aalexgabi

As a workaround I do this:

// pipeline and PassThrough come from Node's built-in 'stream' module
const { pipeline, PassThrough } = require('stream');

// Make sure pipe is called immediately to avoid losing data https://github.com/archiverjs/node-archiver/issues/510
archive.append(pipeline([stream, new PassThrough()], (err) => {
  if (err) {
    // Log error here
  }
}), { name });

aalexgabi avatar Apr 19 '21 13:04 aalexgabi
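A note on why this workaround helps, as far as I can tell: stream.pipeline() returns the last stream in the chain, here the PassThrough, and the source is piped into it synchronously when append() is called. Whatever archiver does later, it reads from that PassThrough, which has been receiving the data (subject to normal backpressure) from the start, so nothing is lost even if archiver only processes the entry after the source has ended.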