node-unzipper
S3 Stream Get, Unzip, S3 Stream Pull
We have a zip file in an S3 bucket that contains about 1 million small (~2 KB) QR-code SVG files. The code runs on AWS Lambda and we try to process it stream-to-stream with 3000 MB of memory. It can read the zip file, but it CANNOT upload the extracted files to S3 one by one. What can we do?
const unzipper = require('unzipper');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
// S3S is the third-party S3 write-stream helper mentioned below

async function readStreamFmS3(bucketName, key) {
  const params = { Bucket: bucketName, Key: key };
  console.log("params==" + JSON.stringify(params));

  const zip = s3
    .getObject(params)
    .createReadStream()
    .pipe(unzipper.Parse({ forceStream: true }));

  const upload = [];
  let index = 0;

  for await (const entry of zip) {
    const fileName = entry.path;
    const type = entry.type; // 'Directory' or 'File'
    const size = entry.vars.uncompressedSize; // there is also compressedSize

    if (type === "File") {
      const fileFullPath = 'uploadFolder/' + fileName;
      console.log("File stream==" + fileFullPath);

      /*
      var upload = await s3Stream.upload({
        Bucket: bucketName,
        Key: fileFullPath,
        ACL: 'public-read'
      });
      */

      // takes the same params as s3.createMultipartUpload
      upload[index] = S3S.WriteStream(s3, {
        Bucket: bucketName,
        Key: fileFullPath,
        ACL: 'public-read'
      });

      await entry.pipe(upload[index])
        .on('error', function (error) {
          console.log("err==" + error);
        })
        .on('uploaded', function (details) {
          console.log("end==" + JSON.stringify(details));
          upload[index] = '';
        });

      index++;
    } else {
      entry.autodrain();
    }
  }

  return "Done";
}
For the S3 upload stream we tried "s3-stream" and "s3-streaming-upload", but neither manages to upload to S3. What can we do?
Hey, I ran into a similar problem where the last file in a zip was not uploaded when I awaited the upload inside the async iterator loop, like this:
for await (const e of zip) {
...
await s3.upload({...}).promise(); // No idea why this isn't working reliably
entry.autodrain();
}
However, it works when I collect all the upload promises outside of the iterator loop and wait for all of them to finish after the iterator is done (complete example):
const unzipper = require('unzipper');
const { S3 } = require('aws-sdk');
const bucketName = 'bucket-name';
async function main() {
const s3 = new S3();
const params = {
Key: 'some-file.zip',
Bucket: bucketName,
};
const zip = s3
.getObject(params)
.createReadStream()
.pipe(unzipper.Parse({ forceStream: true }));
const promises = [];
for await (const e of zip) {
const entry = e;
const fileName = entry.path;
const type = entry.type;
if (type === 'File') {
const uploadParams = {
Bucket: bucketName,
Key: fileName,
Body: entry,
};
promises.push(s3.upload(uploadParams).promise());
} else {
entry.autodrain();
}
}
await Promise.all(promises);
}
main();
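Side note for anyone on the AWS SDK for JavaScript v3 (used in a later comment here): below is a minimal sketch of the same pattern, assuming @aws-sdk/client-s3 and @aws-sdk/lib-storage; the Upload helper handles streaming bodies of unknown length. Bucket and key names are placeholders, and this is only a sketch, not code from the thread.
const unzipper = require('unzipper');
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

const client = new S3Client();
const bucketName = 'bucket-name';

async function main() {
  // In Node, the GetObject response Body is a Readable stream.
  const { Body } = await client.send(
    new GetObjectCommand({ Bucket: bucketName, Key: 'some-file.zip' })
  );
  const zip = Body.pipe(unzipper.Parse({ forceStream: true }));
  const promises = [];

  for await (const entry of zip) {
    if (entry.type === 'File') {
      // Upload performs a multipart upload, so the entry stream's
      // unknown length is not a problem.
      const upload = new Upload({
        client,
        params: { Bucket: bucketName, Key: entry.path, Body: entry },
      });
      promises.push(upload.done());
    } else {
      entry.autodrain();
    }
  }
  await Promise.all(promises);
}

main();
With very large archives, note that this still collects one pending upload per entry; batching the promises, as discussed further down in this thread, keeps memory and open connections bounded.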
I've been trying pretty much all possible options.
@ofhouse - I tried your example and also other methods such as Open.s3.
unzipper works for most files; however, in my case some smaller files get left behind because of the errors below.
I've received errors like "Error: unexpected end of file" when using unzipper.Parse({forceStream: true}), or "invalid stored block lengths" with Open.s3.
I'm using s3.upload with the partSize and queueSize options. Nothing has been working, and I would definitely appreciate some assistance.
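For comparing notes, here is the shape of the Open.s3 route with each upload awaited before moving to the next entry (AWS SDK v2; the function name, bucket names, and the partSize/queueSize values are only illustrative). This is a sketch of the approach, not a confirmed fix for the truncation errors:
const unzipper = require('unzipper');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function copyEntries(srcBucket, srcKey, destBucket) {
  // Open.s3 reads the zip's central directory via ranged requests
  // and streams one entry at a time on demand.
  const directory = await unzipper.Open.s3(s3, { Bucket: srcBucket, Key: srcKey });
  for (const file of directory.files) {
    // Skip directory entries (their paths end with '/').
    if (file.path.endsWith('/')) continue;
    // Await each upload so only one entry stream is open at a time.
    await s3
      .upload(
        { Bucket: destBucket, Key: file.path, Body: file.stream() },
        { partSize: 10 * 1024 * 1024, queueSize: 4 } // illustrative values
      )
      .promise();
  }
}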
I'm having a similar problem. I have zip files uploaded to S3 that contain thousands of small (50-500k) .txt files. I can unzip a couple thousand successfully, but zips that contain more than about 10,000 files time out, even with the maximum Lambda timeout of 15 minutes.
What I really want to do is split the zip entries into chunks of 100 and use Promise.all() to send each chunk. That would be faster than putting the files into the S3 bucket one at a time (a batching sketch along these lines follows the code below).
But I can't get it to work. If I knew how many files were in the zip archive I could use something like
if (index % 100 === 0) { // send chunk of 100 }
But I can't seem to even get a count of all the zip entries.
This is what I have right now, with each entry being saved to S3 one at a time, then eventually timing out after 15 mins.
const unzipper = require('unzipper');
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const s3Client = new S3Client();

const unPackZip = async (s3ZipFile, destinationBucket) => {
  const readableStream = s3ZipFile.Body;
  const zip = readableStream.pipe(unzipper.Parse({ forceStream: true }));
  if (zip.readable === true) {
    // Looping over ALL files in the ZIP to extract them one by one
    // and upload them to another bucket (destinationBucket)
    for await (const e of zip) {
      const entry = e;
      let fileName = entry.path;
      if (fileName.indexOf("/") > -1) {
        fileName = fileName.split("/").pop();
      }
      const type = entry.type;
      if (type === 'File') {
        const entryStream = entry;
        try {
          // Buffer the entry, then use the AWS SDK for JavaScript v3 PutObjectCommand
          const txtOfFile = await streamToString(entryStream);
          const input = {
            ACL: "private",
            Body: txtOfFile,
            Bucket: destinationBucket,
            ContentType: "text/plain",
            Key: fileName
          };
          const putCommand = new PutObjectCommand(input);
          await s3Client.send(putCommand);
        } catch (err) {
          console.error(err);
          continue;
        }
      } else {
        entry.autodrain();
      }
    }
  }
};

async function streamToString(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString("utf-8");
}
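A follow-up on the chunking idea above: you don't need the total entry count to batch. Push each PutObject promise into an array and flush it with Promise.all() every time it reaches the batch size, then flush once more after the loop. A minimal sketch building on the code above (it reuses the s3Client and streamToString from that code; the batch size of 100 is illustrative, and per-file error handling is omitted to keep it short):
// Batched uploads without knowing the entry count up front.
const BATCH_SIZE = 100; // illustrative

const unPackZipBatched = async (s3ZipFile, destinationBucket) => {
  const zip = s3ZipFile.Body.pipe(unzipper.Parse({ forceStream: true }));
  let batch = [];

  for await (const entry of zip) {
    if (entry.type !== 'File') {
      entry.autodrain();
      continue;
    }
    const fileName = entry.path.split('/').pop();
    // Buffer the entry first so the upload can run in the background
    // while the loop keeps reading the next entries from the zip stream.
    const body = await streamToString(entry);
    batch.push(
      s3Client.send(new PutObjectCommand({
        ACL: 'private',
        Body: body,
        Bucket: destinationBucket,
        ContentType: 'text/plain',
        Key: fileName,
      }))
    );
    if (batch.length >= BATCH_SIZE) {
      await Promise.all(batch); // flush the current chunk of 100
      batch = [];
    }
  }
  await Promise.all(batch); // flush whatever is left after the loop
};
Because each entry is buffered before its promise is pushed, the uploads in a batch genuinely run in parallel; with 100 files of 50-500 KB that is at most around 50 MB in flight at once.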