node-unzipper
file.stream(...).autodrain is not a function
I am trying to process a zip from S3 and need to ignore some of the files in it. At the moment I am skipping a few files without calling the autodrain function, and I am getting unexpected results while processing: when reading an entry's content, I sometimes get the content of another file that I ignored. I suspect this is because I am not calling autodrain.
As per the documentation I should call autodrain so the stream doesn't halt, so I tried that, and now it gives me an error (file.stream(...).autodrain is not a function). This is my code:
const aws = require("aws-sdk");
const s3Client = new aws.S3();
const unzipper = require("unzipper");

const directory = await unzipper.Open.s3(s3Client, {
  Bucket: "somebucket",
  Key: "somezip.zip"
});
const files = directory.files;

for (const file of files) {
  if (file.path.includes("some/path")) {
    // do something with the stream
  } else {
    file.stream().autodrain(); // this gives autodrain is not a function
  }
}
If I look at the types, and according to IntelliSense, the stream() method returns an Entry object which has an autodrain() function, so I am not sure why it says autodrain is not a function. See screenshot here.
Any help is really appreciated.
The Open methods are random access and there is no reason to drain, i.e. if you ignore an entry then there is no harm. Whenever you call .stream or .buffer you start reading the zip file at the precise location of the entry.
The concept of autodrain comes from the legacy Parse method, which basically reads through the entire zip file from start to finish and emits entries along the way. Each entry has to be read for the reader to be able to continue to the next entry (or the end of the file). If you want to skip an entry, you have to call autodrain. Again, with the Open methods, autodrain is not applicable or needed.
Perhaps we should make autodrain a NOOP here to ensure the entry definitions match between Open and Parse.
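For illustration, a minimal sketch contrasting the two approaches (the archive.zip path, the some/path filter and the /tmp output paths are placeholders, and the Open part assumes it runs inside an async function):

const fs = require("fs");
const unzipper = require("unzipper");

// Legacy Parse: the zip is read sequentially from start to finish,
// so every entry must be consumed or autodrained before the parser
// can move on to the next one.
fs.createReadStream("archive.zip")
  .pipe(unzipper.Parse())
  .on("entry", entry => {
    if (entry.path.includes("some/path")) {
      entry.pipe(fs.createWriteStream(`/tmp/${entry.path.split("/").pop()}`));
    } else {
      entry.autodrain(); // required here, otherwise the stream stalls
    }
  });

// Open methods: random access via the central directory, so entries
// you never call .stream() or .buffer() on are simply never read.
const directory = await unzipper.Open.file("archive.zip");
for (const file of directory.files) {
  if (!file.path.includes("some/path")) continue; // just skip it, no drain needed
  file.stream().pipe(fs.createWriteStream(`/tmp/${file.path.split("/").pop()}`));
}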
Thanks @ZJONSSON for the quick reply. This makes sense, but I am not sure why the behavior is like this. My zip has a structure like this:
- somefile.txt
- data
  - file1.csv
  - file2.csv
  - ....
  - file86.csv // this is the file from where the problem occurs
  - ....
  - file450.csv
- resources
  - header.csv
I have a check in my code to only read files from the data folder. It works fine until CSV no. 86, but after that it picks up the content of somefile.txt.
This is my full code:
const directory = await unzipper.Open.s3(s3Client, { Bucket: 'ocal', Key: 'CSV.zip' });
const files = directory.files;

for (const file of files) {
  if (file.path.includes("data/file")) {
    console.time(file.path);
    const collection = mongo.db().collection(recordIdentifiersMapping[file.path]);
    await file.stream()
      .pipe(etl.csv({ "skipLines": 1, "headers": getHeaders(file.path) }))
      .pipe(etl.collect(1000))
      .pipe(etl.map(res => {
        res.forEach(item => {
          item._id = uuid();
        });
        return res;
      }))
      .pipe(etl.mongo.upsert(collection, ["_id"]))
      .promise()
      .then(() => {
        console.log("finished", file.path);
        console.timeEnd(file.path);
      });
  }
}
Any idea what I might be doing wrong?
Just to give an update: instead of opening the zip from S3 using unzipper.Open.s3, this time I downloaded and extracted the zip and then ran my code against the extracted files (basically using fs.createReadStream()), and it seems to be working fine so far. Not sure if this will help.
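For reference, a rough sketch of that workaround, assuming the archive has already been downloaded and extracted to a local directory; the extractedDir path and the processCsvStream helper (standing in for the etl pipeline above) are placeholders:

const fs = require("fs");
const path = require("path");

// Assumes the zip was already downloaded from S3 and extracted
// to extractedDir by a separate step.
const extractedDir = "/tmp/CSV";

async function processExtracted(processCsvStream) {
  const dataDir = path.join(extractedDir, "data");
  for (const name of fs.readdirSync(dataDir)) {
    if (!name.startsWith("file")) continue; // only the data/fileNN.csv entries
    // Same ETL pipeline as before, but fed from a plain file stream
    // instead of unzipper's S3-backed entry stream.
    const stream = fs.createReadStream(path.join(dataDir, name));
    await processCsvStream(stream, `data/${name}`);
  }
}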
Hi @ZJONSSON, did you get a chance to look into my updated comments and the behavior I am facing? It is now consistent: if I download the first 60-70 files, it works fine, but after that it returns the first file for all of the remaining files, as I mentioned in my previous comments. Do you have a suggestion here, or is there something I might be doing wrong?
I've got the exact same problem. For some reason, it randomly populates the file data with the root file, which in my case is license.txt; the console-logged file paths are the correct ones, but the content is that of the root file.
For reference:
const directory = await unzipper_1.default.Open.s3(s3, { Bucket: utilities_1.getEnv("BUCKET"), Key: key });
try {
  const filesToExtract = directory.files.filter(file => files.includes(file.path));
  console.log('-----------------', JSON.stringify(filesToExtract));
  await Promise.all(filesToExtract.map(async (file) => {
    console.log('---------------extracting-------------', file.path);
    const fileKey = `extracted/${file.path}`;
    return await s3.upload({ Bucket: utilities_1.getEnv("BUCKET"), Key: fileKey, Body: file.stream() }, {
      queueSize: 2
    }).promise();
  }));
} catch (err) {
  console.log(err);
}