Memory leak in createReadStream when destroy() is called on resulting stream and validation is set to true

Memory leak in createReadStream when destroy() is called on the resulting stream and validation is set to true. This can occur when 1) the required data lies before the end of the file and reading to completion is unnecessary, or 2) a downstream client disconnects or navigates away.
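
A minimal sketch of scenario 2 for illustration, assuming an HTTP server that streams the object to a browser (the bucket name, object name, and port are placeholders):

import http from "http";
import { Storage } from "@google-cloud/storage";

const file = new Storage().bucket("my-bucket").file("my-object"); // placeholder names

http
  .createServer((req, res) => {
    const gcsStream = file.createReadStream({ validation: true });
    gcsStream.pipe(res);
    // The client disconnected or navigated away: stop reading from GCS.
    res.on("close", () => gcsStream.destroy());
  })
  .listen(8080);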

Environment details

  • OS: Linux
  • Node.js version: v18.6.0
  • npm version: 8.14.0
  • @google-cloud/storage version: 6.2.3

Steps to reproduce

  1. Start the GCS simulator using the provided docker compose setup. (This also occurs against real GCS; the simulator is just for convenience.)
  2. Run supplied script with validation: true
  3. Notice increasing memory usage, up to 700 MB+
  4. Re-run supplied script with validation: false
  5. Notice constant low memory usage.

docker compose setup

version: "3.9"  # optional since v1.27.0
services:
  google-cloud-storage-http2:
    image: caddy:2.4.5
    ports:
      - "4443:443"
    entrypoint: ["sh", "-c", 'echo "$$CADDY_CONFIG" > Caddyfile && cat Caddyfile && caddy run']
    environment:
      CADDY_CONFIG: >
          {
            auto_https disable_redirects
            local_certs
          }
          localhost {
            log
            reverse_proxy {
                    to http://google-cloud-storage:80
                    header_down Location (http://\[::\]:80)(.*) https://localhost:4443$$2
            }
          }
  google-cloud-storage:
    image: fsouza/fake-gcs-server:latest
    entrypoint: ["sh", "-c", "mkdir -p /data && mkdir -p /preload/vision-poc && touch /preload/vision-poc/.keep && /bin/fake-gcs-server -scheme http -port 80 -data /preload -filesystem-root /data"]
    ports:
      - "4080:80"

script

import { Storage } from "@google-cloud/storage";
import stream from "stream";
import crypto from "crypto";
import { promisify } from "util";
import pump from "pump";
import process from "process";

import pLimit from "p-limit";

const pPump = promisify(pump);

process.env["NODE_TLS_REJECT_UNAUTHORIZED"] = "0"; // accept the simulator's self-signed certificate

const buf = crypto.randomBytes(1024 * 1024 * 2);
const bucketName = "vision-poc";
const fileName = "file.txt";
const storage = new Storage({ apiEndpoint: "https://localhost:4443", projectId: "fake-project-id" });

const myBucket = storage.bucket(bucketName);
const file = myBucket.file(fileName);

async function upload() {
  const passthroughStream = new stream.PassThrough();
  passthroughStream.write(buf);
  passthroughStream.end();
  await pPump(passthroughStream, file.createWriteStream());
}

function download() {
  return new Promise((resolve, reject) => {
    const res = file.createReadStream({ validation: true }); // returns a stream synchronously
    const chunks = [];
    res.on("data", (chunk) => {
      chunks.push(chunk);
      const bytesSoFar = chunks.reduce((acc, cur) => acc + cur.length, 0);

      if (bytesSoFar > 1024 * 1024 * 1) {
        chunks.length = 0;
        res.destroy(); // <--- HERE I don't need more data, or downstream client is gone
        console.log(process.memoryUsage().rss / 1024 / 1024);
        resolve();
      }
    });
    res.on("error", (err) => {
      console.log(err);
      reject(err);
    });
    res.on("end", () => {
      console.log("end"); // not reached
    });
  });
}

// main
(async () => {
  await upload().catch(console.error);

  const limit = pLimit(100);

  const promises = [];
  for (let i = 0; i < 2000; i++) {
    const promise = limit(download);
    promises.push(promise);
  }

  await Promise.all(promises).catch(console.error);

  setInterval(() => {
    console.log(process.memoryUsage().rss / 1024 / 1024);
  }, 1000);
})();
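
Note on running the script: it uses ESM imports, so save it as an .mjs file (or run it from a package with "type": "module") and install pump and p-limit alongside @google-cloud/storage. The pLimit(100) cap keeps up to 100 downloads in flight at once, each destroying its stream after roughly 1 MB.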

jpambrun avatar Jul 29 '22 17:07 jpambrun

Thank you for reporting this @jpambrun. I will investigate and update the issue accordingly.

ddelgrosso1 avatar Aug 01 '22 19:08 ddelgrosso1

Hi @jpambrun. Thank you for the code above. I have attempted to reproduce this in my local environment utilizing calls to GCS as opposed to the emulator. I wanted to ask how much of a difference you are seeing in your environment with validation = true vs validation = false? I ran a few profiling / tracing runs and the results did not immediately point to any glaring issues in the validation code. I am curious if you saw anything else that might be useful in digging in deeper.

ddelgrosso1 avatar Aug 03 '22 17:08 ddelgrosso1

The area of interest is here: https://github.com/googleapis/nodejs-storage/blob/main/src/file.ts#L1473-L1595

I did spend several hours digging and could not really pinpoint the issue. It may be related to that pipe. Elsewhere the library uses pump, whose main goal is to solve: "When using standard source.pipe(dest), source will not be destroyed if dest emits close or an error."
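
A minimal sketch of that difference using plain PassThrough streams (the names here are illustrative, not the library's code):

import { PassThrough, pipeline } from "stream";

// With plain .pipe(), destroying the destination does not tear down the source.
const src = new PassThrough();
const dst = new PassThrough();
src.pipe(dst);
dst.destroy();
setImmediate(() => console.log("pipe: source destroyed?", src.destroyed)); // false

// pump / stream.pipeline propagate teardown back to the source.
const src2 = new PassThrough();
const dst2 = new PassThrough();
pipeline(src2, dst2, (err) => console.log("pipeline error:", err?.code)); // ERR_STREAM_PREMATURE_CLOSE
dst2.destroy();
setImmediate(() => console.log("pipeline: source destroyed?", src2.destroyed)); // true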

This GCS client is extremely complicated: going from the bare Node HTTP response to the returned throughStream involves a lot of PassThrough streams, any one of which could be wrongly connected.

All I can really say is that calling destroy() on the returned throughStream (the dest) should bubble up all the way to the HTTP response (the source) so it can be properly closed and its resources freed.
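
For example, the consumer-facing stream could forward teardown explicitly; this is only a sketch with a hypothetical makeThroughStream helper, not the library's actual code:

import { PassThrough } from "stream";

// `source` stands in for the raw HTTP response stream.
function makeThroughStream(source) {
  const through = new PassThrough({
    destroy(err, callback) {
      // Bubble teardown upstream so the response is aborted and its buffers freed.
      source.destroy(err);
      callback(err);
    },
  });
  source.pipe(through);
  return through;
}

// Usage: const res = makeThroughStream(httpResponse); ... later: res.destroy();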

jpambrun avatar Aug 03 '22 23:08 jpambrun

Hey @jpambrun, we've refactored our streams in a recent release (v6.4.1):

  • https://github.com/googleapis/nodejs-storage/compare/v6.4.0...v6.4.1

In short, we've removed pumpify (and the associated .pipe) & started using the native Node.js pipeline. Are you still experiencing leaks in recent releases?

danielbankhead avatar Sep 06 '22 20:09 danielbankhead

Hi @jpambrun ,

Please reopen with the requested information if you're still seeing this issue.

shaffeeullah avatar Sep 30 '22 13:09 shaffeeullah

@jpambrun I know this issue is old and has been closed for over a year, but please see: https://github.com/googleapis/nodejs-storage/issues/2313. I think this may have been the same issue.

ddelgrosso1 avatar Oct 19 '23 14:10 ddelgrosso1