nodejs-storage
Memory leak in createReadStream when destroy() is called on resulting stream and validation is set to true
Memory leak in createReadStream when destroy() is called on the resulting stream and validation is set to true. This can occur when 1) the required data lies before the end of the file and reading to completion is not required, or 2) a downstream client has disconnected or navigated away.
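To illustrate scenario 2, here is a minimal sketch (not from the original report; the bucket/object names and port are illustrative assumptions): an HTTP handler pipes a GCS read stream to a client and destroys it when the client disconnects before the download completes.

// Illustrative sketch only (hypothetical bucket/object names): serving a GCS object
// over HTTP and tearing the read stream down when the downstream client goes away.
import http from "http";
import { Storage } from "@google-cloud/storage";

const storage = new Storage();
const file = storage.bucket("some-bucket").file("some-object.bin");

http.createServer((req, res) => {
  const gcsStream = file.createReadStream({ validation: true });

  gcsStream.on("error", (err) => res.destroy(err));

  // If the client disconnects or navigates away, destroy the GCS stream.
  // With the leak described above, memory behind the underlying HTTP
  // response is not released when validation is enabled.
  res.on("close", () => gcsStream.destroy());

  gcsStream.pipe(res);
}).listen(8080);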
Environment details
- OS: Linux
- Node.js version: v18.6.0
- npm version: 8.14.0
- @google-cloud/storage version: 6.2.3
Steps to reproduce
- Start the GCS simulator using the provided docker compose setup (this also occurs against real GCS; the simulator is just for convenience).
- Run the supplied script with validation: true.
- Notice increasing memory usage, up to 700 MB+.
- Re-run the supplied script with validation: false.
- Notice constant low memory usage.
docker compose setup
version: "3.9" # optional since v1.27.0
services:
  google-cloud-storage-http2:
    image: caddy:2.4.5
    ports:
      - "4443:443"
    entrypoint: ["sh", "-c", 'echo "$$CADDY_CONFIG" > Caddyfile && cat Caddyfile && caddy run']
    environment:
      CADDY_CONFIG: >
        {
          auto_https disable_redirects
          local_certs
        }
        localhost {
          log
          reverse_proxy {
            to http://google-cloud-storage:80
            header_down Location (http://\[::\]:80)(.*) https://localhost:4443$$2
          }
        }
  google-cloud-storage:
    image: fsouza/fake-gcs-server:latest
    entrypoint: ["sh", "-c", "mkdir -p /data && mkdir -p /preload/vision-poc && touch /preload/vision-poc/.keep && /bin/fake-gcs-server -scheme http -port 80 -data /preload -filesystem-root /data"]
    ports:
      - "4080:80"
script
import { Storage } from "@google-cloud/storage";
import stream from "stream";
import crypto from "crypto";
import { promisify } from "util";
import pump from "pump";
import process from "process";
import pLimit from "p-limit";

const pPump = promisify(pump);

process.env["NODE_TLS_REJECT_UNAUTHORIZED"] = "0";
const buf = crypto.randomBytes(1024 * 1024 * 2);
const bucketName = "vision-poc";
const fileName = "file.txt";
const storage = new Storage({ apiEndpoint: "https://localhost:4443", projectId: "fake-project-id" });
const myBucket = storage.bucket(bucketName);
const file = myBucket.file(fileName);
async function upload() {
  const passthroughStream = new stream.PassThrough();
  passthroughStream.write(buf);
  passthroughStream.end();
  await pPump(passthroughStream, file.createWriteStream());
}
function download() {
  return new Promise((resolve, reject) => {
    const res = file.createReadStream({ validation: true }); // createReadStream returns the stream synchronously
    const chunks = [];
    res.on("data", (chunk) => {
      chunks.push(chunk);
      const bytesSoFar = chunks.reduce((acc, cur) => acc + cur.length, 0);
      if (bytesSoFar > 1024 * 1024 * 1) {
        chunks.length = 0;
        res.destroy(); // <--- HERE I don't need more data, or downstream client is gone
        console.log(process.memoryUsage().rss / 1024 / 1024);
        resolve();
      }
    });
    res.on("error", (err) => {
      console.log(err);
      reject(err);
    });
    res.on("end", () => {
      console.log("end"); // not reached
    });
  });
}
// main
(async () => {
  await upload().catch(console.error);
  const limit = pLimit(100);
  const promises = [];
  for (let i = 0; i < 2000; i++) {
    const promise = limit(download);
    promises.push(promise);
  }
  await Promise.all(promises).catch(console.error);
  setInterval(() => {
    console.log(process.memoryUsage().rss / 1024 / 1024);
  }, 1000);
})();
Thank you for reporting this @jpambrun. I will investigate and update the issue accordingly.
Hi @jpambrun. Thank you for the code above. I have attempted to reproduce this in my local environment utilizing calls to GCS as opposed to the emulator. I wanted to ask how much of a difference you are seeing in your environment with validation = true vs validation = false? I ran a few profiling / tracing runs and the results did not immediately point to any glaring issues in the validation code. I am curious if you saw anything else that might be useful in digging in deeper.
The area of interest is here: https://github.com/googleapis/nodejs-storage/blob/main/src/file.ts#L1473-L1595
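For readers following along, here is a rough conceptual sketch of the kind of layering that comment refers to (illustrative only, not the actual code at that link): the response body is piped through a hashing PassThrough before reaching the stream handed back to the caller, so if destroy() on the outer stream is not forwarded upstream, data keeps flowing into the intermediate streams.

// Conceptual sketch only; the real implementation lives at the link above.
import { PassThrough } from "stream";
import crypto from "crypto";

function wrapWithValidation(httpResponseStream, expectedMd5) {
  const hash = crypto.createHash("md5");
  const hashingStream = new PassThrough();
  hashingStream.on("data", (chunk) => hash.update(chunk));

  const throughStream = new PassThrough();
  httpResponseStream.pipe(hashingStream).pipe(throughStream);

  throughStream.on("end", () => {
    // On a complete read the digest is compared against the stored checksum.
    if (hash.digest("base64") !== expectedMd5) {
      throughStream.emit("error", new Error("checksum mismatch"));
    }
  });

  // The failure mode being discussed: destroying throughStream does not destroy
  // hashingStream or httpResponseStream, so upstream buffers keep growing.
  return throughStream;
}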
I did spend several hours digging and could not really pinpoint the issue. It may be related to that pipe. Elsewhere the library uses pump, whose main goal is to solve: "When using standard source.pipe(dest) source will not be destroyed if dest emits close or an error."
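As a generic illustration of that quoted pump behavior (not code from this library; the file names are placeholders):

// With plain .pipe(), destroying the destination does not destroy the source,
// so the source's underlying resource (file descriptor, HTTP response) stays open.
import fs from "fs";
import pump from "pump";

const src1 = fs.createReadStream("big-file.bin"); // placeholder file
const dst1 = fs.createWriteStream("/dev/null");
src1.pipe(dst1);
dst1.destroy(); // src1 is not destroyed; it is simply left paused

// With pump, an error or premature close on either end destroys the whole chain.
const src2 = fs.createReadStream("big-file.bin");
const dst2 = fs.createWriteStream("/dev/null");
pump(src2, dst2, (err) => {
  if (err) console.error("pipeline failed:", err); // called once both streams are torn down
});
dst2.destroy(); // pump propagates this and destroys src2 as well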
This GCS client is extremely complicated: going from the bare Node http response to the returned throughStream involves a lot of PassThrough streams, any one of which could be wrongly connected. All I can really say is that calling destroy() on the returned throughStream (the dest) should bubble up all the way to the http response (the source) so it can be properly closed and its resources can be freed.
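As a sketch of that expected behavior (illustrative only, not the library's internals): when the chain is wired with the native stream.pipeline, destroying the tail propagates destruction back to the head.

// Illustrative only: with stream.pipeline, destroying the returned (tail) stream
// tears down the whole chain, which is the propagation being asked for here.
import { PassThrough, pipeline } from "stream";

const source = new PassThrough();        // stands in for the raw HTTP response
const middle = new PassThrough();        // stands in for an intermediate transform
const throughStream = new PassThrough(); // stands in for the stream returned to the caller

pipeline(source, middle, throughStream, (err) => {
  console.log("pipeline settled:", err ? err.message : "ok");
});

source.on("close", () => console.log("source destroyed; its resources can be freed"));

// Destroying the returned stream destroys source and middle as well.
throughStream.destroy();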
Hey @jpambrun, we've refactored our streams in a recent release (v6.4.1):
- https://github.com/googleapis/nodejs-storage/compare/v6.4.0...v6.4.1
In short, we've removed pumpify (and the associated .pipe) and started using the native Node.js pipeline. Are you still experiencing leaks in recent releases?
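For reference, a minimal sketch of the difference described (not the actual diff from that compare link): pumpify composes the chain into a single duplex wrapper, whereas the native pipeline connects the streams directly with mutual destroy-on-error/close semantics.

// Minimal sketch of the change described above (not the actual library diff).
import { PassThrough, pipeline } from "stream";
import pumpify from "pumpify";

// Before: pumpify wraps the chain in one duplex stream that callers read from.
const legacy = pumpify(new PassThrough(), new PassThrough());

// After: native pipeline connects the same streams and guarantees that an error
// or premature close on any one of them destroys the others.
const head = new PassThrough();
const tail = new PassThrough();
pipeline(head, tail, (err) => {
  if (err) console.error("stream chain failed:", err);
});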
Hi @jpambrun,
Please reopen with the requested information if you're still seeing this issue.
@jpambrun I know this issue is old and has been closed for over a year, but please see: https://github.com/googleapis/nodejs-storage/issues/2313. I think this may have been the same issue.