duckdb-node

Memory leak on any query

Open skokenes opened this issue 1 year ago • 7 comments

Running any query in duckdb-node leaks memory. Here is a simple test script that creates a connection, prepares a simple SELECT 42 AS fortytwo SQL statement, and runs it every 100ms. The script also writes memory usage to a file every minute. Memory usage never stops going up.

const duckdb = require("duckdb");
const fs = require("fs");

// Run simple duckdb query on repeat
const db = new duckdb.Database(":memory:");
const con = db.connect();
const stmt = con.prepare("SELECT 42 AS fortytwo");

function test() {
  stmt.all();
}
setInterval(test, 100);

// Capture memory stats over time and write to file
const path = "./data.csv";
fs.writeFileSync(path, "rss,heapTotal,heapUsed\n");
const stream = fs.createWriteStream(path, { flags: "a" });
setInterval(() => {
  const memory = process.memoryUsage();
  const rss = memory.rss / (1024 * 1024);
  const heapTotal = memory.heapTotal / (1024 * 1024);
  const heapUsed = memory.heapUsed / (1024 * 1024);
  console.log(`rss: ${rss} heapTotal: ${heapTotal} heapUsed: ${heapUsed}`);
  stream.write(`${rss},${heapTotal},${heapUsed}\n`);
}, 60000);

Here is a chart of the resulting memory stats when run for 50 minutes on an MBP M2, node v18.18.2, duckdb 0.10.0:

[image: CleanShot 2024-03-04 at 13 59 58@2x]

After 2 hours:

[image: CleanShot 2024-03-04 at 15 22 21@2x]

EDIT: updated script to remove recursive call
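To put a number on the trend in data.csv rather than reading it off a chart, the rows can be fitted with a least-squares line per column. The following is a minimal sketch, not part of the original report: `parseMemoryCsv`, `slope`, and `growthPerSample` are hypothetical helper names, and it assumes the CSV layout written by the script above (header `rss,heapTotal,heapUsed`, one row of MB values per minute).

```javascript
// Hypothetical helpers (not from duckdb-node) for analysing the data.csv
// produced by the reproduction script above.

// Parse "rss,heapTotal,heapUsed" CSV text into an array of sample objects.
function parseMemoryCsv(text) {
  const [header, ...rows] = text.trim().split(/\r?\n/);
  const columns = header.split(",");
  return rows.map((row) => {
    const values = row.split(",").map(Number);
    return Object.fromEntries(columns.map((name, i) => [name, values[i]]));
  });
}

// Least-squares slope of a series against its sample index.
// With one sample per minute, the unit is MB per minute.
function slope(values) {
  const n = values.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = values.reduce((sum, y) => sum + y, 0) / n;
  let num = 0;
  let den = 0;
  values.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den;
}

// Growth rate for each metric reported by process.memoryUsage().
function growthPerSample(samples) {
  return {
    rss: slope(samples.map((s) => s.rss)),
    heapTotal: slope(samples.map((s) => s.heapTotal)),
    heapUsed: slope(samples.map((s) => s.heapUsed)),
  };
}
```

Usage would look like `growthPerSample(parseMemoryCsv(fs.readFileSync("./data.csv", "utf8")))`. As a rough heuristic only, an rss slope that stays positive while heapUsed stays flat may suggest the growth is in native allocations rather than in JavaScript objects, but that is an interpretation, not something the numbers prove.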

skokenes avatar Mar 04 '24 17:03 skokenes

I believe we have a fix in #65, thanks a lot for the reproduction / investigation.

carlopi avatar Mar 12 '24 19:03 carlopi

I fear it is not fixed.

[image: memory and timestamp]

judgeNotFound avatar Jun 03 '24 14:06 judgeNotFound

@rrcomtech: can you share a reproduction or some more information?

Any help in tracking problems down is very welcome.

carlopi avatar Jun 03 '24 14:06 carlopi

@carlopi Sure, I reduced my code to a minimal example here: Memory Leak Demonstration

It creates 128 workers that create random data and send it to the main thread that puts it into DuckDB. It works fine with SQLite and PostgreSQL, but sadly I see the memory always increasing with DuckDB.

judgeNotFound avatar Jun 04 '24 13:06 judgeNotFound

I'm also running into memory leak issues with a larger Parquet dataset in Node.js @rrcomtech @Mytherin

It's been a few months since this thread was created. Did you guys figure out a workaround? I can post details about my issue if you'd like.

DrewScatterday avatar Aug 12 '24 21:08 DrewScatterday

No, sadly not. What I did instead was write to PostgreSQL and then sync both DBs periodically.

judgeNotFound avatar Aug 12 '24 22:08 judgeNotFound

Ended up solving my issue. I think it's a leak in a GeoJSON function I'm using from the spatial extension. More details here: https://github.com/duckdb/duckdb_spatial/issues/371

DrewScatterday avatar Aug 14 '24 21:08 DrewScatterday