better-sqlite3 icon indicating copy to clipboard operation
better-sqlite3 copied to clipboard

Possible Memory Leak with Inserting Data

Open sadrakhosravi opened this issue 3 years ago • 12 comments

I might be wrong but I believe there is a memory leak when I try to insert about 300 rows every 3 seconds. I have tried it in the main thread and in a worker thread but both show a clear memory leak. I am not sure if this is related to SQLite or the driver.

I am reading a data from a sensor at 100 samples/second and would like to store that in a database. At first I thought it might be the processing of the data that causes the leak, but once I turned off everything and tried to insert some dummy data on an interval, I saw the same issue. My application runs on ElectronJS framework. I have already rebuilt the better-sqlite3 package for my current version of electron.

Here is a sample code that I run in a NodeJS worker thread that shows a clear memory leak.

const { isMainThread, workerData, parentPort } = require('worker_threads');
const path = require('path');
const SQLITE = require('better-sqlite3');
const dbPath = path.join(__dirname, 'mydb.db');

const db = new SQLITE(dbPath, {
  timeout: 1000,
});

db.exec('CREATE TABLE IF NOT EXISTS sensor_data (id INTEGER PRIMARY KEY AUTOINCREMENT, timeStamp INTEGER, PDRawData TEXT, LEDIntensities TEXT, gainValues TEXT, events TEXT, recordingId INTEGER);
');

const pragma1 = db.pragma('journal_mode = WAL');
const pragma2 = db.pragma('synchronous = normal');

const insert = db.prepare(
  'INSERT INTO sensor_data (timeStamp, PDRawData, LEDIntensities, gainValues, events, recordingId ) VALUES (@timeStamp, @PDRawData, @LEDIntensities, @gainValues, @events, @recordingId)'
);

const insertMany = db.transaction((data) => {
  for (const dataPoint of data) {
    insert.run(dataPoint);
  }
});

const sampleData = {
  timeStamp: 0,
  PDRawData: '12313,123123,12313,1243535,3456464,213132,42342',
  LEDIntensities: '123,435,123,4345,123,3435,132,345',
  gainValues: null,
  events: null,
  recordingId: null,
};

const myData = Array(50).fill(sampleData);

setInterval(() => {
  insertMany(myData);
}, 500);

Other possible solutions that I have tried

I tried exposing the global.gc() and running it after every insert of the data. I also tried running this script on the main thread and still got the same result. I have tried recreating my database file many times, but that doesn't help either. I am not sure what I am doing wrong here. I would really appreciate if you could help out with this matter.

Here's my setup: OS: Windows 10, Node v16.9.1 Electron Version: 16.0.8 Better-sqlite3 Version: 7.5.0 (I have also tested previous versions such as 7.4.0)

sadrakhosravi avatar Feb 15 '22 18:02 sadrakhosravi

that shows a clear memory leak. still got the same result.

Can you clarify how you detected said leak? What is the result you get? I don't see any.

const path = require('path');
const SQLITE = require('better-sqlite3');
const dbPath = path.join(__dirname, 'mydb.db');

const db = new SQLITE(dbPath, {
  timeout: 1000,
});

db.exec(`
  CREATE TABLE IF NOT EXISTS
  sensor_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timeStamp INTEGER,
    PDRawData TEXT,
    LEDIntensities TEXT,
    gainValues TEXT,
    events TEXT,
    recordingId INTEGER
  )
`);

const pragma1 = db.pragma('journal_mode = WAL');
const pragma2 = db.pragma('synchronous = normal');

const insert = db.prepare(`
  INSERT INTO sensor_data
  (timeStamp,   PDRawData, LEDIntensities, gainValues, events, recordingId ) VALUES
  (@timeStamp, @PDRawData, @LEDIntensities, @gainValues, @events, @recordingId)`
);

const insertMany = db.transaction((data) => {
  for (const dataPoint of data) {
    insert.run(dataPoint);
  }
});

const sampleData = {
  timeStamp: 0,
  PDRawData: '12313,123123,12313,1243535,3456464,213132,42342',
  LEDIntensities: '123,435,123,4345,123,3435,132,345',
  gainValues: null,
  events: null,
  recordingId: null,
};

const myData = Array(50).fill(sampleData);

setInterval(() => {
  insertMany(myData);
  console.log(process.memoryUsage());
}, 50);

You can clearly see the gc and memory dropping back to where it was. And it doesn't change even after running it for minutes. I also cranked up the number (5000 inserts every 10ms) and don't see anything growing.

https://user-images.githubusercontent.com/679144/154233488-90dd2ffc-557f-471a-a79a-496f863757d9.mp4

To be fair this is on Ubuntu, so there is a chance you are seeing different behavior on Windows.

Prinzhorn avatar Feb 16 '22 09:02 Prinzhorn

You can also look into db.pragma('shrink_memory'); and other ways to control the memory SQLite uses. I'm not saying there can't be a memory leak, but there is a difference between leaking memory and using memory.

Prinzhorn avatar Feb 16 '22 09:02 Prinzhorn

Thank you for testing it out. I ran the same exact code on Windows with Electron and saw an increase in the "heapTotal". The increase is very subtle and I had to run it for about 30 mins to see the clear difference. The first 15-20 mins the heapTotal was at 3866624 and later I am getting 4128768.

I am still not sure if this is a memory leak or what would be a clear indication of it. I also monitored my main process RAM in task manager (I know it's not the best tool) and saw an increase from 68MB to 105MB (which happened very slowly - in 30 minutes time).

I tried almost everything to try and find where exactly is the issue. I noticed if I use raw SQL queries instead of prepared statements, I don't see any noticeable memory increase over 30 minutes. My main process started from 68MB and became stable around 73MB. Here is the code that I currently use:

const { isMainThread, workerData, parentPort } = require('worker_threads');
const SQLITE3 = require('better-sqlite3');
const path = require('path');
const db = SQLITE3(path.join(__dirname, 'mydb.db'));

const dbTransaction = (dataArr) => {
  const dataArrLength = dataArr.length;

  try {
    db.exec(`BEGIN TRANSACTION`);
    for (let i = 0; i < dataArrLength; i++) {
      db.exec(
        `INSERT INTO recordings_data(timeStamp,PDRawData, LEDIntensities) VALUES (${dataArr[i].timeStamp}, '${dataArr[i].PDRawData}', '${dataArr[i].LEDIntensities}')`
      );
    }
    db.exec(`COMMIT`);
  } catch (err) {
    db.exec('ROLLBACK');
    console.log('ERROR');
    console.log(err.message);
  }
};

const sampleData = {
  timeStamp: 0,
  PDRawData: '12313,123123,12313,1243535,3456464,213132,42342',
  LEDIntensities: '123,435,123,4345,123,3435,132,345',
  gainValues: null,
  events: null,
  recordingId: null,
};

const myData = Array(50).fill(sampleData);

setInterval(() => {
  dbTransaction(myData);
  console.log(process.memoryUsage());
}, 50);

I also used db.pragma('shrink_memory'); and db.pragma('cache_size = 1'); in my previous testing. The memory increase is still there but at a slightly lower pace.

https://user-images.githubusercontent.com/65143892/154340174-837a578d-be78-4ba3-accf-82a18f28474a.mp4

sadrakhosravi avatar Feb 16 '22 19:02 sadrakhosravi

on Windows with Electron

Is it different from running Node.js directly? That would be interesting to see, since Electron does some funny things. If it repros in Node.js it would be easier to leave Electron out for now.

Maybe you can try looking at heap snapshots to see what type of object are taking that space. So you can compare start vs after 30 minutes. E.g. take a look at https://medium.com/@wavded/how-to-heap-snapshots-aac9284d5329

Prinzhorn avatar Feb 16 '22 19:02 Prinzhorn

Thanks for your reply. I tried using heapdump but unfortunately it does not install on my current version of Electron. I am currently testing it on plain Node.js and will share if I find anything interesting.

sadrakhosravi avatar Feb 16 '22 19:02 sadrakhosravi

Just an update for anyone interested. I tried every implementation that came to my mind, but none of them worked. I started electron with '--inspect' flag to be able to inspect it in Chrome inspector, but couldn't see the worker's memory allocation. I also tried taking heap snapshots but the couldn't find any understandable part of the program that was leaking memory. It was all about node internals and stream.

The solution that worked for me was to spin up a new process (renderer process in electron with node integration) and use IPC to communicate with it. This way the RAM stayed consistently between 17 MB - 17.9 MB.

I am still not sure why this is happening in my app. I guess it has something to do with Electron and how it manages Node.js worker threads.

sadrakhosravi avatar Feb 22 '22 03:02 sadrakhosravi

I am using better-sqlite3 in an Electron worker thread as well, so I'm interested in figuring out what is happening here. Can you provide full steps to reproduce please? I was running the original code you've posted directly using node index.js. If I understand correctly I need to

  1. Use electron main.js and not node
  2. Run the code you've provided in a worker.js worker thread started from main.js

Could you provide a minimal main.js and worker.js that will show increased memory usage when run via electron main.js?

Electron frequently has very low level bugs and worker threads are something that is rarely used even in Node.js, let alone Electron. Electron is also not Node.js, there are many subtle difference, e.g. see https://github.com/electron/electron/issues/26593 What I'm trying to say is that it is entirely plausible that the way Electron combined Chromium and Node.js resulted in a bug that causes this memory leak. Now the question is if it is even related to better-sqlite3 at all. I do have some ideas to track this down, e.g. binding big buffers/strings and creating a new buffer for each INSERT to see if it is related to the bound variables. I want to find a way to crank this up so we don't have to wait 30 minutes to see the increase.

If the exact same code leaks memory using electron main.js (ELECTRON_RUN_AS_NODE?) but not using node main.js, then this is not a better-sqlite3 bug per say. But it could still be related to native modules, which are also rarely used. So we have a fun combination of worker threads and native modules, which both are not very commonly used. It's not even that long that native modules can even be properly run in a worker (context aware).

Prinzhorn avatar Feb 22 '22 12:02 Prinzhorn

I recently realized that native module may be the main cause of memory leaks. I've seen unexplained memory leaks on a Node.js project in the past, and now I think it's most likely caused by better-sqlite3.

It is worth mentioning that the reason why the heap snapshot is small is because the memory of native modules are not recorded.

BlackGlory avatar Jul 06 '22 13:07 BlackGlory

I haven't tried running the garbage collector manually, but anyone reading this issue afterwards needs to read this: https://github.com/WiseLibs/better-sqlite3/issues/433#issuecomment-835926012

BlackGlory avatar Jul 06 '22 14:07 BlackGlory

Based on research, while i cant fully reproduce this (originally thought this was my issue, but found it to be something else, but still same scope - garbage collection) I believe people are getting different results based on different OS's and different memory availability and different node versions due to this relating to GC. Root issue is likely round prepared statements relying on C++ Destructors https://github.com/WiseLibs/better-sqlite3/blob/master/src/objects/statement.lzz#L46

this isn't inheritly bad, but just noting to explain difference, in that you may run into situation where your garbage collector does not run as often due to specifics of your unique environment, and the prepared statements might pile up in RSS.

However, running the supplied code in the original post does not exhibit any issue on 16.20.0, so its possible the GC engine was improved.

It's important to note that GC engines are pretty complex, something I studied quite a lot in java and v8 is using the same engine technique java USE to use, and these engines do vary their behavior based on available memory often.

so incase anyone lands here and is using an outdated old nodejs, you should try on an updated 16+ at least as from what I can see, the GC engine is behaving well and prepared statements are cleaned up correctly there.

But I did personally run into other issues with GC unrelated ot this library, so its possible you may too have issues there. Ultimately, debug the process, collect heap snapshots, do comparisons and identify where is memory growing to narrow down concern areas.

aikar avatar May 19 '23 21:05 aikar

Is this still an issue in 2024?

JoshuaWise avatar Apr 03 '24 04:04 JoshuaWise