node-mysql2 icon indicating copy to clipboard operation
node-mysql2 copied to clipboard

Node process does not end and but CPU utilization at 100%

Open nachogiljaldo opened this issue 2 years ago • 25 comments

I have been investigating a problem that is causing trouble on our test suite and I managed to trace it to mysql2. This script allows reproducing it:

Reproduction instructions
// create database test;
// create table test(id int(10) unsigned not null auto_increment, primary key (`id`));
// insert into test(id) values(1);
const mysql = require('mysql2/promise');
async function main() {
    console.log("===========================>", "foo");
    const connection = await mysql.createConnection({
        host: 'localhost',
        user: 'root',
        database: 'test',
        password: 'rootpass',
        connectionLimit: 1234,
    });
    for (let ix = 0; ix < 157; ix++) {
        await basicHanging3(connection);
    }
    connection.destroy();
    console.log('=========================>', 'Done');
}


async function basicHanging3(connection) {
    let query = 'SELECT ';
    let limit = 1000;
    for (let i = 0; i < limit; i++) {
      query += `t.id AS t_id_${i},`;
    }
    query += `t.id AS t_id_${limit}`;
    query += ' FROM test as t';
    const [rows] = await connection.query(
      query,
    );
    return rows;
}

main();
{
  "name": "reproducer",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "mysql2": "^3.4.2",
    "yarn": "^1.22.19"
  }
}

It happens with multiple node versions (v16.14.0, v18.16.0 are some of those I tested). For node 20.3.1 it happens, but you need to raise the 157 iterations.

In our tests we run thousands of DB queries which seems to be reaching this threshold and causing the process to "hung" (it eventually exits, but it takes a few minutes). As you can imagine this is a huge pain on CI because the "exit" takes as much as the test suite itself.

2 other interesting things to note:

  • the CPU for the node process stays at 100% even though nothing is supposedly being done (we checked to see if it is gc using --trace-gc but no traces are output after "Done")
  • if no data is returned (you can delete the insert statement on index.js) it seems to be ok (at least I raised the number of iterations to 10k and it finished just fine)
  • if you have less columns then you require more iterations to achieve the effect

Netstat shows the DB connection is closed while the process is waiting:

$ sudo netstat -alpW | grep -i <my_pid> 

shows no output.

nachogiljaldo avatar Jun 27 '23 10:06 nachogiljaldo

Sorry, I just noticed https://github.com/sidorares/node-mysql2/issues/2080 and I believe they could be related, my reproduction seems to work correctly with 2.3.0 (which is the one that @rogerho1224 was using and worked for them).

nachogiljaldo avatar Jun 27 '23 10:06 nachogiljaldo

thanks @nachogiljaldo I can reproduce the issue

sidorares avatar Jun 27 '23 13:06 sidorares

No worries @sidorares if it helps, it might be intimately related to #2080 since I can confirm that using 2.3.0 fixed the issue for me.

nachogiljaldo avatar Jun 27 '23 13:06 nachogiljaldo

What is strange is that process hangs with 100% cpu just before exit. I suspect some combination of GC and large number of objects with large number of fields. Profiler does not pick that and shows a very small number of ticks in the main code and in the GC, the ticks during the "frozen" part are in "unaccounted" section

The difference with 2.3.0 is highlighted in #2055, I'll try to see if changing that part only helps.

sidorares avatar Jun 27 '23 14:06 sidorares

@nachogiljaldo what is your os / architecture? Not sure if that matters, I'm seeing the issue on m1 mac

sidorares avatar Jun 27 '23 14:06 sidorares

I'm not sure if it's GC related, (not a node internals expert,but) I ran my reproduction with --trace-gc and the gc logs stop right with the "Done" message.

nachogiljaldo avatar Jun 27 '23 14:06 nachogiljaldo

Mine is Ubuntu x86_64. But a colleague of mine has the same issue with similar effects with M1.

nachogiljaldo avatar Jun 27 '23 14:06 nachogiljaldo

typically GC related performance problems are visible in the profiler log, maybe we see a bug in GC The script works fine even with large number of repeats and only shows problem just before the exit

sidorares avatar Jun 27 '23 14:06 sidorares

@nachogiljaldo I reduced script to a version with no external dependencies, could you try to run first file from https://gist.github.com/sidorares/128160e6b3dea1da3ad45cd672651d2d ? Still not sure what's going on, but switching to setting fields in the constructor hides the issue ( repro2.js )

cc @testn

sidorares avatar Jun 27 '23 22:06 sidorares

Actually it does finish, eventually. First script from gist finishes in about 5 minutes time I think we are hitting some V8 edge case where performance is not great ( o(n^2) for the number of fields? )

sidorares avatar Jun 28 '23 05:06 sidorares

I think I'll try to file this to the core node

sidorares avatar Jun 28 '23 05:06 sidorares

https://github.com/nodejs/node/issues/48581

sidorares avatar Jun 28 '23 07:06 sidorares

~~since there is reliable repro without dependency on mysql2 I'll try to "fix" the issue by converting parser to add fields in the constructor~~

Actually, I think the better solution would be to cap the number of fields in dynamically compiled parser, if the number is over the limit use "interpreting" parser instead ( e.g in each row iterate over every column and read the code based on the column type ). We used to have "interpreting" parser a while ago - https://github.com/sidorares/node-mysql2/blob/04f07f82714587049eb7cf1c92f983da34de87c5/lib/commands/query.js#L59-L69

sidorares avatar Jun 28 '23 07:06 sidorares

Not sure if this related. We reproduced the same kind of problem few time ago when we migrated from another mysql driver. We noticed an important increase of the cpu usage on the nodejs server and more slow queries. What was strange is when scaling horizontally or vertically the server, not the db, the slow queries decreased.

We thought that the problem came from our side since few issues were open on the repository and one of the purpose of this lib was a performance gain. But we finally reverted the change of the driver and the problem is gone.

Indeed maybe it's when we manage many objects with many relations.

We use RDS mysql with ECS on fargate, v16.20 node version, [email protected] If necessary i can give other information.

sebdec avatar Jun 28 '23 15:06 sebdec

@sebdec do you know roughly the number of columns and rows you are reading in an average query?

sidorares avatar Jun 28 '23 23:06 sidorares

@sebdec @nachogiljaldo @rogerho1224 could you try with --noopt or --max-opt=2 flags passed to node

sidorares avatar Jun 29 '23 04:06 sidorares

Just tested with --noopt and it finished successfully almost immediately.

nachogiljaldo avatar Jun 29 '23 06:06 nachogiljaldo

Also @sidorares I tested the first file as you suggested (repro1) and it reproduces.

nachogiljaldo avatar Jun 29 '23 06:06 nachogiljaldo

we are currently using this module and our session inits into apps spent a lot of cpu, sometimes >1 core, but after auth is everything fine.

We already done cpu profiling and seeing that most of ticks it's /usr/local/bin/node and mysql2 library

How to deal with that broken but important module? Any hotfix? Any patches? Any workarounds?

stefkkkk avatar Dec 14 '23 20:12 stefkkkk

@stefkkkk some possible options: 1) --noopt or --max-opt=2 flags 2) check if you have responses with more rows than you actually need ( SELECT * ) and reduce number of fields to what is actually used

What is your node version? The root cause is in node optimiser so hopefully its fixed eventually on a node side

sidorares avatar Dec 15 '23 04:12 sidorares

2023-12-15_10-53 The fix for me was to upgrade node js version from 20.10 to 21.4, and inadequate cpu usage gone Probably, it's due to updates of the V8 JavaScript engine to 11.8 in 21 version of node js @sidorares thanks for help

stefkkkk avatar Dec 15 '23 09:12 stefkkkk

Mmm... in my environment (node20, linux) this was fixed on 3.9.4 (i.e. I was able to upgrade from the previous version I was running 2.7.0).

I tried to upgrade to 3.9.7 today and the problem started to happen again, traced it to 3.9.5 so there's something in that PR that broke the issue that had been inadvertently fixed?

EDIT: tried the noopt and max-opts alternatives but that's not viable as it makes our build absurdly slow (and some tests even time out).

nachogiljaldo avatar Apr 24 '24 09:04 nachogiljaldo

@nachogiljaldo I'm pretty sure inheriting result from null fixes/hides the problem, unfortunately this comes with it's own issues ( #2585 )

I'm still keen to review the way resulting object is constructed - see https://github.com/sidorares/node-mysql2/issues/2055

sidorares avatar Apr 24 '24 13:04 sidorares

Also a possibility could be a configurable / pluggable result construction, e.i some users might prefer {} as base, some Object.create(null) and some new Map()

sidorares avatar Apr 24 '24 13:04 sidorares

Also a possibility could be a configurable / pluggable result construction, e.i some users might prefer {} as base, some Object.create(null) and some new Map()

That seems potentially as a good idea, but not sure how feasible it would be in practice for cases when one doesn't connect directly to mysql but rather use an ORM.

nachogiljaldo avatar Apr 24 '24 14:04 nachogiljaldo