node-oracledb icon indicating copy to clipboard operation
node-oracledb copied to clipboard

getConnection() method hang after few hours

Open Hash17677 opened this issue 2 years ago • 19 comments

I have created the express server application which connect to oracle database in order to perform DB operations. That application works fine in dev environment and then I published it in ubuntu VM using pm2. Initially it works fine but after few hours getConnection('hr') method hang the request.

Inside application, First I create the oracle db connection pool at the startup of the express app. Here is the code for connection pool.

const initialize = async () => {
    const pool = await oracledb.createPool({
        user: 'username',
        password: 'password',
        connectString: 'x.x.x.x:1521/sampledev',
        poolMin: 5,
        poolMax: 10,
        poolIncrement: 1,
        poolAlias: "hr",
    });

    console.log("Connected to the oracle db pool");
}

Here is the endpoint that use oracle connection.

app.get('/dbstatus', async (req, res) => {
    try {
        oracleConnApps = await oracledb.getConnection('hr');
        console.log("DB Connected")

        // do stuff

        res.status(200).send({ status: true, data: "OK" });
    } catch (err) {
        console.log("CATCH EXECUTE")
        console.log({ status: true, data: err.message })
        res.status(400).send({ status: false, data: err.message });
    } finally {
        console.log("FINALLY EXECUTE")
        if (oracleConnApps) await oracleConnApps.close();
    }
})

I used oracledb: 6.2.0 version and thin mode to connect DB.

Initially it works fine. After few hours getConnection method hang the server and it does not trigger the catch block. If I restart node app using pm2 then it works. But same error happen after few hours.

Hash17677 avatar Nov 21 '23 09:11 Hash17677

Thank you for using node-oracledb.

Can you try enabling Thick mode and see if the problem is reproduced.

sharadraju avatar Nov 21 '23 12:11 sharadraju

I try the Thick mode as well. But it does not fix the error. @sharadraju

Hash17677 avatar Nov 21 '23 14:11 Hash17677

A "hang" like you are reporting suggests that connections are not being returned to the pool. You can examine the pool statistics to see whether that is indeed the case. documentation.

anthony-tuininga avatar Nov 21 '23 16:11 anthony-tuininga

@Hash17677 as I mentioned in your original stackoverflow post, as well as checking pool stats, also see Always Use Connection Pools — and How.

cjbj avatar Nov 21 '23 23:11 cjbj

I monitor the statistics of the pool and notice connectionsOpen decrease the amount gradually. As I mentioned above I set poolMin to 5. But I notice connectionsOpen decrease the value under the poolmin. But It never increase.

@cjbj @anthony-tuininga

Hash17677 avatar Nov 23 '23 06:11 Hash17677

@Hash17677 Is there any sign of queuing?

cjbj avatar Nov 23 '23 07:11 cjbj

No @cjbj I have not notice any queuing in the stats.

Hash17677 avatar Nov 23 '23 09:11 Hash17677

@Hash17677 If you would like to share some stats and/or a reproducible testcase then I might be able to help you. Maybe it isn't connection exhaustion but is some DB lock in your app - you'll need to do some debugging.

If the code snippet you shared is actually what you are using, check your scope for oracleConnApps to make sure the close() is being called.

cjbj avatar Nov 24 '23 02:11 cjbj

I have same problem. Call ping inside getConnection hangs 11-16 minutes and catches ETIMEDOUT inside net stream.

stream error:
Error: read ETIMEDOUT
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
  errno: -110,
  code: 'ETIMEDOUT',
  syscall: 'read'
}

NJS-500: connection to the Oracle Database was broken
NJS-501: connection to host 172.20.188.174 port 1521 terminated unexpectedly. (CONNECTION_ID=UpVDSqbkhcGXyrjvZjKU5A==)

read ETIMEDOUT
    at NTTCP.checkErr (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:327:29)
    at NTTCP.receive (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:465:12)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async NetworkSession._recvPacket (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:378:22)
    at async NetworkSession.recvPacket (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:443:12)
    at async ReadPacket.waitForPackets (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/packet.js:293:20)
    at async Protocol._decodeMessage (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/protocol.js:70:5)
    at async Protocol._processMessage (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/protocol.js:153:9)
    at async ThinConnectionImpl.ping (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/connection.js:977:5)
    at async ThinPoolImpl.acquire (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/pool.js:505:11)

TCP/IP connections may be broken by router/switch (restart). Why ping not working periodically (pingInterval) for free connections and not clean failure connections? In log i see errors only after call getConnection. Nothing for periodically pings? Why not working keepAlive for tcp connection?

Statistics before call getConnection:

Driver:
...thin mode: true
Pool statistics:
...gathered at: 2023-12-21T07:04:55.978Z
...up time (milliseconds): 73507015
...up time from last reset (milliseconds) 73506988
...connection requests: 10
...requests enqueued: 0
...requests dequeued: 0
...requests failed: 0
...requests exceeding queueMax: 0
...requests exceeding queueTimeout: 0
...current queue length: 0
...maximum queue length: 0
...sum of time in queue (milliseconds): 0
...minimum time in queue (milliseconds): 0
...maximum time in queue (milliseconds): 0
...average time in queue (milliseconds): 0
...pool connections in use: 0
...pool connections open: 2
Pool attributes:
...connectString: BANK
...edition: 
...events: false
...externalAuth: false
...homogeneous: true
...poolAlias: WORKER-2
...poolIncrement: 1
...poolMax: 5
...poolMaxPerShard: undefined
...poolMin: 2
...poolPingInterval (seconds): 30
...poolTimeout (seconds): 60
...queueMax: 500
...queueTimeout (milliseconds): 60000
...sessionCallback: undefined
...sodaMetaDataCache: undefined
...stmtCacheSize: 30
...user: USER
Related environment variables:
...UV_THREADPOOL_SIZE: 10

In project i use pkg and compile it with node16-linux-x64 target (node.js v16).

boriborm avatar Dec 21 '23 08:12 boriborm

Same problem here, using version 6.0.0 and docker container running on custom VLAN. We are catching ORA-03113 on Sentry on previous requests. Once it hangs, the node app doesn't respond anymore. I'll be updating on this topic once we run some stress test, trying to bring down the network connection with different methods.

NJS-500: connection to the Oracle Database was broken DPI-1080: connection was closed by ORA-3113

ptavasci-provart avatar Dec 21 '23 16:12 ptavasci-provart

Maybe I found solution.

Article https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die Part "Busy ESTAB socket is not forever" This is my situation. Keep alive working, but on Linux default value of TCP_KEEPALIVE_TIME = 7300 (>120 minutes). When socket broken and ping calls earlier then first keep alive after break, socket hangs ~16 minutes.

In article used parameter TCP_USER_TIMEOUT. This is analog of net.socket.setTimeout()?

What about set timeout of connection stream manually from connection options, or set timeout as TCP_USER_TIMEOUT = TCP_KEEPIDLE + TCP_KEEPINTVL * TCP_KEEPCNT where TCP_KEEPINTVL * TCP_KEEPCNT = 1 * 10

P.S. I set connection option expireTime to 1 minute and then in log i see droping of broken connections before call getConnection. But if ping calls within period before keep alive (1 minute), ping packet sends 16 minutes and then timeout (after send packet to socket, keep alive not working). Needs use setTimeout

boriborm avatar Dec 21 '23 19:12 boriborm

@boriborm setTimeout may timeout idle but valid connections. I think the correct solution would be a combination of expireTime and callTimeout (https://node-oracledb.readthedocs.io/en/latest/api_manual/connection.html#connection.callTimeout)

expireTime would timeout idle connections in the pool before it is acquired. If the connection is acquired before the keep alive idle timeout, then callTimeout would kick in and timeout the connection.

sreguna avatar Dec 22 '23 04:12 sreguna

@sreguna callTimeout is not variant. It sets timeout to send breaking packet. Sending breaking packet is the same as sending ping packet. Hangs ~16 minutes on TCP/IP level.

 async _processMessage(message) {
    let callTimer;
    let callTimeoutExpired = false;
    try {
      if (this.callTimeout > 0) {
        callTimer = setTimeout(() => {
          callTimeoutExpired = true;
          this.breakMessage();          <-----
        }, this.callTimeout);
      }
      await this._encodeMessage(message);
...
  breakMessage() {
    this._breakInProgress = true;
    this.nsi.sendBreak();         <----
  }
sendBreak() {
    if (this.isBreak)
      return; /* Already in a break */

    if (!this.connected) {
      this.isBreak = true; /* Not yet connected. Post the break */
      this.breakPosted = true;
      return;
    }

    this.isBreak = true;
    this.markerPkt.prepare(constants.NSPMKTD1, constants.NIQIMARK);
    this._sendPacket(this.markerPkt.buf);                 <----
  }

I think simple solution is use timeout only for ping. If ping timed out, break streaming of ping packet, force closing of stream and invalidate connection.

boriborm avatar Dec 23 '23 10:12 boriborm

@sreguna callTimeout is not variant. It sets timeout to send breaking packet. Sending breaking packet is the same as sending ping packet. Hangs ~16 minutes on TCP/IP level.

 async _processMessage(message) {
    let callTimer;
    let callTimeoutExpired = false;
    try {
      if (this.callTimeout > 0) {
        callTimer = setTimeout(() => {
          callTimeoutExpired = true;
          this.breakMessage();          <-----
        }, this.callTimeout);
      }
      await this._encodeMessage(message);
...
  breakMessage() {
    this._breakInProgress = true;
    this.nsi.sendBreak();         <----
  }
sendBreak() {
    if (this.isBreak)
      return; /* Already in a break */

    if (!this.connected) {
      this.isBreak = true; /* Not yet connected. Post the break */
      this.breakPosted = true;
      return;
    }

    this.isBreak = true;
    this.markerPkt.prepare(constants.NSPMKTD1, constants.NIQIMARK);
    this._sendPacket(this.markerPkt.buf);                 <----
  }

I think simple solution is use timeout only for ping. If ping timed out, break streaming of ping packet, force closing of stream and invalidate connection.

The hang is consistent? because I've been having this issue and when it happens it maybe 16 minutes or hangs forever

jd-apprentice avatar Dec 28 '23 17:12 jd-apprentice

I think after setting expireTime (i.e net socket setKeepAlive ) , indefinite hang is not seen .

sudarshan12s avatar Jan 04 '24 12:01 sudarshan12s

I have same problem. Call ping inside getConnection hangs 11-16 minutes and catches ETIMEDOUT inside net stream.

stream error:
Error: read ETIMEDOUT
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
  errno: -110,
  code: 'ETIMEDOUT',
  syscall: 'read'
}

NJS-500: connection to the Oracle Database was broken
NJS-501: connection to host 172.20.188.174 port 1521 terminated unexpectedly. (CONNECTION_ID=UpVDSqbkhcGXyrjvZjKU5A==)

read ETIMEDOUT
    at NTTCP.checkErr (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:327:29)
    at NTTCP.receive (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/ntTcp.js:465:12)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async NetworkSession._recvPacket (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:378:22)
    at async NetworkSession.recvPacket (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/sqlnet/networkSession.js:443:12)
    at async ReadPacket.waitForPackets (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/packet.js:293:20)
    at async Protocol._decodeMessage (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/protocol.js:70:5)
    at async Protocol._processMessage (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/protocol/protocol.js:153:9)
    at async ThinConnectionImpl.ping (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/connection.js:977:5)
    at async ThinPoolImpl.acquire (/snapshot/nodeoraserver/node_modules/oracledb/lib/thin/pool.js:505:11)

TCP/IP connections may be broken by router/switch (restart). Why ping not working periodically (pingInterval) for free connections and not clean failure connections? In log i see errors only after call getConnection. Nothing for periodically pings? Why not working keepAlive for tcp connection?

Statistics before call getConnection:

Driver:
...thin mode: true
Pool statistics:
...gathered at: 2023-12-21T07:04:55.978Z
...up time (milliseconds): 73507015
...up time from last reset (milliseconds) 73506988
...connection requests: 10
...requests enqueued: 0
...requests dequeued: 0
...requests failed: 0
...requests exceeding queueMax: 0
...requests exceeding queueTimeout: 0
...current queue length: 0
...maximum queue length: 0
...sum of time in queue (milliseconds): 0
...minimum time in queue (milliseconds): 0
...maximum time in queue (milliseconds): 0
...average time in queue (milliseconds): 0
...pool connections in use: 0
...pool connections open: 2
Pool attributes:
...connectString: BANK
...edition: 
...events: false
...externalAuth: false
...homogeneous: true
...poolAlias: WORKER-2
...poolIncrement: 1
...poolMax: 5
...poolMaxPerShard: undefined
...poolMin: 2
...poolPingInterval (seconds): 30
...poolTimeout (seconds): 60
...queueMax: 500
...queueTimeout (milliseconds): 60000
...sessionCallback: undefined
...sodaMetaDataCache: undefined
...stmtCacheSize: 30
...user: USER
Related environment variables:
...UV_THREADPOOL_SIZE: 10

In project i use pkg and compile it with node16-linux-x64 target (node.js v16).

just to clarify, currently periodic cleanup of free connections only removes the connections that are idle more thanpoolTimeout configured value. poolPingInterval is checked during getConnection as it involves round trip. expireTime should enable keepAlive.

sudarshan12s avatar Jan 04 '24 14:01 sudarshan12s

@boriborm, Thanks for suggestion of pingTimeout in comment. I am checking the feasibility of the same.

sudarshan12s avatar Jan 08 '24 18:01 sudarshan12s

Same problem here, using version 6.0.0 and docker container running on custom VLAN. We are catching ORA-03113 on Sentry on previous requests. Once it hangs, the node app doesn't respond anymore. I'll be updating on this topic once we run some stress test, trying to bring down the network connection with different methods.

NJS-500: connection to the Oracle Database was broken DPI-1080: connection was closed by ORA-3113

Thick mode has SQLNET.RECV_TIMEOUT to limit the time for executes . details here

sudarshan12s avatar Jan 10 '24 02:01 sudarshan12s

We have added poolPingTimeout parameter in the patch here. This will be included in the node-oracledb 6.4 release.

sharadraju avatar Jan 30 '24 11:01 sharadraju

This has fixed in node-oracledb 6.4.

sharadraju avatar Mar 11 '24 17:03 sharadraju