node-mssql icon indicating copy to clipboard operation
node-mssql copied to clipboard

Getting Request Timeouts when in reality, queries are run instantly when run directly on the Database

Open Blutude opened this issue 1 year ago • 13 comments

We are getting intermittent database errors in PROD. We are able to reproduce them in different environments and here it the full stack trace.

RequestError: Timeout: Request failed to complete in 15000ms
 at Request.userCallback (/home/node/app/node_modules/@hobt/database-client/node_modules/mssql/lib/tedious/request.js:437:19)
 at Request.callback (/home/node/app/node_modules/tedious/lib/request.js:56:14)
 at Connection.endOfMessageMarkerReceived (/home/node/app/node_modules/tedious/lib/connection.js:2407:20)
 at Connection.dispatchEvent (/home/node/app/node_modules/tedious/lib/connection.js:1279:15)
 at Parser.<anonymous> (/home/node/app/node_modules/tedious/lib/connection.js:1072:14)
 at Parser.emit (events.js:314:20)\n at Parser.<anonymous> (/home/node/app/node_modules/tedious/lib/token/token-stream-parser.js:37:14)
 at Parser.emit (events.js:314:20)
 at addChunk (/home/node/app/node_modules/readable-stream/lib/_stream_readable.js:298:12)
 at readableAddChunk (/home/node/app/node_modules/readable-stream/lib/_stream_readable.js:280:11)

This INSERT request gets retried for 24 hours and does not succeed. We tried running our exact query straight into the Database Query Editor and it gets executed quickly, way under 15 seconds. We analyzed the query performance and it all looks normal, no indication to why it would take over 15 seconds when getting run from the Tedious package. We got support from Microsoft to look into the Database Server and everything seems to look healthy. They could not pinpoint any issue on the Database server that would cause this issue. We are not exhausting any of the connections, the database is not under any stress. Microsoft Support were suggesting that the problems resided on the Tedious package that we are using to connect to MSSQL.

While those database requests were getting retried and failing with those timeouts, we tried restarting the entire service. As soon as the service starts up, it retries those database requests and we get the same failures. So that is not too clear why those issues are also happening during startups, this is leading me to think it might not be an issue with the tedious package (because I would have expected things to work on startup otherwise). So I would really appreciate support in figuring out why we could be getting these request timeouts.

Expected behaviour:

I would expect the intermittent errors to eventually succeed on retries.

Actual behaviour:

The requests keep failing and do not eventually work. Although on other days, this request works with no issues.

Configuration:

max pool size: 50 request timeout: 15000

Software versions

  • NodeJS: 12.22.12
  • tedious: 6.6.5
  • mssql: 6.0.1
  • node-mssql:
  • SQL Server: SQL Azure 12.0.2000.8

Blutude avatar Jul 07 '22 14:07 Blutude

You're using some old versions of MSSQL/tedious so my first recommendation is to upgrade the package and see if you still have the problem. I'm not providing support for v6 of the library as it stands.

In terms of digging deeper, I think I'll need to see some application code/query examples as that may well be fundamental to the problem.

If the errors with the queries happen repeatedly with the same query, even after application restarts, that doesn't seem intermittent but systemic to the query being run. Can you give an example?

When one runs queries directly against a server using a query editor, the set up is not always identical, there can be settings for the connections that are made which aren't set in tedious. I suspect there's something about the query that isn't playing nice (maybe unescaped quote marks, or some syntax that's not compliant unless in ANSI mode, etc).

dhensby avatar Jul 07 '22 19:07 dhensby

I tried upgrading mssql/tedious but that requires us to upgrade node, typescript and so many other packages, which is in turn even requiring a lot of code changes. I tried doing that quickly yesterday but after I saw the impact would be quite big, I decided to leave this option for last resort. We are currently in production and cannot afford to do that many upgrades at this point just resolving this P1 issue we have. If we are confident that this is the issue and that it cannot be fixed otherwise, we will go that route but I am trying to avoid this for now.

We have tried enabling an SQL Server Profiler while having this query being executed through the tedious package, and it seems that the Profiler is not even seeing these queries make it to the SQL Server. We are running the same submission with test data in a test environment multiple times and it seems that the same query sometimes succeeds and sometimes fails. When it fails, we retry for 24 hours with no success. On other days, it starts working again so I feel it is not an issue with the query. I am deploying a log to print the full query details whenever this issue happens again so we will keep an eye on it but as mentioned, our indications seem to show it is not an issue related to the query.

Something is preventing the query to even make it to the SQL Server. So I am wondering what are the possible reasons for this package to throw this RequestError "Timeout: Request failed to complete in" without even making a request to the actual database server.

Blutude avatar Jul 08 '22 18:07 Blutude

Interesting that the request isn't even making it through. I'd normally suspect that the pool was depleted, but if the requests still fail after a restart, then the pool isn't the problem (which is why I still suspect a query based problem, or possibly code).

I can't provide anything but speculation without seeing code and/or some reproduction steps.

You could turn the debug logging on (set env var DEBUG=* to see all the debug logs, which might reveal where the requests are stalling.

If you can't upgrade to the latest MSSQL version, can you upgrade to the latest v6? https://github.com/tediousjs/node-mssql/compare/v6.0.1...v6.4.1

dhensby avatar Jul 09 '22 10:07 dhensby

I do have a query that did fail. I tried running it locally through the tedious package but it was failing on a separate error, that I was using too many request input parameters. The query is a bulk insert where we insert 500 records. So I split up the query into chunks of 100 and it worked locally. However I was not able to reproduce the exact same error of the Request Timeout locally, even though I am pointing to the deployed database that was throwing this error. So this tells me the issue might not be related to the query.

I thought maybe the package is just not displaying the right error in a message, and that the true issue is that we are using too many request input parameters. So I figured we just need to split of our large bulk query in chunks.

Then this Request Timeout error happened on another separate query that is only updating a couple fields from 3 records (so not a bulk query) and I ran the exact same query locally, pointing to the same database but was not able to reproduce it. It is consistently failing on the deployed instance.

So for our next steps, I did upgrade mssql to v6.4.1 and I did add some logs on the query for whenever it fails so we can take a look at the failing query as well as the full stack trace. I'm just not confident that I would be able to reproduce it locally.

Again the queries when being run straight on the database (without going through node) execute instantly and the performance analyzation of those queries look good. Most of what we're seeing seems to indicate that there is something wrong on the Database Server (especially since when we redeploy, this timeout still happens), but when we run the profiler we do not see those requests even make it to the Server.

Blutude avatar Jul 13 '22 19:07 Blutude

@Blutude did you get to the bottom of your problem? If you're still experiencing it, it may be worth not only debugging the queries that are failing, but also the state of the pool at that point too (is it depleted? healthy? are the connections being released? etc). Also, is there a mismatch between the number of connections the pool is allowed to create and the servers maximum number of connections?

You can inspect the tarn pool directly or use some helpers on the pool: https://github.com/tediousjs/node-mssql/blob/v6.4.1/lib/base/connection-pool.js#L255-L269 and https://github.com/tediousjs/node-mssql/blob/v6.4.1/lib/base/connection-pool.js#L74-L84

dhensby avatar Aug 08 '22 10:08 dhensby

Hi, I'm having same problem here

I recently upgraded mssql from 6.3.1 to 9.0.0

"0" does not mean no-timeout anymore?

workaround of course for me was to increase requestTimeout parameter, but since I use to always close the connection manually It feels odd to me have to now deal with these settings

my config back then was:

    user: '',           // DB User
    password: '',       // DB Pasword
    server: '',         // DB Host
    database: '',       // DB Name
    port: '',           // DB Port
    requestTimeout: 0,  // Default is 30000 ms, set 0 to do not cut connection
    pool: {
        max: 100,
        min: 0,
        idleTimeoutMillis: 300000,
    }
}

MarcoEstevez avatar Aug 17 '22 14:08 MarcoEstevez

As you see I use to deal with heavy queries

image

MarcoEstevez avatar Aug 17 '22 14:08 MarcoEstevez

"0" does not mean no-timeout anymore?

Ah, that seems to be a regression from #1398 - I've authored a fix in #1421 which I'll release in patch releases to v8 and v9

dhensby avatar Aug 18 '22 14:08 dhensby

Also, @MarcoEstevez it does not appear that your problem is the same problem, your problem is that 0 is no longer accepted to disable the request timeout, the issue by OP is not that (unless you're also claiming when you run these queries via another method they execute "instantly").

As such I'm going to hide them as off-topic.

dhensby avatar Aug 18 '22 14:08 dhensby

I don't know if it's related to the same issue, but we also have request timeout while it works in direct with SQL Server Manager. We are using MSSQL 9.1.1.

When we got this issue, it blocks the whole event loop and the whole application becomes unresponsive (CPU usage rise up to 100%).

SteadEXE avatar Apr 05 '23 09:04 SteadEXE

@SteadEXE - as this issue relates to a much older version of the library, I'd recommend you open a new issue. Please can you include any debugging and where you've determined the hanging is happening. It's very unusual that the event loop would be blocked, but it would be good to get to the bottom of if it's being caused by this library.

dhensby avatar Apr 05 '23 09:04 dhensby

Sure, I'll try to investigate, and open an issue if I can get the needed informations :) thanks for the reactivity

SteadEXE avatar Apr 05 '23 10:04 SteadEXE

Hi there! I got the same behavior on my next.js 14 with mssql 10.0.2 application. Do you have any advices? Thank you!

oliveiracdz avatar Feb 08 '24 18:02 oliveiracdz

In my case the problem was that SQL Server was choosing bad plans for query execution. We fixed it forcing the plan on Query Store: https://stackoverflow.com/a/56774964/1440095

oliveiracdz avatar Feb 22 '24 15:02 oliveiracdz