go-mssqldb icon indicating copy to clipboard operation
go-mssqldb copied to clipboard

Bad connection is never removed from the connection pool

Open dududko opened this issue 3 years ago • 1 comments

Describe the bug

There is a stored procedure which is executed every 5 seconds. The stored procedure returns the result of the join of two tables. Data is stored in one database on a single server. It is NOT a distributed query. It is a read only query.

Once in a blue moon(once a day) something bad happen, and my program break. After that on each request I see the recurrent error:

"error": "mssql: New request is not allowed to start because it should come with valid transaction descriptor."

The thing is, that I get this error only on Commit, which mean that the connection is established, I even can scan the results, but on commit I receive this error. So it looks like the bad connection is stuck in the pool, and the driver does not recognize that the connection is broken.

Digging into the problem

I added sql hooks, increased the logging level, and filled my code with bunch of log statements. Here is what I found:

  1. after the transaction is open and before it is closed driver prints the error (note: ip addresses is removed)
Intercepted panic read tcp *:46326->*:1433: i/o timeout
  1. after that the row scan returns the error
"logger": "sqlhook.Rows.Next"
"query": "exec myStoredProcedure",
"error": "unknown token type returned: token(0)"
  1. the transaction is cancelled
"The transaction active in this session has been committed or aborted by another session."
"ROLLBACK TRANSACTION 5100000d66\n""The transaction active in this session has been committed or aborted by another session."
"ROLLBACK TRANSACTION 5100000d66\n"
  1. then after 5 seconds the procedure is called again, and now it fails on commit
    "msg": "detected mssql error",
    "message": "The server failed to resume the transaction. Desc:3a0000005d.",
    "number": 3971,
    "state": 1,
    "class": 16,
    "server_name": "MY_SERVER",
    "line_no": 1,
  1. Then I see the recurrent error
"New request is not allowed to start because it should come with valid transaction descriptor."v

Conclusions

I reviewed the code of the driver and I think that there is no proper error handling in the row scan logic. Errors are mostly ignored. Also there is no way to tell the driver which errors are unrecoverable. The only way to do this is to patch the driver.

I was able to provide the patch for the error (5), but ideally it would be great to find the source of error (1) and remove the broken connection from the pool.

Further technical details

Microsoft SQL Server 2012 - 11.0.2100.60 (X64)

dududko avatar Sep 28 '20 19:09 dududko

Is it possible to reproduce this error?

denisenkom avatar Nov 03 '20 05:11 denisenkom