API crashes on etag calculation during connection reset
Describe the bug
The API crashes if a connection is terminated while an ETag is being calculated.
Logs attached: Explore-logs-05_15_2023, 11_51_32 PM.txt
API Version: 7.1.10
The "Unable to calculate transaction ETag" log message appears to be a red herring. The error handling paths for it are solid: I simulated throwing the same error there and it did not crash the API. More likely, some other pg interaction is not handling the ECONNRESET error correctly, and that is what crashes the process. The challenging part is that the stack traces in the error log are very short -- they don't show where in the application code this is happening:
Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
Also, we do have ECONNRESET errors covered in the general postgres error handler:
https://github.com/hirosystems/stacks-blockchain-api/blob/9dc72488bff40de008d4245172259185840fc670/src/datastore/helpers.ts#L241
@CharlieC3 has this happened more than once, and are you able to reproduce it? Otherwise, I think we'd need to test manually by injecting this error at the pg lib level and then exercising a range of calls to see which, if any, causes the crash.
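As a rough illustration of the failure mode I suspect, here is a minimal sketch of the injection idea using only Node's built-in EventEmitter. `FakeSocket`, `makeConnReset`, and `injectReset` are hypothetical names made up for this example; none of this is the API's code or the pg lib's internals. It relies on one documented Node behavior: emitting an 'error' event on an emitter that has no 'error' listener throws the error.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for a connection's underlying socket (not the real driver).
class FakeSocket extends EventEmitter {}

// Build an error shaped like the one in the log above.
function makeConnReset(): Error & { code?: string; syscall?: string } {
  return Object.assign(new Error("read ECONNRESET"), {
    code: "ECONNRESET",
    syscall: "read",
  });
}

// Inject the error and report whether it escaped as a thrown exception.
// emit("error", ...) throws when the emitter has no "error" listener.
function injectReset(socket: EventEmitter): "handled" | "unhandled" {
  try {
    socket.emit("error", makeConnReset());
    return "handled";
  } catch {
    return "unhandled";
  }
}

const unguarded = new FakeSocket();
const guarded = new FakeSocket();
guarded.on("error", () => {
  /* assumption: a real handler would log and reconnect here */
});

console.log(injectReset(unguarded)); // unhandled
console.log(injectReset(guarded)); // handled
```

In a real process nothing wraps the socket's read callback in a try/catch, so the "unhandled" case would surface as an uncaught exception. Because the error object is created inside that internal callback, its stack would only contain Node-internal frames, which would be consistent with the short traces above.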
I'm also curious: could you include more logs from before the exit? I wonder if this lines up with the recent re-enabling of socket-io. Perhaps the bug is in that area. I've scanned through the pg queries performed by the socket-io related code and nothing immediately stood out.
@zone117x According to our logs this is a repeat occurrence and is becoming more frequent. Here's one example, and one more.
I do see some errors from the proxy server right before this ETag error appears, which I hadn't noticed before. It's possible those are the root cause and the ETag error is a result of them.