stacks-blockchain-api

API crashes on etag calculation during connection reset

Open CharlieC3 opened this issue 2 years ago • 3 comments

Describe the bug: The API crashes if a connection is terminated while an ETag is being calculated.

Logs attached: Explore-logs-05_15_2023, 11_51_32 PM.txt

API Version: 7.1.10

CharlieC3 avatar May 16 '23 03:05 CharlieC3

The "Unable to calculate transaction ETag" log message appears to be a red herring. The error-handling paths for it are solid: I simulated throwing the same error there and it doesn't crash the API. More likely, some other pg interaction isn't handling the ECONNRESET error correctly and is crashing the process. The challenging part is that the stack traces in the error log are very short; they don't show where in the application code this is happening:

Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17)

Also, we do have ECONNRESET errors covered in the general postgres error handler: https://github.com/hirosystems/stacks-blockchain-api/blob/9dc72488bff40de008d4245172259185840fc670/src/datastore/helpers.ts#L241
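
For context, one general Node.js pattern that can produce a crash with a short, socket-level stack trace like the one above is an `error` event emitted on an EventEmitter that has no `error` listener: Node throws the error instead of delivering it. Whether that is what happens here is not confirmed. The sketch below only illustrates the mechanism; `FakeConnection` and `resetEscapes` are hypothetical names and are not part of pg or the API code.

```typescript
import { EventEmitter } from 'node:events';

type CodedError = Error & { code?: string };

// Hypothetical stand-in for a socket-backed client; not part of pg or the API.
class FakeConnection extends EventEmitter {
  // Simulates the socket layer reporting a connection reset.
  simulateReset(): void {
    const err: CodedError = Object.assign(new Error('read ECONNRESET'), {
      code: 'ECONNRESET',
    });
    this.emit('error', err);
  }
}

// Returns true if the reset surfaced as a thrown exception.
function resetEscapes(conn: FakeConnection): boolean {
  try {
    conn.simulateReset();
    return false;
  } catch {
    return true;
  }
}

// No 'error' listener: emit('error', ...) throws.
const unguarded = new FakeConnection();

// With an 'error' listener: the error is delivered to the listener instead.
const guarded = new FakeConnection();
const seen: string[] = [];
guarded.on('error', (err: CodedError) => {
  seen.push(err.code ?? 'unknown');
});

const unguardedCrashed = resetEscapes(unguarded);
const guardedCrashed = resetEscapes(guarded);
```

In a real server the emit happens inside a socket callback rather than inside a `try` block, so the thrown error has no application frames above it and ends up as an uncaught exception.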

@CharlieC3 has this happened more than once, and/or are you able to reproduce it? Otherwise I think we'd need to test manually by injecting this error at the pg lib level and then exercising a bunch of calls to see what, if anything, causes the crash.
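
A rough sketch of what that kind of injection test could look like is below. It is an assumption-heavy illustration, not the project's test setup: `Queryable` is a minimal hypothetical interface rather than the real pg client surface, and `withInjectedReset`, `findEscapes`, and the two sample calls are made-up names.

```typescript
type CodedError = Error & { code?: string };

// Minimal hypothetical query interface; the real pg client surface is larger.
interface Queryable {
  query(sql: string): Promise<unknown[]>;
}

interface NamedCall {
  name: string;
  run: (db: Queryable) => Promise<unknown>;
}

// Wraps a client so that queries succeed `failAfter` times, then reject
// as if the connection had been reset.
function withInjectedReset(inner: Queryable, failAfter: number): Queryable {
  let count = 0;
  return {
    query: async (sql: string) => {
      count += 1;
      if (count > failAfter) {
        const err: CodedError = Object.assign(new Error('read ECONNRESET'), {
          code: 'ECONNRESET',
        });
        throw err;
      }
      return inner.query(sql);
    },
  };
}

// Runs each call against a freshly wrapped client and returns the names of
// the calls that let the injected error escape.
async function findEscapes(calls: NamedCall[], inner: Queryable): Promise<string[]> {
  const escaped: string[] = [];
  for (const call of calls) {
    try {
      await call.run(withInjectedReset(inner, 0));
    } catch {
      escaped.push(call.name);
    }
  }
  return escaped;
}

// Stub backend that always succeeds.
const stubDb: Queryable = { query: async () => [] };

// Two hypothetical code paths: one handles the failure, one does not.
const sampleCalls: NamedCall[] = [
  {
    name: 'handled-path',
    run: async (db) => {
      try {
        return await db.query('SELECT 1');
      } catch {
        return []; // treat a reset as "no data" instead of propagating
      }
    },
  },
  { name: 'unhandled-path', run: (db) => db.query('SELECT 1') },
];

const escapedCalls: Promise<string[]> = findEscapes(sampleCalls, stubDb);
```

This only catches errors that surface as rejected promises. A reset that surfaces as an `error` event on an idle connection would need a separate check.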

zone117x avatar May 16 '23 13:05 zone117x

I'm also curious, could you include more logs from before the exit? I wonder if this lines up with the recent re-enabling of socket-io. Perhaps the bug is in that area. I've scanned through the pg queries performed by socket-io related code and nothing immediately stood out.

zone117x avatar May 16 '23 13:05 zone117x

@zone117x According to our logs this is a repeat occurrence and is becoming more frequent. Here's one example, and one more. I do see some errors with the proxy server right before this etag error appears which I didn't notice before. It's possible this is the root cause and the etag error is a result of that.

CharlieC3 avatar May 16 '23 13:05 CharlieC3