kafkajs

consumer stopped consuming after ERROR [Connection] Connection error: read ECONNRESET

Open AhmedLadhar opened this issue 2 years ago • 1 comment

Describe the bug
We're using KafkaJS to consume and produce events on Azure Event Hubs through its Kafka API. We noticed that all of our pods on AKS had stopped consuming after logging the following errors:

ERROR [Connection] Response Fetch(key: 1, version: 6) {"timestamp":"2022-03-17T07:47:18.361Z","logger":"kafkajs","broker":"ESUT1EVENTHUBS01.servicebus.windows.net:9093","clientId":"microservices/claim/consumer","error":"The server experienced an unexpected error when processing the request","correlationId":8074,"size":217}

ERROR [Consumer] Crash: KafkaJSNonRetriableError: The server experienced an unexpected error when processing the request {"timestamp":"2022-03-17T07:47:18.362Z","logger":"kafkajs","groupId":"claim-service-group","stack":"KafkaJSNonRetriableError: The server experienced an unexpected error when processing the request\n at /home/node/app/node_modules/kafkajs/src/retry/index.js:53:18\n at runMicrotasks ()\n at processTicksAndRejections (internal/process/task_queues.js:97:5)"}

ERROR [Connection] Connection error: read ECONNRESET {"timestamp":"2022-03-17T07:47:18.368Z","logger":"kafkajs","broker":"ESUT1EVENTHUBS01.servicebus.windows.net:9093","clientId":"microservices/claim/consumer","stack":"Error: read ECONNRESET\n at TLSWrap.onStreamRead (internal/stream_base_commons.js:209:20)\n at TLSWrap.callbackTrampoline (internal/async_hooks.js:126:14)"}

I was wondering whether I should catch the "read ECONNRESET" error and restart the client in my own code, or whether there is something in the library I could leverage. I know the library usually retries transient errors before giving up, and eventually throws ECONNRESET when it sees KafkaJSNumberOfRetriesExceeded.

Expected behavior
I expect the library to initiate a restart.

Observed behavior
After raising the error, the client did not crash; it stayed idle and no messages were consumed.

Environment:

OS: alpine:0.11.0
KafkaJS version: 1.15.0
Kafka version: Azure Event Hubs (PaaS)
node:12.22.10-alpine3.15

AhmedLadhar, Mar 17 '22 19:03
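One way to act on the question above without catching ECONNRESET directly is to listen for the consumer's crash instrumentation event. As far as I can tell from the KafkaJS 1.x source, the consumer only restarts itself after errors it treats as retriable, and the crash event's `payload.restart` flag reports whether it will; a `KafkaJSNonRetriableError` like the one in the log would leave the consumer disconnected, which would match the idle pods. The sketch below is an illustration under those assumptions, not the library's recommended fix: `CrashEmitter`, `attachCrashRecovery`, `DelayScheduler` and the `rebuild` callback are hypothetical names, and the interfaces are minimal stand-ins for the real KafkaJS consumer so the example runs without a broker.

```typescript
// Minimal stand-in for the parts of a KafkaJS consumer this sketch touches.
// Assumption: modeled on consumer.on(consumer.events.CRASH, ...) and the crash
// payload { error, groupId, restart }; these are NOT the library's own type names.
interface CrashEvent {
  payload: { error: Error; groupId: string; restart: boolean };
}

interface CrashEmitter {
  events: { CRASH: string };
  on(eventName: string, listener: (event: CrashEvent) => void): () => void;
}

// Schedules a callback; injectable so the logic can be exercised without real timers.
type DelayScheduler = (fn: () => void, delayMs: number) => void;

const timeoutScheduler: DelayScheduler = (fn, delayMs) => {
  setTimeout(fn, delayMs);
};

// Hypothetical helper: if KafkaJS reports that it will NOT restart the crashed
// consumer, schedule a full rebuild through the caller-supplied callback.
// Returns the listener-removal function produced by consumer.on.
function attachCrashRecovery(
  consumer: CrashEmitter,
  rebuild: (error: Error) => void,
  delayMs: number = 5_000,
  schedule: DelayScheduler = timeoutScheduler,
): () => void {
  return consumer.on(consumer.events.CRASH, (event) => {
    const { error, restart } = event.payload;
    if (restart) {
      // KafkaJS restarts the consumer itself in this case; nothing to do.
      return;
    }
    // Non-retriable crash: the consumer stays stopped unless we rebuild it.
    schedule(() => rebuild(error), delayMs);
  });
}
```

In practice `rebuild` might disconnect the old consumer, create a new one with `kafka.consumer({ groupId })`, connect, subscribe, call `run()` again and re-attach this listener. On AKS, a simpler alternative is to exit the process from `rebuild` and let Kubernetes restart the pod.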

We saw the same thing on v1.16.0 and were initially listening for the consumer.crash event to trigger a reconnect. However, this event did not fire reliably, and we sometimes saw errors like "Failed to execute listener". We now listen for the consumer.heartbeat event to check whether the consumer is healthy, and this has worked quite well.

mguay22, Apr 29 '22 20:04

Any update?

@mguay22 Below is the event I get from the heartbeat listener. How can I tell from it whether the connection is healthy?

{
  id: 2,
  type: 'consumer.heartbeat',
  timestamp: 1685466252785,
  payload: {
    groupId: '<group-id>',
    memberId: '<member-id>',
    groupGenerationId: 1
  }
}

Vijay-Nirmal, May 30 '23 17:05
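On the question above: the heartbeat event carries no health flag of its own. As I understand KafkaJS, it is emitted each time the consumer successfully heartbeats to the group coordinator, and those heartbeats are sent from the consumer's own fetch loop, so the useful signal is how recently the last one arrived. Below is a minimal sketch of that idea; `HeartbeatMonitor` and its "max silence" threshold are hypothetical, not part of KafkaJS, and the sketch keeps only the `timestamp` field from the event printed above.

```typescript
// Only the field this sketch relies on. The consumer.heartbeat event printed in
// this thread also carries id, type and payload { groupId, memberId, groupGenerationId }.
interface HeartbeatEvent {
  timestamp: number; // milliseconds since the Unix epoch, e.g. 1685466252785
}

// Hypothetical helper (not part of KafkaJS): remembers when the last heartbeat
// arrived and treats the consumer as unhealthy once heartbeats go quiet.
class HeartbeatMonitor {
  private lastHeartbeatMs: number | undefined = undefined;
  private readonly maxSilenceMs: number;

  constructor(maxSilenceMs: number) {
    this.maxSilenceMs = maxSilenceMs;
  }

  // Intended to be called from consumer.on(consumer.events.HEARTBEAT, ...).
  record(event: HeartbeatEvent): void {
    this.lastHeartbeatMs = event.timestamp;
  }

  // Healthy only if a heartbeat has been seen within the last maxSilenceMs.
  isHealthy(nowMs: number = Date.now()): boolean {
    if (this.lastHeartbeatMs === undefined) {
      return false;
    }
    return nowMs - this.lastHeartbeatMs <= this.maxSilenceMs;
  }
}
```

Wiring it up could look like `consumer.on(consumer.events.HEARTBEAT, (e) => monitor.record(e))`, with a liveness probe that fails while `monitor.isHealthy()` returns false so Kubernetes restarts the pod. The monitor reports unhealthy until the first heartbeat arrives, so such a probe would need an initial delay. The threshold is an assumption: it should sit well above the consumer's `heartbeatInterval` (3000 ms by default), and a slow `eachMessage` handler can delay heartbeats enough to cause false alarms.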

I'm encountering the same issue. Is there any update on this? I am using KafkaJS 2.2.4.

biranjanemids, Jul 27 '23 15:07