
GroupCoordinator: *.*.*.*:9092: 1 request(s) timed out: disconnect with Azure Backed Service

Open TsuyoshiUshio opened this issue 5 years ago • 3 comments

The problem here is that Azure network load-balancing components silently drop idle network connections after 4 minutes.
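As a rough illustration of the idle-timeout problem described above, the following sketch builds a librdkafka-style configuration that recycles idle connections before the 4-minute mark. It assumes librdkafka 1.6.0 or later for `connections.max.idle.ms`; the helper name `build_idle_safe_config`, the one-minute safety margin, the broker address, and the group id are hypothetical and are not taken from this issue.

```python
# Assumption: idle connections are dropped after about 4 minutes, as stated in the issue.
AZURE_IDLE_TIMEOUT_MS = 4 * 60 * 1000


def build_idle_safe_config(bootstrap_servers: str, group_id: str,
                           margin_ms: int = 60_000) -> dict:
    """Return a librdkafka-style config dict (hypothetical helper).

    The client closes connections that have been idle for longer than
    (timeout - margin), so they are recycled before being silently dropped.
    """
    if not 0 < margin_ms < AZURE_IDLE_TIMEOUT_MS:
        raise ValueError("margin_ms must be between 0 and the idle timeout")
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # Ask the OS to send TCP keepalive probes on broker sockets.
        "socket.keepalive.enable": True,
        # Close idle broker connections on the client side
        # (assumes librdkafka >= 1.6.0).
        "connections.max.idle.ms": AZURE_IDLE_TIMEOUT_MS - margin_ms,
    }


# Placeholder broker address and group id, for illustration only.
config = build_idle_safe_config("localhost:9092", "example-group")
```

The property names are librdkafka settings, so the same keys should carry over to other librdkafka-based clients; whether they are sufficient for a given deployment is something to verify by testing.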

I upgraded to Confluent 1.5.2 to solve this issue, but it still remains. It appears to be solved in 1.6.0-PRE3 and later.

https://github.com/edenhill/librdkafka/issues/3109 https://github.com/Azure/azure-functions-kafka-extension/issues/193

I can reproduce the issue with Event Hubs after a 5-minute delay using KafkaTrigger. I also confirmed that the new version solves it.

Mitigation

We provide a pre-release that fixes this issue. It is not an official release, but you can use it to test whether it helps resolve your issue.

https://www.nuget.org/packages/Microsoft.Azure.WebJobs.Extensions.Kafka/3.3.1-PRE1

TsuyoshiUshio avatar Nov 20 '20 05:11 TsuyoshiUshio

Dear Tsuyoshi,

can you confirm this is really coming from the infamous idle network connection drops by Azure LBs? Have you been able to reproduce it with librdkafka 1.6.0-PRE3 or even 1.6.0-PRE4?

From reading the librdkafka issue tracker, you might want to run the client with debug=all to get more detailed insights.
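A minimal sketch of how the `debug=all` suggestion could be applied to a librdkafka-style configuration. The helper name `with_debug`, the broker address, and the group id are hypothetical placeholders; only the `debug` property itself comes from the suggestion above.

```python
def with_debug(config: dict, contexts: str = "all") -> dict:
    """Return a copy of `config` with librdkafka debug logging enabled.

    `contexts` is a comma-separated list of librdkafka debug contexts;
    "all" enables every context and is very verbose.
    """
    debug_config = dict(config)  # leave the caller's dict unchanged
    debug_config["debug"] = contexts
    return debug_config


# Placeholder connection settings, for illustration only.
base = {"bootstrap.servers": "localhost:9092", "group.id": "example-group"}
debug_config = with_debug(base)
```

With the confluent-kafka Python client, a dict like this can be passed to `confluent_kafka.Consumer(debug_config)`. Since `all` produces a large volume of log output, it is probably best enabled only while reproducing the disconnect.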

While I can't say for sure this is related, I am also referencing https://github.com/edenhill/librdkafka/issues/2739 and https://github.com/edenhill/librdkafka/issues/2944 here. Please investigate both issues thoroughly and check if you can make any correlations with your observations.

With kind regards, Andreas.

amotl avatar Nov 20 '20 07:11 amotl

Thank you for your comment, @amotl. I meant that I reproduced it with 1.5.2; 1.6.0-PRE4 looks good. How can we confirm that the issues you mentioned are occurring?

TsuyoshiUshio avatar Nov 20 '20 08:11 TsuyoshiUshio

Dear Tsuyoshi,

ah, I see.

Some of [our] users [tripped into] this issue. However, I can't have a confidence. How can we confirm the issue happens that you mentioned [in order to gain more confidence]?

I apologize that I can't contribute much to your question when it comes to pinpointing a specific aspect. However, I tried to share more details about our environment and the respective observations at https://github.com/Azure/azure-functions-kafka-extension/issues/193#issuecomment-731021811.

As outlined there, we have been trying to mitigate this issue in a trial-and-error manner and have just shared our observations.

With kind regards, Andreas.

amotl avatar Nov 20 '20 08:11 amotl