librdkafka Fix to remove fetch queue messages that blocked the destroy of rdkafka instances

Open emasab opened this issue 1 year ago • 5 comments

Circular dependencies from a partition fetch queue message to the same partition blocked the destroy of an instance. This happened when the partition was removed from the cluster while it was being consumed. Solved by purging the internal partition queue, after the partition has been stopped and removed, which allows the reference count to reach zero and trigger a destroy.
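
To make the reference cycle described above concrete, here is a minimal sketch in C. It is not librdkafka's actual code: the types and functions (`toppar_t`, `msg_t`, `fetchq_enqueue`, `fetchq_purge`, `remove_partition`) are hypothetical and heavily simplified. It assumes each message sitting in a partition's own fetch queue holds one reference to that partition, so the count cannot reach zero until the queue is emptied.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified types: NOT librdkafka's real structures. */
typedef struct toppar_s toppar_t;

typedef struct msg_s {
        toppar_t *owner;     /* each queued message references its partition */
        struct msg_s *next;
} msg_t;

struct toppar_s {
        int refcnt;          /* partition is freed when this reaches zero */
        msg_t *fetchq_head;  /* the partition's own fetch queue */
        int *destroyed_flag; /* set to 1 on destroy, for demonstration only */
};

static toppar_t *toppar_new(int *destroyed_flag) {
        toppar_t *tp = calloc(1, sizeof(*tp));
        if (!tp)
                abort();
        tp->refcnt         = 1; /* the caller's reference */
        tp->destroyed_flag = destroyed_flag;
        return tp;
}

/* Drop one reference; destroy the partition when none remain. */
static void toppar_release(toppar_t *tp) {
        assert(tp->refcnt > 0);
        if (--tp->refcnt > 0)
                return;
        *tp->destroyed_flag = 1;
        free(tp);
}

/* Enqueue a message on the partition's fetch queue. The message takes a
 * reference to the same partition that owns the queue: this is the cycle. */
static void fetchq_enqueue(toppar_t *tp) {
        msg_t *m = calloc(1, sizeof(*m));
        if (!m)
                abort();
        m->owner = tp;
        tp->refcnt++;
        m->next         = tp->fetchq_head;
        tp->fetchq_head = m;
}

/* Purge the queue, releasing the reference held by each message.
 * The caller must still hold its own reference while purging. */
static void fetchq_purge(toppar_t *tp) {
        msg_t *m        = tp->fetchq_head;
        tp->fetchq_head = NULL; /* detach the list before releasing */
        while (m) {
                msg_t *next     = m->next;
                toppar_t *owner = m->owner;
                free(m);
                toppar_release(owner);
                m = next;
        }
}

/* Simulate removing a partition that still has queued messages.
 * Returns 1 if the partition was destroyed, 0 if it is stuck (leaked). */
static int remove_partition(int purge_first) {
        int destroyed = 0;
        toppar_t *tp  = toppar_new(&destroyed);
        fetchq_enqueue(tp);
        fetchq_enqueue(tp);
        if (purge_first)
                fetchq_purge(tp);
        toppar_release(tp); /* caller drops its own reference */
        return destroyed;
}
```

In this sketch, `remove_partition(0)` returns 0: the two queued messages keep the count at 2 after the caller releases its reference, so the partition is never destroyed (and is deliberately leaked to show the stuck state). `remove_partition(1)` returns 1 because purging first drops the message references, letting the final release reach zero.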

May 21 '24 08:05 emasab

How to reproduce: this happens sporadically with test 0113, subtest n_wildcard. Run it with TEST_DEBUG=all and until-fail.sh to see the refcnt not reaching zero.

May 21 '24 08:05 emasab

FWIW I've confirmed that this branch also fixes a non-constant but somewhat frequent issue I've been observing. I reproduced it by stressing the client node's swap while also stressing the broker node's cpu. It took a few restart cycles but within several hours the deadlocked destroy call occurred.

Aug 26 '24 19:08 zuellig

Is there any expected date on this to be merged?

Sep 11 '24 17:09 antaljanosbenjamin

:tada: All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

Sep 24 '24 13:09 confluent-cla-assistant[bot]

> All Contributor License Agreements have been signed. Ready to merge.

One step closer! :tada:

Sep 25 '24 14:09 antaljanosbenjamin

Addressed comment, updated CHANGELOG and rebased

Oct 29 '24 14:10 emasab

Hi, is there an estimated time to have this fix in a release? I have got the same issue when closing kafka consumer https://github.com/confluentinc/librdkafka/discussions/4885.

Oct 30 '24 09:10 ydsun90

> Hi, is there an estimated time to have this fix in a release? I have got the same issue when closing kafka consumer #4885.

Any plan to release the fix in a new version?

Nov 05 '24 18:11 ydsun90

> > Hi, is there an estimated time to have this fix in a release? I have got the same issue when closing kafka consumer #4885.
>
> Any plan to release the fix in a new version?

@pranavrth @emasab Hi, is there any expected time to have this fix in a new release?

Nov 13 '24 10:11 ydsun90

Can it be related to https://github.com/confluentinc/librdkafka/issues/4362 ?

Feb 21 '25 23:02 filimonov
