librdkafka

Fix to remove fetch queue messages that blocked the destroy of rdkafka instances

Status: Open. emasab opened this issue 1 year ago, 5 comments.

A circular dependency from a partition's fetch queue messages back to the same partition blocked the destroy of an rdkafka instance. This happened when the partition was removed from the cluster while it was being consumed. Solved by purging the internal partition queue after the partition has been stopped and removed, which allows the reference count to reach zero and trigger the destroy.

emasab, May 21 '24

How to reproduce: this happens sporadically with test 0113, subtest n_wildcard. Run it with TEST_DEBUG=all and until-fail.sh to see the refcnt not reaching zero.

emasab, May 21 '24

FWIW, I've confirmed that this branch also fixes an intermittent but fairly frequent issue I've been observing. I reproduced it by stressing the client node's swap while also stressing the broker node's CPU. It took a few restart cycles, but within several hours the destroy call deadlocked.

zuellig, Aug 26 '24

Is there an expected date for this to be merged?

antaljanosbenjamin, Sep 11 '24

:tada: All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

One step closer! :tada:

antaljanosbenjamin, Sep 25 '24

Addressed comment, updated CHANGELOG and rebased

emasab, Oct 29 '24

Hi, is there an estimated time for this fix to be in a release? I have the same issue when closing a Kafka consumer: https://github.com/confluentinc/librdkafka/discussions/4885.

ydsun90, Oct 30 '24

Is there any plan to release the fix in a new version?

ydsun90, Nov 05 '24

@pranavrth @emasab Hi, is there any expected time to have this fix in a new release?

ydsun90, Nov 13 '24

Could it be related to https://github.com/confluentinc/librdkafka/issues/4362?

filimonov, Feb 21 '25