librdkafka
Fix: remove fetch queue messages that blocked the destruction of rdkafka instances
Circular dependencies from a partition fetch queue message back to the same partition blocked the destruction of an instance. This happened when the partition was removed from the cluster while it was being consumed. Solved by purging the internal partition queue after the partition has been stopped and removed, allowing the reference count to reach zero and trigger the destroy.
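To make the cycle concrete, here is a minimal, self-contained sketch of the pattern (names such as `partition_t`, `fetchq` and `partition_purge_fetchq` are hypothetical and only illustrate the idea, not librdkafka internals): a queued fetch message keeps a reference back to its partition, so unless the queue is purged the reference count can never reach zero.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names for illustration only; not librdkafka internals. */
typedef struct partition_s {
        int refcnt;              /* destroy only proceeds once refcnt == 0 */
        struct msg_s *fetchq;    /* single-slot stand-in for the fetch queue */
} partition_t;

typedef struct msg_s {
        partition_t *owner;      /* queued message holds a ref back to its partition */
} msg_t;

static void partition_keep(partition_t *p) { p->refcnt++; }

static void partition_release(partition_t *p) {
        if (--p->refcnt == 0) {
                printf("refcnt hit 0, destroying partition\n");
                free(p);
        }
}

/* Purging the queue drops the message's back-reference,
 * breaking the cycle so the final release can reach zero. */
static void partition_purge_fetchq(partition_t *p) {
        if (p->fetchq) {
                partition_t *owner = p->fetchq->owner;
                free(p->fetchq);
                p->fetchq = NULL;
                partition_release(owner);
        }
}

int main(void) {
        partition_t *p = calloc(1, sizeof(*p));
        partition_keep(p);                 /* instance's own reference */

        /* A fetched message is enqueued and pins its partition. */
        msg_t *m = calloc(1, sizeof(*m));
        m->owner = p;
        partition_keep(p);
        p->fetchq = m;

        /* Partition removed from the cluster: without the purge below,
         * releasing the instance reference leaves refcnt == 1 forever. */
        partition_purge_fetchq(p);         /* purge after stop/remove */
        partition_release(p);              /* now reaches zero and destroys */
        return 0;
}
```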
How to reproduce: this happens sporadically with test 0113, subtest n_wildcard. Run it with TEST_DEBUG=all and until-fail.sh to see the refcount not reaching zero.
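For reference, a consumer along these lines could hit the hang when a matching topic is deleted while it is being consumed (broker address, group id, topic pattern and poll loop are made up for illustration; error handling is omitted):

```c
#include <librdkafka/rdkafka.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();
        rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "group.id", "wildcard-destroy-test",
                          errstr, sizeof(errstr));

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf,
                                      errstr, sizeof(errstr));
        rd_kafka_poll_set_consumer(rk);

        /* Wildcard subscription, as in test 0113 n_wildcard. */
        rd_kafka_topic_partition_list_t *topics =
                rd_kafka_topic_partition_list_new(1);
        rd_kafka_topic_partition_list_add(topics, "^test_0113_.*",
                                          RD_KAFKA_PARTITION_UA);
        rd_kafka_subscribe(rk, topics);
        rd_kafka_topic_partition_list_destroy(topics);

        /* Consume while a matching topic is deleted broker-side. */
        for (int i = 0; i < 100; i++) {
                rd_kafka_message_t *msg = rd_kafka_consumer_poll(rk, 100);
                if (msg)
                        rd_kafka_message_destroy(msg);
        }

        rd_kafka_consumer_close(rk);
        rd_kafka_destroy(rk);  /* could previously block here: refcnt never reached 0 */
        return 0;
}
```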
FWIW I've confirmed that this branch also fixes an intermittent but fairly frequent issue I've been observing. I reproduced it by stressing the client node's swap while also stressing the broker node's CPU. It took a few restart cycles, but within several hours the deadlocked destroy call occurred.
Is there an expected date for this to be merged?
:tada: All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.
One step closer! :tada:
Addressed the comment, updated the CHANGELOG, and rebased.
Hi, is there an estimated time for getting this fix into a release? I hit the same issue when closing a Kafka consumer: https://github.com/confluentinc/librdkafka/discussions/4885.
Any plan to release the fix in a new version?
@pranavrth @emasab Hi, is there an expected time frame for getting this fix into a new release?
Could this be related to https://github.com/confluentinc/librdkafka/issues/4362?