kafka
kafka copied to clipboard
KAFKA-14225: Fix deadlock caused by lazy val exemptSensor
There is a chance to cause deadlock when multiple threads access ClientRequestQuotaManager. In the version Scala 2.12, the lazy val initialization is under the object lock. The deadlock could happen in the following condition:
In thread a, when ClientRequestQuotaManager.exemptSensor is being initialized, it has acquired the object lock and enters the the actual initialization block. The initialization of 'exemptSensor' requires another lock private val lock = new ReentrantReadWriteLock() and it is waiting for this lock.
In thread b, at the same time, ClientQuotaManager.updateQuota() is called and it has already acquired ReentrantReadWriteLock lock by calling lock.writeLock().lock(). And then it executes info(). If this is the first time accessing Logging.logger, which is also a lazy val, it need to wait for the object lock.
The deadlock happens.
Since the lazy val initialization is under the object lock, we should avoid using lazy val if the initialization function holds another lock to prevent holding two locks at the same time which is prone for deadlock. Change to create exemptSensor during ClientRequestQuotaManager initialization with an expiration time of Long.MaxValue to prevent expiration if request quota is not enabled at that time.
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
rerun tests please
@dajac @rajinisivaram could you please review this PR? Thanks! (And I am wondering whether you know how to rerun the checks/tests without submitting any changes. This PR currently fails a test, which I do not think is related to the PR's change.)
@hshi2022 Thanks for the patch. Did you catch this deadlock in practice or did you use some tool to find it?
@hshi2022 Thanks for the patch. Did you catch this deadlock in practice or did you use some tool to find it?
We caught in practice.
@hachikuji Thanks for review. I have addressed your comments. And I am wondering whether you know how to rerun the checks/tests without submitting any changes. This PR currently fails a test, which I do not think is related to the PR's change.)
@hshi2022 Yeah, we are often fighting some flaky tests. It should not block merging if they are unrelated. Can you take a look at the response above https://github.com/apache/kafka/pull/12634#discussion_r982935129? Thanks!
@hshi2022 Yeah, we are often fighting some flaky tests. It should not block merging if they are unrelated. Can you take a look at the response above #12634 (comment)? Thanks!
Replied under that comment.
@hachikuji Do you have any more questions/comments regarding this PR?
Kicked off one more build since it's been a little while. I will merge assuming nothing wrong with the build. Thanks for the patch!