kafka icon indicating copy to clipboard operation
kafka copied to clipboard

[WIP] KAFKA-6589 Extract Heartbeat thread from AbstractCoordinator

Open smurakozi opened this issue 7 years ago • 7 comments

AbstractCoordinator interacts with the thread via a HeartbeatThreadManager. The existing behavior was extracted to an abstract class + some glue code in the AbstractCoordinator.

Testing was done using the existing ConsumerCoordinatorTest and WorkerCoordinatorTest.

Committer Checklist (excluded from commit message)

  • [ ] Verify design and implementation
  • [ ] Verify test coverage and CI build status
  • [ ] Verify documentation (including upgrade notes)

smurakozi avatar Mar 05 '18 15:03 smurakozi

@guozhangwang, @hachikuji could you please check if you think it's the right direction.

The behavior of AbstractCoordinator and the heartbeat thread were quite coupled with lots of interaction back and forth. I tried to ensure that

  • I introduce no semantical changes, especially not related to concurrency (hard to find)
  • the thread and its manager gets as independent from the coordinator as possible and especially that there are no cyclic dependencies.

A couple of weak/not so elegant points of the current state of this PR:

  • shared lock between the two classes - I think it would be pretty hard to avoid this without some major semantical changes as the lock is used not only for ensuring proper state handling but also to suspend/continue execution of threads.
  • heartbeatThreadManager cannot be injected in the constructor. Having two constructors, one calling the other with a default implementation does not work: the AbstractHeartbeatThreadManager instance will need members of AbstractCoordinator (so it can't be static or created in a static context), and instance methods can't be called before this(). Having a setter for heartbeatThreadManager is not nice, but it would enable injection of completely different logic or another subclass of AbstractHeartbeatThreadManager with some customization.

Please let me know if you would prefer a different way to extract the thread or if you have ideas regarding the above-mentioned problems / have any other suggestions.

smurakozi avatar Mar 05 '18 16:03 smurakozi

test failure was caused by java.lang.OutOfMemoryError: unable to create new native thread

smurakozi avatar Mar 06 '18 12:03 smurakozi

retest this, please

smurakozi avatar Mar 06 '18 12:03 smurakozi

@guozhangwang, @hachikuji could you please let me know your opinion about this change?

smurakozi avatar Mar 13 '18 07:03 smurakozi

@smurakozi Sorry for the delay. I had a quick look and I think it looks promising. Maybe the only thing I'd mention is that the patch sort of hides the fact that AbstractCoordinator and the heartbeat thread are sharing the same lock. In other words, the coupling between the two classes is still quite tight even though we've added this layer in between. Given that, I was wondering if it would make sense to leave the heartbeat thread itself nested inside AbstractCoordinator so that the coupling is clearer? We could still have the HeartbeatThreadManager abstraction passed through the constructor so that it could be mocked in testing.

hachikuji avatar Mar 13 '18 14:03 hachikuji

I was not closely looking into the lock coupling when created the JIRA, but thinking about that again I'm +1 to @hachikuji 's comments. The key here is as long as we can mock the HeartbeatThreadManager in our unit tests to get rid of any timing related issues that is good enough.

guozhangwang avatar Mar 13 '18 16:03 guozhangwang

This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please leave a comment asking for a review. If the PR has merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

github-actions[bot] avatar Oct 21 '24 03:10 github-actions[bot]

This PR has been closed since it has not had any activity in 120 days. If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open the PR and ask for a review.

github-actions[bot] avatar Nov 20 '24 03:11 github-actions[bot]