[improve][client] Perform health checks on the endpoints passed in via the serviceUrl of PulsarClient
Main Issue: https://github.com/apache/pulsar/issues/22934
Motivation
Refer to issue: https://github.com/apache/pulsar/issues/22934
Modifications
Verifying this change
- [ ] Make sure that the change passes the CI checks.
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
- Added integration tests for end-to-end deployment with large payloads (10MB)
- Extended integration test for recovery after broker failure
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
Documentation
- [ ] `doc`
- [ ] `doc-required`
- [x] `doc-not-needed`
- [ ] `doc-complete`
Matching PR in forked repository
PR in forked repository: https://github.com/AuroraTwinkle/pulsar/pull/4
@AuroraTwinkle Please add the following content to your PR description and select a checkbox:
- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->
@liangyepianzhou I have addressed all your comments. PTAL, thanks.
Replied in https://github.com/apache/pulsar/issues/22934#issuecomment-2943690398 about a better way to solve the actual problem.
The general design is problematic since the health check will keep running and creating TCP/IP connections that are immediately closed. This will cause additional load on the overall system, including the endpoints (proxies / brokers). Additionally, opening and closing a TCP/IP connection keeps the local port occupied in the TIME_WAIT state for some time (2*MSL, 60s-240s depending on the OS and its config). SO_REUSEADDR/SO_REUSEPORT don't prevent port occupation since they don't help bypass TIME_WAIT restrictions for outbound client connections to the same 4-tuple (local ip, local port, remote ip, remote port).
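For illustration, the connect-and-close probing pattern described above looks roughly like the sketch below. This is not the code from this PR; the class name `ConnectProbeSketch`, the method `probe`, and the host/port/timeout values are hypothetical. Because the client side closes the connection first, each successful probe leaves one client-side ephemeral port in TIME_WAIT.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch of a connect-and-close probe; not the implementation in this PR.
class ConnectProbeSketch {

    // Returns true if a TCP connection to host:port is established within timeoutMs.
    // The socket is closed right after connecting, so the local (client) side is the
    // active closer and its ephemeral port stays in TIME_WAIT afterwards.
    static boolean probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat the endpoint as unhealthy.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed endpoint for demonstration only (6650 is the default Pulsar binary port).
        System.out.println("reachable: " + probe("localhost", 6650, 1000));
    }
}
```

Running such a probe on a fixed interval against every endpoint in the serviceUrl multiplies both the connection churn on the proxies / brokers and the number of client ports held in TIME_WAIT.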
Before actually implementing this health check feature, it would be necessary to describe the issue that is currently caused by not having the health check, and to primarily address that issue instead of implementing the solution in this PR.
OK, I will start a new PR for the better solution mentioned at https://github.com/apache/pulsar/issues/22934#issuecomment-2943690398, and I will close the current PR.