kcp
:sparkles: Fix leader election issue with workspace controller and other KCP Controllers
This PR addresses the following:
Change the workspace controller's start logic inside the runners to fix a leader election issue. The way we register controllers and define the runner is problematic: the runner only calls start. When leader election is lost, start returns (it was waiting on <-ctx.Done()), which triggers the deferred queue.Shutdown(). Once a queue is shut down, there is no way to restart it.
Background: At times, workspace creation got stuck in the scheduling phase and never recovered. With leader election, requests/events are queued on both the leader and the other pods as well, which makes the queue depth grow.
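To illustrate the failure mode described above, here is a minimal, stdlib-only sketch. It is not the kcp code: the `queue` type is a hypothetical stand-in for a client-go workqueue, and `controller`, `newController`, `freshPerTerm`, and `runTerm` are names invented for this example. The stand-in returns an error when an item is added after shutdown so that the problem is visible. One possible remedy, shown here as an assumption rather than as what this PR does, is to build a fresh queue on every call to Start.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// queue is a hypothetical stand-in for a work queue: once shut down,
// it rejects new items and cannot be reopened.
type queue struct {
	items    []string
	shutDown bool
}

func (q *queue) Add(item string) error {
	if q.shutDown {
		return errors.New("queue is shut down")
	}
	q.items = append(q.items, item)
	return nil
}

func (q *queue) ShutDown() { q.shutDown = true }

// controller is illustrative only. With freshPerTerm=false the queue is
// created once in the constructor (the problematic pattern); with
// freshPerTerm=true every Start call gets a new queue.
type controller struct {
	queue        *queue
	freshPerTerm bool
}

func newController(freshPerTerm bool) *controller {
	return &controller{queue: &queue{}, freshPerTerm: freshPerTerm}
}

// Start enqueues one item, blocks until ctx is cancelled (leadership
// lost), and then runs the deferred ShutDown on the current queue.
func (c *controller) Start(ctx context.Context, item string) error {
	if c.freshPerTerm {
		c.queue = &queue{}
	}
	defer c.queue.ShutDown()
	err := c.queue.Add(item)
	<-ctx.Done()
	return err
}

// runTerm simulates one leadership term that is lost immediately.
func runTerm(c *controller, item string) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return c.Start(ctx, item)
}

func main() {
	shared := newController(false)
	fmt.Println("shared queue, term 1:", runTerm(shared, "ws-1"))
	fmt.Println("shared queue, term 2:", runTerm(shared, "ws-2"))

	fresh := newController(true)
	fmt.Println("fresh queue, term 1:", runTerm(fresh, "ws-1"))
	fmt.Println("fresh queue, term 2:", runTerm(fresh, "ws-2"))
}
```

In this sketch the second term fails only for the controller that reuses its queue, because the deferred shutdown from the first term has already closed it.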
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @sankar17. Thanks for your PR.
I'm waiting for a kcp-dev member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the `ok-to-test` label.
I understand the commands that are listed here.
/ok-to-test
/retest
/test pull-kcp-test-e2e
/retest
@sankar17 @ramramu3433 these failures across the e2e test board don't seem like flakes to me. Does `make test-e2e` work locally for you for this branch?
/retest
I will test and update.
@sankar17 Please consider not running re-tests when tests are failing consistently, at least not without any code changes pushed. Those tests burn CI cycles without any real reason, we already know that they don't work.
Sure, I will make sure it works locally before retesting.
Thanks!
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from embik. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment
/test pull-kcp-verify
/retest-required
@sankar17: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
pull-kcp-verify | eb6571d636a45a85137ee893c159a2b219cb6ec1 | link | true | /test pull-kcp-verify |
pull-kcp-verify-codegen | eb6571d636a45a85137ee893c159a2b219cb6ec1 | link | true | /test pull-kcp-verify-codegen |
PR needs rebase.
This has been implemented with an alternative approach in https://github.com/kcp-dev/kcp/pull/3132, so this PR is no longer needed.