etcd-backup-restore
etcd-backup-restore copied to clipboard
Do not check SSR State when Snapshotter should be stopped
What this PR does / why we need it:
As reported in #676 we sometimes see that multiple snapshotters can be running at one time in a cluster.
I believe that the reason for this is that the snapshotter is not set to active state until an initial delta snapshot or a fullsnapshot has been taken. If a leadership election takes place while the snapshot is being taken and the current leader loses that election, then handleSsrRequest does not send on the ssrStopCh. This in turn causes ssr.snapshotEventHandler to never return and snapshots would continue to be taken.
Secondly I'm propsing a change to close the ssrStopCh
rather than sending on it. This is to make it clear to the reader that the channel is no longer being used at this point.
Lastly the function signature changes are simply adding further type annotations to the channels passed to functions to help readers understand where channels are being read from/sent to.
Which issue(s) this PR fixes: Fixes #676
Special notes for your reviewer:
The reason why the ssrState was being checked is not apparent to me so I'd be delighted to be told why this might be an awful idea.
Release note:
Do not rely on the snapshotter state when stopping the snapshotter. The snapshotter will now always be closed when a member goes from being the leader to any other state.
@avestuk Thank you for your contribution.
Thank you @avestuk for your contribution. Before I can start building your PR, a member of the organization must set the required label(s) {'reviewed/ok-to-test'}. Once started, you can check the build status in the PR checks section below.
@ishan16696 Anything I can do to help move this along?
Hi @avestuk , Sorry for delay. I recognise the importance of the issue, I will try to increase the priority of the issue mentioned by you and analyse it as this PR involves modifications to our legacy code, it warrants a thorough analysis.
Thanks @avestuk for PR, Overall looks good. I'm just testing out some edge cases. Kindly requesting you to rebase this PR on latest master.
As requested in this comment: https://github.com/gardener/etcd-backup-restore/pull/680#issuecomment-2008622838, please rebase this PR on latest master.
As requested in this comment: #680 (comment), please rebase this PR on latest master.
Thanks very much @ishan16696 I've rebased on latest master.
/hold until #716 is merged.
/hold until https://github.com/gardener/etcd-backup-restore/pull/716 is merged.
PR #716 will take some time as it also need to resolve merge conflicts, let's not block this PR. Talked to @shreyas-s-rao and he is fine with it.