machine-controller-manager
machine-controller-manager copied to clipboard
Move drain logic into a separate controller
How to categorize this issue?
/area dev-productivity /kind enhancement /priority 3
What would you like to be added: We would like to move out the drain logic of the provider machine controller into a separate controller. With this draining logic would no longer be a part of the core machine reconcile logic It would rather be a different controller taking care of this logic. The machine controller and drain controller can use the machine CRD status/labels to coordinate the handover.
- [ ] Separate out the drain logic into a new controller.
- [ ] Switch to use of
volume attachments
resource while detaching volumes https://github.com/gardener/machine-controller-manager/pull/608/files#r644498475 - [ ] Eliminate the use of goroutines to coordinate event handlings. Preferably, convert long reconciliations with goroutines and waiting, into short reconciliations spread over time co-ordinated using some additional fields in the
status
of theMachine
resource. - [x] Handle scenarios for erratic serialized drain - https://github.com/gardener/machine-controller-manager/issues/468
- [ ] Enable
force-deletion
label to trigger force delete even if drain is ongoing. Preferably delete the VM itself
Why is this needed: To make the machine reconcile loop more efficient.
/remove internal
@prashanth26 I hope it's OK if I remove the roadmap label. With internal
we rather mean of central importance to Gardener as a whole. All teams have (team-) internal tasks, but those are not reported in the global roadmap or else it becomes too large. Things like improved-drain-of-pods-with-volumes-v2 is internal
and rightly so, because it has such a huge impact to everybody. That's kind of the differentiation. I hope, that's fine.
/remove internal @prashanth26 I hope it's OK if I remove the roadmap label. With
internal
we rather mean of central importance to Gardener as a whole. All teams have (team-) internal tasks, but those are not reported in the global roadmap or else it becomes too large. Things like improved-drain-of-pods-with-volumes-v2 isinternal
and rightly so, because it has such a huge impact to everybody. That's kind of the differentiation. I hope, that's fine.
Sure Vedran. No problem, sounds good to me.
This is also important if the machineDrainTimeout
is changed during an ongoing drain e.g in Gardener for a worker pool.
The only thing that helps is currently is to edit the machine and restart the MCM.
Another case to handle -
If a machine drain is ongoing
and a force-deletion:true
label is added on the machine obj, then force deletion of machine should happen.
Currently on shoot deletion force deletion
label is added , but it doesn't force delete the machine, if machine is already draining or stuck in draining , and thus shoot deletion is also stuck (until drain timeout occurs which could be of days)
Grooming decision: Too much refactoring needed in current code. Agree on drain being a separate controller. We should target this for controller-runtime port.