machine-controller-manager
Switch to exponential backoff while creating/deleting machines
What would you like to be added: When a machine creation or deletion request fails, MCM keeps retrying to create or delete the machine objects without backing off. This can put a heavy load on the control cluster's API server and exhaust the cloud provider's API rate limits. We should back off exponentially when requests fail.
Why is this needed:
/assign @hardikdr @prashanth26 /priority blocker
/priority normal
We implemented constant backoff in #525. We should consider a more sophisticated exponential backoff mechanism; a proposal would be nice. I mainly see 2 options:
- Backoff at the queue. An attempt for the machine-set queue: https://github.com/gardener/machine-controller-manager/pull/510
- Backoff inside the reconcile function.
- Maybe something similar to https://github.com/gardener/autoscaler/tree/machine-controller-manager-provider/cluster-autoscaler/utils/backoff .
cc @zuzzas
Thanks to https://github.com/gardener/machine-controller-manager/pull/525 we can now attach a RateLimitingInterface to the queue, and throttle Machines in CrashLoopBackoff.
- I'd take the backoff_manager concept from here.
- Create a throttling-by-CrashLoopBackoff function here.
- And attach the resulting RateLimitingInterface to the queue here.
Then, there's a matter of replacing Adds with AddRateLimiteds to ensure that our new RateLimiter is being triggered.
/title Switch to exponential backoff while creating/deletion machines