machine-controller

Add a metric to expose non-ready machines or errors

Open renchap opened this issue 2 years ago • 0 comments

Use-case:

We are deploying machines on Hetzner, and sometimes it's not possible to create a machine because the account has hit its resource limits:

 machine_controller.go:383] Failed to reconcile machine "xxx-m-1-68c6cd6957-6hk94": failed to create machine at cloudprovider, due to failed to create server, due to core limit exceeded (resource_limit_exceeded)    

It would be very useful to have a metric to monitor for this, so that we can alert when machines have been scheduled but are not successfully created.
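To make the request concrete, here is a minimal sketch of the kind of metric I have in mind: a counter of failed machine creations, labelled by failure reason and rendered in the Prometheus text exposition format. Everything in it is an assumption for illustration only: the metric name `machine_create_failures_total`, the `reason` label, and the `failureCounter` type are hypothetical and not part of machine-controller. It uses only the Go standard library to stay self-contained; a real implementation would more likely register a counter with a Prometheus client library.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// failureCounter is a hypothetical counter of failed machine creations,
// keyed by failure reason (e.g. "resource_limit_exceeded").
type failureCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[string]uint64)}
}

// Inc records one failed creation attempt for the given reason.
func (c *failureCounter) Inc(reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

// Render returns the counter in Prometheus text exposition format,
// one sample per reason, sorted by reason for stable output.
// %q is used for label quoting, which is adequate for simple ASCII values.
func (c *failureCounter) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()

	reasons := make([]string, 0, len(c.counts))
	for r := range c.counts {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)

	var b strings.Builder
	// Assumed metric name; not an existing machine-controller metric.
	b.WriteString("# TYPE machine_create_failures_total counter\n")
	for _, r := range reasons {
		fmt.Fprintf(&b, "machine_create_failures_total{reason=%q} %d\n", r, c.counts[r])
	}
	return b.String()
}

func main() {
	c := newFailureCounter()
	// In the controller, this would be called where reconciliation fails
	// to create the machine at the cloud provider.
	c.Inc("resource_limit_exceeded")
	c.Inc("resource_limit_exceeded")
	fmt.Print(c.Render())
}
```

With something like this exposed, an alert could fire whenever the counter increases over a chosen time window, which would have caught the Hetzner core limit error above.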

renchap, Feb 02 '23 13:02