machine-controller
Add a metric to expose non-ready machines or errors
Use-case:
We are deploying machines on Hetzner, and sometimes it's not possible to create a machine because the account's resource limit has been reached:
machine_controller.go:383] Failed to reconcile machine "xxx-m-1-68c6cd6957-6hk94": failed to create machine at cloudprovider, due to failed to create server, due to core limit exceeded (resource_limit_exceeded)
It would be very useful to have a metric for this, so that we can set up an alert when machines have been scheduled but not successfully created.