public-cloud-roadmap
Monitoring ETCD quota
In some cases the etcd quota is exceeded, as explained here: https://docs.ovh.com/gb/en/kubernetes/etcd-quota-error/. Once this quota is reached, deploying new elements to the cluster or removing existing ones no longer works, so monitoring it is mandatory in production.
As a Kubernetes SRE team, we want to monitor the etcd storage quota, for example through Prometheus exporter metrics. We would then create alerts on these metrics, know when the quota is exceeded, and investigate without needing to contact the OVH support teams.
Today we use the kube-prometheus-stack operator to monitor our K8S clusters, and it works well.
Hi, is that clear?
Hello @Grounz, yes, the issue is clear. I will check with the team whether there is currently a way to get this information and document it here; otherwise I will add it to the backlog. I agree this is definitely a need.
Hi @mhurtrel and the OVHcloud team,
Do you have any news on this request?
I've also been experiencing issues with this quota being reached without any way to see it coming. It would be really nice to have a way to monitor this metric.
Hi @matheyal, and sorry for the delay on this. We had to tackle other priorities; I will come back here soon with an ETA.
Hello @mhurtrel, our team recently experienced the same issue as @matheyal, and we are looking for a solution so that our platform does not get stuck in an error state again. Any assistance is appreciated.
Same here. We encountered this twice on our cluster, and it caused downtime.
Hi @arcalys @ddelpha @matheyal @Grounz, I confirm this feature will be prioritized for the first half of 2022, but I can't share a precise ETA yet.
@mhurtrel nice :)
thank you
Thanks for the info @mhurtrel =)
We have hit this issue in production as well. First half of 2022 is nearly over now. Any news / ETA? Thanks!
Hi @lilvinz, and thanks for the heads up. This feature will be released this summer, between June and August. Sorry for the delay.
Hello, we plan to deploy this feature by the end of November. It is taking some time to design how to give you access to data stored in our "management" perimeter.
More information to come.
You can now consult the etcd storage quota and usage of each cluster through the API endpoint: https://api.ovh.com/console/#/cloud/project/{serviceName}/kube/{kubeId}/metrics/etcdUsage~GET
This information will soon be added to the control panel, and we are exploring options to send proactive alerts to users approaching the maximum usage.
OK, so now we just need to write a Prometheus exporter that retrieves these values and exposes them as metrics on our cluster :smile:
Hi @mhurtrel, when you said "This information will soon be added to the control panel", do you have an ETA to share? We want to evaluate whether to implement a scraper for the OVH API or to wait for its integration into the control plane. In the second case, since we already scrape the ApiMetricServer embedded in Kubernetes, we would collect it immediately and in the "standard" format...
Hi @fkalinowski, unfortunately by "control panel" I meant the web UI (aka the Manager), not the Kubernetes API. There is no plan for a Kubernetes API integration yet, so indeed you should develop a scraper for the OVHcloud REST API.
FYI, I started a small Prometheus exporter that retrieves etcd quota usage. It's at a very early but working stage.
See https://github.com/rverchere/ovh-mks-exporter