Monitoring ETCD quota

Open Grounz opened this issue 3 years ago • 12 comments

In some cases the ETCD quota is exceeded (see https://docs.ovh.com/gb/en/kubernetes/etcd-quota-error/). Once the quota is reached, deploying new elements to the cluster or removing existing ones no longer works, so monitoring it is mandatory in production.

As a Kubernetes SRE team, we want to monitor the ETCD storage quota, for example through exporter metrics for Prometheus. We would set up alerts on these metrics so that we know when the quota is exceeded and can investigate without having to contact the OVH support teams.

Today we use the kube-prometheus-stack operator to monitor our K8S clusters, and it works well.

Grounz avatar May 28 '21 13:05 Grounz

Hi, is that clear?

Grounz avatar Jun 02 '21 09:06 Grounz

Hello @Grounz, yes, the issue is clear. I will check with the team whether there is currently a way to get this information and document it here; otherwise I will add this to the backlog. This is definitely a need I agree on.

mhurtrel avatar Jun 16 '21 08:06 mhurtrel

Hi @mhurtrel and the OVHcloud team,

Do you have any news on this request?

I've also been experiencing issues with this quota being reached without any way to see it coming. It would be really nice to have a way to monitor this metric.

matheyal avatar Oct 15 '21 13:10 matheyal

Hi @matheyal and sorry for the delay on this. We had to tackle other priorities and I will come back here soon to define an ETA.

mhurtrel avatar Oct 17 '21 07:10 mhurtrel

Hello @mhurtrel our team recently experienced the same issue as @matheyal and we are looking for a solution so that our platform does not lock up with an error status again. Any assistance is appreciated.

ddelpha avatar Dec 28 '21 09:12 ddelpha

Same here. We encountered this twice on our cluster, and it caused downtime.

arcalys avatar Dec 28 '21 10:12 arcalys

Hi @arcalys @ddelpha @matheyal @Grounz, I confirm I will get this feature prioritized in the first half of 2022, but I can't share a precise ETA yet.

mhurtrel avatar Dec 29 '21 13:12 mhurtrel

@mhurtrel nice :)

thank you

ddelpha avatar Dec 29 '21 14:12 ddelpha

Thanks for the info @mhurtrel =)

arcalys avatar Dec 29 '21 20:12 arcalys

We have hit this issue in production as well. First half of 2022 is nearly over now. Any news / ETA? Thanks!

lilvinz avatar May 10 '22 19:05 lilvinz

Hi @lilvinz and thanks for the heads up. This feature will be released this summer, between June and August. Sorry for the delay.

mhurtrel avatar May 11 '22 06:05 mhurtrel

Hello, we are planning to deploy this feature by the end of November. It takes some time to design how to give you access to data stored in our "management" perimeter.

More information will follow.

jMonsinjon avatar Sep 14 '22 13:09 jMonsinjon

You can now consult the etcd storage quota and usage for each cluster through the API endpoint: https://api.ovh.com/console/#/cloud/project/{serviceName}/kube/{kubeId}/metrics/etcdUsage~GET

This information will soon be added to the control panel, and we are exploring options to send proactive alerts to users approaching the maximum usage.
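As a rough illustration of how a client might use that endpoint's result, here is a minimal Python sketch. It assumes the response is a JSON object with numeric `usage` and `quota` fields; that shape, the helper names, and the 80% threshold are assumptions made for the example, not something confirmed in this thread. The payload is hard-coded so the sketch runs without credentials.

```python
from typing import Mapping


def etcd_usage_ratio(payload: Mapping[str, float]) -> float:
    """Return etcd usage as a fraction of the quota.

    Assumes `payload` has numeric "usage" and "quota" keys (hypothetical
    shape of the etcdUsage response).
    """
    quota = payload["quota"]
    if quota <= 0:
        raise ValueError("quota must be positive")
    return payload["usage"] / quota


def is_near_quota(payload: Mapping[str, float], threshold: float = 0.8) -> bool:
    """True when usage has reached `threshold` (an arbitrary 80% default)."""
    return etcd_usage_ratio(payload) >= threshold


# Hypothetical sample payload standing in for a real API response.
sample = {"usage": 300 * 1024 * 1024, "quota": 400 * 1024 * 1024}
print(f"{etcd_usage_ratio(sample):.0%} used, alert={is_near_quota(sample)}")
```

In practice the payload would come from an authenticated call to the endpoint above, and the threshold would be tuned to how quickly the cluster's etcd usage grows.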

mhurtrel avatar Jan 03 '23 15:01 mhurtrel

OK, so now we just need to write a Prometheus exporter that retrieves these values and exposes them as metrics on our cluster :smile:
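A minimal sketch of the formatting half of such an exporter, in Python: it turns a usage/quota payload into the Prometheus text exposition format. The metric names, the `kube_id` label, the `usage`/`quota` field names, and the assumption that values are in bytes are all hypothetical choices for this example; fetching the payload from the OVH API and serving it over HTTP are left out.

```python
from typing import Mapping


def render_metrics(payload: Mapping[str, float], kube_id: str) -> str:
    """Render etcd usage and quota as Prometheus text-format gauges.

    Metric and label names are illustrative, not an established convention.
    """
    labels = f'kube_id="{kube_id}"'
    lines = [
        "# HELP ovh_mks_etcd_usage_bytes etcd storage currently used (assumed bytes).",
        "# TYPE ovh_mks_etcd_usage_bytes gauge",
        f"ovh_mks_etcd_usage_bytes{{{labels}}} {payload['usage']}",
        "# HELP ovh_mks_etcd_quota_bytes etcd storage quota (assumed bytes).",
        "# TYPE ovh_mks_etcd_quota_bytes gauge",
        f"ovh_mks_etcd_quota_bytes{{{labels}}} {payload['quota']}",
    ]
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


# Hypothetical payload and cluster ID, for illustration only.
print(render_metrics({"usage": 123456, "quota": 419430400}, "my-cluster"), end="")
```

With both values exposed as gauges, an alert can be expressed as a ratio of the two series rather than a hard-coded byte count, so it keeps working if the quota changes.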

rverchere avatar Feb 23 '23 13:02 rverchere

Hi @mhurtrel, when you said this information would soon be added to the control panel, do you have an ETA to share? We want to evaluate whether to implement a scraper on the OVH API or wait for its integration into the control plane. In the second case (control plane), since we are already scraping the ApiMetricServer embedded in Kubernetes, we would collect it immediately and in the "standard" format...

fkalinowski avatar Feb 23 '23 14:02 fkalinowski

Hi @fkalinowski, unfortunately by "control panel" I meant the web UI (aka Manager), not the Kubernetes API. I don't have plans for a Kubernetes API integration yet, so you should indeed develop the OVHcloud REST API scraper.

mhurtrel avatar Feb 23 '23 15:02 mhurtrel

FYI, I started a little prometheus exporter that retrieves etcd quota usage. It's at a very early but working stage.

See https://github.com/rverchere/ovh-mks-exporter

rverchere avatar Apr 12 '23 12:04 rverchere