prometheus-k8s-operator

Prometheus Metrics stop working after some time (with solution)

Open · QthePirate opened this issue on Aug 15, 2024 · 2 comments

Bug Description

After running the COS-Lite stack on MicroK8s for a while, I noticed that my dashboard (which at the time only contained Ceph and machine metrics from grafana-agent) stopped displaying data. It took me a while to figure out what was going on before I thought to check the Prometheus storage space.

According to the default config, maximum_retention_size is 80% (a percentage of the volume size).

This led me to check the default size of the Prometheus PVC, which was 1Gi.
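For a sense of scale, here is a minimal Python sketch of that arithmetic, assuming the charm applies the maximum_retention_size percentage to the PVC capacity. The helper names (parse_quantity, retention_limit_bytes) are hypothetical, and the parser only handles Kubernetes binary suffixes such as Gi.

```python
# Hypothetical helpers; assumes the retention percentage applies to the PVC capacity.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_quantity(quantity: str) -> int:
    """Convert a Kubernetes binary quantity such as "1Gi" to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count with no suffix


def retention_limit_bytes(capacity: str, percent: str) -> int:
    """Apply a percentage such as "80%" to a PVC capacity such as "1Gi"."""
    pct = int(percent.rstrip("%"))
    return parse_quantity(capacity) * pct // 100


limit = retention_limit_bytes("1Gi", "80%")
print(f"{limit} bytes (~{limit / 1024**2:.0f} MiB)")
```

Under that assumption, the 1Gi default leaves roughly 819 MiB for retained metrics.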

I resolved the recurring issue by manually increasing the Prometheus PVC size in K8s with kubectl -n cos edit pvc
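The same resize can be scripted. This is a minimal sketch using the official Kubernetes Python client (the kubernetes package); build_resize_patch and resize_pvc are hypothetical helper names, and the PVC name is left as a placeholder because it is not given above. Kubernetes only lets a PVC grow, not shrink, and the resize only takes effect if the PVC's StorageClass has allowVolumeExpansion: true.

```python
def build_resize_patch(new_size: str) -> dict:
    """Build a patch body that raises a PVC's requested storage."""
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


def resize_pvc(name: str, namespace: str, new_size: str) -> None:
    """Patch a PVC's storage request (requires access to the cluster)."""
    # Imported lazily so build_resize_patch works without the client installed.
    from kubernetes import client, config

    config.load_kube_config()  # uses the current kubeconfig context
    client.CoreV1Api().patch_namespaced_persistent_volume_claim(
        name=name, namespace=namespace, body=build_resize_patch(new_size)
    )


if __name__ == "__main__":
    # Only prints the patch; call resize_pvc("<prometheus-pvc-name>", "cos", "20Gi")
    # with the real PVC name (see `kubectl -n cos get pvc`) to apply it.
    print(build_resize_patch("20Gi"))
```

The 20Gi value is an arbitrary example, not a recommended size.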

This is only a workaround. I would recommend either:

  1. Increasing the default size at deployment, or

  2. Mentioning in the documentation that this size needs to be increased (including here: https://charmhub.io/topics/canonical-observability-stack/tutorials/install-microk8s)

Even in my small lab environment, 1Gi is nowhere near enough space for a continuously running Prometheus instance.

To Reproduce

juju deploy cos-lite

juju integrate prometheus:metrics-endpoint

Environment

prometheus-k8s: channel latest/stable, revision 189

MicroK8s: v1.30.3, revision 7040

Juju: 3.5.3 (originally found on 3.5.2)

Relevant log output

I could not find any logs relevant to the issue.


QthePirate, Aug 15 '24 00:08

1Gi is a default value, and it is as arbitrary as 20Gi would be. If the admin is not aware of this limit, they would eventually hit the "storage full" problem anyway; we would just be delaying it.

But you make a good point about mentioning it in the docs, in addition to the bit about the overlay.
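The point that any fixed size is arbitrary can be made concrete with the capacity-planning formula from the Prometheus storage documentation: needed disk space ≈ retention time × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. This sketch inverts that formula to estimate how many days of data a size limit holds; the 1,000 samples/s rate is a made-up example, not a measurement from this setup, and the estimate ignores WAL and other overhead.

```python
SECONDS_PER_DAY = 86_400


def days_until_limit(limit_bytes: int, samples_per_second: float,
                     bytes_per_sample: float = 2.0) -> float:
    """Invert disk = time * samples/s * bytes/sample to get time in days."""
    bytes_per_day = samples_per_second * bytes_per_sample * SECONDS_PER_DAY
    return limit_bytes / bytes_per_day


# Hypothetical ingestion rate; limits are 80% of a 1Gi and a 20Gi volume.
for size_gib in (1, 20):
    limit = size_gib * 1024**3 * 80 // 100
    print(f"{size_gib}Gi PVC: ~{days_until_limit(limit, 1_000):.1f} days")
```

A bigger volume scales the window linearly but never removes the limit, which is why documenting it matters regardless of the default.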

sed-i, Aug 15 '24 13:08

@sed-i You're right, it is arbitrary.

Another thing that would help is more useful log output. Nothing in the logs indicated that this was why my data stopped showing up; I had to guess.
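One way to surface this condition is to compare two of Prometheus's self-monitoring metrics, prometheus_tsdb_storage_blocks_bytes and prometheus_tsdb_retention_limit_bytes, through the HTTP API's /api/v1/query endpoint. This is a minimal sketch: the URL is a placeholder, the 90% threshold and helper names are hypothetical, and the ratio is only an approximation because it counts block storage alone.

```python
import json
import math
import urllib.parse
import urllib.request
from typing import Optional

# Placeholder address; adjust to wherever the Prometheus HTTP API is reachable.
PROMETHEUS_URL = "http://localhost:9090"
QUERY = "prometheus_tsdb_storage_blocks_bytes / prometheus_tsdb_retention_limit_bytes"


def usage_warning(ratio: float, threshold: float = 0.9) -> Optional[str]:
    """Return a warning if block storage is near the size-based retention limit."""
    if not math.isfinite(ratio):
        return None  # a limit of 0 (size retention disabled) makes the ratio +Inf
    if ratio >= threshold:
        return (f"TSDB blocks use {ratio:.0%} of the size-based retention limit; "
                "consider enlarging the Prometheus volume")
    return None


def fetch_usage_ratio(base_url: str = PROMETHEUS_URL) -> float:
    """Evaluate QUERY as an instant query and return the first sample's value."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": QUERY})
    with urllib.request.urlopen(url, timeout=10) as resp:
        result = json.load(resp)["data"]["result"]
    if not result:
        raise RuntimeError("query returned no samples")
    # Each instant-vector sample looks like {"metric": {...}, "value": [ts, "1.23"]}.
    return float(result[0]["value"][1])
```

A periodic job could call usage_warning(fetch_usage_ratio()) and pass any returned message to logging.warning, so the cause shows up in the logs instead of having to be guessed.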

QthePirate, Aug 15 '24 17:08