
Meta-monitoring grafana agent crashing: no such file or directory

Open nosalan opened this issue 11 months ago • 1 comments

Describe the bug

A StatefulSet pod named mimir-meta-monitoring-0, which is probably created by the Grafana Agent operator, is crashing with the following error:

error loading config file /var/lib/grafana-agent/config/agent.yml: error reading config file open /var/lib/grafana-agent/config/agent.yml: no such file or directory

Other pods are running: [image]

To Reproduce

Steps to reproduce the behavior:

  1. Installed the latest Helm release, mimir-distributed-5.2.1
  2. Configured meta-monitoring in the Helm chart values:
  metaMonitoring:
    serviceMonitor:
      enabled: true
    grafanaAgent:
      enabled: true
      installOperator: true
      logs:
        enabled: false
      metrics:
        enabled: true
      imageRepo:
        # Due to bad mimir chart design, I had to put whole configs below in order to overwrite JUST the container registry.
        configReloader:
          repo: quay.io # updated at runtime
          image: prometheus-operator/prometheus-config-reloader
          tag: v0.47.0
        grafanaAgent:
          repo: docker.io # updated at runtime
          image: grafana/agent
          tag: v0.29.0

The other Helm values are pretty standard.

Expected behavior

The service is running and the Mimir charts are working.

Environment

  • Infrastructure: Kubernetes, bare-metal, laptop
  • Deployment tool: helm

nosalan • Mar 04 '24 15:03

I couldn't reproduce this using the exact values from the description:

› helm install mmr grafana/mimir-distributed --version 5.2.1 --values 7531-values.yaml --namespace test-monitoring
···

› kubectl -n test-monitoring get pod/mmr-mimir-meta-monitoring-0
NAME                          READY   STATUS    RESTARTS   AGE
mmr-mimir-meta-monitoring-0   2/2     Running   0          16m

› kubectl -n test-monitoring exec pod/mmr-mimir-meta-monitoring-0 -ti -- ls -lR /var/lib/grafana-agent
/var/lib/grafana-agent:
total 8
drwxrwxrwx 2 root root 4096 Mar  7 10:38 config
drwxrwxrwt 3 root root  100 Mar  7 10:22 config-in
drwxrwxrwx 3 root root 4096 Mar  7 10:22 data
drwxrwxrwt 3 root root   80 Mar  7 10:22 secrets

/var/lib/grafana-agent/config:
total 44
-rw-r--r-- 1 root root 42511 Mar  7 10:38 agent.yml

/var/lib/grafana-agent/config-in:
total 0
lrwxrwxrwx 1 root root 16 Mar  7 10:22 agent.yml -> ..data/agent.yml
···

Note that in the output above, the mmr-mimir-meta-monitoring-0 pod is running normally and has the expected config at /var/lib/grafana-agent/config/agent.yml.

Could you share more details about the issue: are there any more errors logged in the agent's mimir-meta-monitoring-0 pod?
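
For reference, one way to gather those details is sketched below. It is only a minimal sketch: the function name collect_agent_diagnostics is hypothetical, the default namespace and pod name are assumptions that depend on your release name, and it relies only on standard kubectl subcommands and flags (describe, logs, --all-containers, --previous).

```shell
#!/usr/bin/env bash
# Sketch: collect diagnostics for a crashing meta-monitoring agent pod.
# The function name and the defaults below are assumptions, not chart-defined values.

collect_agent_diagnostics() {
  local namespace="${1:-default}"
  local pod="${2:-mimir-meta-monitoring-0}"

  # Pod status, container states, volume mounts, and recent events.
  kubectl -n "$namespace" describe pod "$pod"

  # Logs from every container in the current pod instance.
  kubectl -n "$namespace" logs "$pod" --all-containers

  # Logs from the previous (crashed) instance; ignore the error if none exists.
  kubectl -n "$namespace" logs "$pod" --all-containers --previous || true
}

# Usage (namespace is an example):
#   collect_agent_diagnostics test-monitoring mimir-meta-monitoring-0
```

Logs from all containers matter here because the pod runs two containers (note the 2/2 in the output above), and the error from the agent container alone may not show why the config file was never written.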

narqo • Mar 07 '24 10:03

Closing because we could not reproduce this; feel free to reopen with more details.

dimitarvdimitrov • Apr 03 '24 15:04