
GrafanaAgent API does not allow limiting resources of the config-reloader container

Open repjak opened this issue 2 years ago • 6 comments

What's wrong?

When the Grafana Agent Operator is deployed by Helm chart into a namespace with a ResourceQuota, the pods created by the operator are rejected by the quota because no resource requests or limits are set on the config-reloader container in the pod spec.

The API should either:

  1. provide an option to specify resources for the config-reloader container, or
  2. apply the resources from the GrafanaAgent spec consistently to all containers in the pods the operator creates.
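
A sketch of what option 1 could look like on the CRD; the `configReloader` field is hypothetical and does not exist in the current `monitoring.grafana.com/v1alpha1` API, and the values are illustrative:

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: loki
spec:
  resources:            # applied to the grafana-agent container today
    limits:
      cpu: 50m
      memory: 64M
    requests:
      cpu: 10m
      memory: 32M
  configReloader:       # hypothetical field for option 1
    resources:
      limits:
        cpu: 20m
        memory: 32M
      requests:
        cpu: 5m
        memory: 16M
```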

Steps to reproduce

The error can be reproduced with the Loki Helm chart, which creates a GrafanaAgent instance.

  1. Deploy the grafana-agent-operator Helm chart into a namespace with a ResourceQuota.
  2. Deploy Loki into the same namespace.
  3. The DaemonSet loki-logs created by the operator fails to create pods because of the quota.
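
For step 1, any ResourceQuota that constrains CPU and memory will do; a minimal example matching the quota name in the logs below (the hard limits are illustrative):

```yaml
# Once limits.cpu/limits.memory appear in a quota, every container in the
# namespace must declare them, which the config-reloader container does not.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: monitoring.quota
  namespace: monitoring
spec:
  hard:
    limits.cpu: "4"
    limits.memory: 8Gi
    requests.cpu: "2"
    requests.memory: 4Gi
```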

System information

OVH Managed Kubernetes Service, kubernetes version 1.25.12-3
Helm charts:
- grafana/grafana-agent-operator, version: ^0.3.11
- grafana/loki, version: ^5.39.0
Helm version v3.13.2
Agent operator: docker.io/grafana/agent-operator:v0.37.4

Logs

$ kubectl describe daemonset loki-logs -n monitoring
Name:           loki-logs
Selector:       app.kubernetes.io/instance=loki,app.kubernetes.io/managed-by=grafana-agent-operator,app.kubernetes.io/name=grafana-agent,grafana-agent=loki,operator.agent.grafana.com/name=loki,operator.agent.grafana.com/type=logs
Node-Selector:  <none>
Labels:         app.kubernetes.io/instance=loki
                app.kubernetes.io/managed-by=grafana-agent-operator
                app.kubernetes.io/name=grafana-agent
                grafana-agent=loki
                operator.agent.grafana.com/name=loki
                operator.agent.grafana.com/type=logs
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: loki
                meta.helm.sh/release-namespace: monitoring
Desired Number of Nodes Scheduled: 4
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/instance=loki
                    app.kubernetes.io/managed-by=grafana-agent-operator
                    app.kubernetes.io/name=grafana-agent
                    app.kubernetes.io/version=v0-37-4
                    grafana-agent=loki
                    operator.agent.grafana.com/name=loki
                    operator.agent.grafana.com/type=logs
  Annotations:      kubectl.kubernetes.io/default-container: grafana-agent
  Service Account:  loki-grafana-agent
  Containers:
   config-reloader:
    Image:      quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
    Port:       <none>
    Host Port:  <none>
    Args:
      --config-file=/var/lib/grafana-agent/config-in/agent.yml
      --config-envsubst-file=/var/lib/grafana-agent/config/agent.yml
      --watch-interval=1m
      --statefulset-ordinal-from-envvar=POD_NAME
      --reload-url=http://127.0.0.1:8080/-/reload
    Environment:
      POD_NAME:   (v1:metadata.name)
      HOSTNAME:   (v1:spec.nodeName)
      SHARD:     0
    Mounts:
      /var/lib/docker/containers from dockerlogs (ro)
      /var/lib/grafana-agent/config from config-out (rw)
      /var/lib/grafana-agent/config-in from config (ro)
      /var/lib/grafana-agent/data from data (rw)
      /var/lib/grafana-agent/secrets from secrets (ro)
      /var/log from varlog (ro)
   grafana-agent:
    Image:      grafana/agent:v0.37.4
    Port:       8080/TCP
    Host Port:  0/TCP
    Args:
      -config.file=/var/lib/grafana-agent/config/agent.yml
      -config.expand-env=true
      -server.http.address=0.0.0.0:8080
      -enable-features=integrations-next
    Limits:
      cpu:     50m
      memory:  64M
    Requests:
      cpu:      10m
      memory:   32M
    Readiness:  http-get http://:http-metrics/-/ready delay=0s timeout=3s period=5s #success=1 #failure=120
    Environment:
      POD_NAME:   (v1:metadata.name)
      HOSTNAME:   (v1:spec.nodeName)
      SHARD:     0
    Mounts:
      /var/lib/docker/containers from dockerlogs (ro)
      /var/lib/grafana-agent/config from config-out (rw)
      /var/lib/grafana-agent/config-in from config (ro)
      /var/lib/grafana-agent/data from data (rw)
      /var/lib/grafana-agent/secrets from secrets (ro)
      /var/log from varlog (ro)
  Volumes:
   config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  loki-logs-config
    Optional:    false
   config-out:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
   secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  loki-secrets
    Optional:    false
   varlog:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  
   dockerlogs:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker/containers
    HostPathType:  
   data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/grafana-agent/data
    HostPathType:  
Events:
  Type     Reason        Age                From                  Message
  ----     ------        ----               ----                  -------
  Warning  FailedCreate  28m                daemonset-controller  Error creating: pods "loki-logs-xhtj9" is forbidden: failed quota: monitoring.quota: must specify limits.cpu for: config-reloader; limits.memory for: config-reloader; requests.cpu for: config-reloader; requests.memory for: config-reloader

repjak avatar Dec 01 '23 10:12 repjak

Adding output from $ kubectl -n monitoring describe grafanaagent loki:

Name:         loki
Namespace:    monitoring
Labels:       app.kubernetes.io/instance=loki
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=loki
              app.kubernetes.io/version=2.9.2
              helm.sh/chart=loki-5.39.0
Annotations:  meta.helm.sh/release-name: loki
              meta.helm.sh/release-namespace: monitoring
API Version:  monitoring.grafana.com/v1alpha1
Kind:         GrafanaAgent
Metadata:
  Creation Timestamp:  2023-11-30T20:44:42Z
  Generation:          2
  Resource Version:    3040719368
  UID:                 5bedee00-a8c7-4aba-a963-41be76adaebb
Spec:
  Disable Reporting:       false
  Disable Support Bundle:  false
  Enable Config Read API:  false
  Logs:
    Instance Selector:
      Match Labels:
        app.kubernetes.io/instance:  loki
        app.kubernetes.io/name:      loki
  Resources:
    Limits:
      Cpu:     50m
      Memory:  64M
    Requests:
      Cpu:               10m
      Memory:            32M
  Service Account Name:  loki-grafana-agent
Events:                  <none>

repjak avatar Dec 01 '23 12:12 repjak

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!

github-actions[bot] avatar Jan 01 '24 00:01 github-actions[bot]

Just to clarify, this only applies to the Operator in static mode; the Agent's own Helm chart allows defining resources on the config-reloader container normally.
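
For comparison, a values.yaml fragment for the Agent's own Helm chart; verify the exact field names against the chart version you are using, and the resource values here are only placeholders:

```yaml
configReloader:
  resources:
    requests:
      cpu: 1m
      memory: 5Mi
    limits:
      cpu: 10m
      memory: 32Mi
```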

tpaschalis avatar Jan 09 '24 15:01 tpaschalis

This bothers me too; it would be great to be able to define resources for the config-reloader container as well.

elcomtik avatar Feb 01 '24 09:02 elcomtik

As a workaround, one can create a LimitRange with default and defaultRequest values for the namespace, but this requires permission to create LimitRange objects.
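
A sketch of that workaround, assuming the monitoring namespace from the logs above; the defaults are injected only into containers that declare no resources of their own, so the grafana-agent container's explicit values are left untouched:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: monitoring
spec:
  limits:
    - type: Container
      default:            # injected as limits when a container sets none
        cpu: 100m
        memory: 128Mi
      defaultRequest:     # injected as requests when a container sets none
        cpu: 10m
        memory: 32Mi
```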

repjak avatar Feb 01 '24 10:02 repjak

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!

github-actions[bot] avatar Jun 21 '25 00:06 github-actions[bot]