
[Helm] Helm test requires self monitoring to be enabled

Open jasperjonker opened this issue 2 years ago • 4 comments

Describe the bug

I cannot render a Helm template for Loki with chart version > 3.2.2. Since this is how ArgoCD deploys applications, I cannot deploy Loki with chart version > 3.2.2 using Helm. E.g.:

Chart.yaml

apiVersion: v2
name: loki
version: 3.3.2
dependencies:
  - name: loki
    version: 3.3.2
    repository: https://grafana.github.io/helm-charts

values.yaml

loki:
  loki:
    auth_enabled: false

    schemaConfig:
      configs:
      - from: 2020-10-24
        store: boltdb-shipper
        object_store: gcs
        schema: v12
        index:
          prefix: index_
          period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/boltdb-cache
        cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: gcs
      gcs:
        bucket_name: loki

    storage:
      bucketNames:
        chunks: loki_chunks
        ruler: loki_ruler
        admin: loki_admin
      type: gcs

    memcached:
      chunk_cache:
        enabled: true
        host: "memcached-loki.loki"
        service: memcache
        batch_size: 1024
        parallelism: 100
      results_cache:
        enabled: true
        host: "memcached-loki.loki"
        service: memcache
        timeout: "500ms"
        default_validity: "12h"

    rulerConfig:
      storage:
        type: local
        local:
          directory: "/tmp/rules"
      rule_path: /tmp/scratch
      alertmanager_url: http://prometheus-infra-alertmanager.prometheus:80
      ring:
        kvstore:
          store: inmemory
      enable_api: true
      enable_alertmanager_v2: true

    # ---------------------
    # This section below is added because loki sometimes throws an error "too many outstanding requests", see https://github.com/grafana/loki/issues/4613
    # This should solve that
    query_scheduler:
      max_outstanding_requests_per_tenant: 2048

    limits_config:
      max_query_series: 5000

  rules:
    additionalGroups:
    - name: additional-loki-rules
      rules:
        - record: job:loki_request_duration_seconds_bucket:sum_rate
          expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job)
        - record: job_route:loki_request_duration_seconds_bucket:sum_rate
          expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route)
        - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
          expr: sum(rate(container_cpu_usage_seconds_total[1m])) by (node, namespace, pod, container)

  selfMonitoring:
    enabled: false

  ingress:
    # We use the Gateway
    enabled: false

  read:
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 5

    persistence:
      storageClass: premium-rwo

  write:
    nodeSelector:
      iam.gke.io/gke-metadata-server-enabled: "true"

    persistence:
      storageClass: premium-rwo

  monitoring:
    selfMonitoring:
      enabled: false

  gateway:
    enabled: true
    autoscaling:
      enabled: true
      maxReplicas: 5
    ingress:
      enabled: true
      hosts:
        - host: "loki.xxx.com"
          paths:
            - path: /
              pathType: ImplementationSpecific
      tls:
        - hosts:
            - loki.xxx.com
          secretName: tls-loki
      ingressClassName: nginx

To Reproduce

Steps to reproduce the behavior:

  1. Place the Chart.yaml and values.yaml in a folder.
  2. Run helm dependency build && helm template --debug . -f values.yaml > all.yaml && rm -rf Chart.lock charts
  3. If version in Chart.yaml is > 3.2.2 it will fail with:
Update Complete. ⎈Happy Helming!⎈
Saving 1 charts
Downloading loki from repo https://grafana.github.io/helm-charts
Deleting outdated charts
install.go:173: [debug] Original chart version: ""
install.go:190: [debug] CHART PATH: /home/xxx//loki

Error: template: loki/charts/loki/templates/validate.yaml:12:4: executing "loki/charts/loki/templates/validate.yaml" at <fail "Helm test requires self monitoring to be enabled">: error calling fail: Helm test requires self monitoring to be enabled
helm.go:81: [debug] template: loki/charts/loki/templates/validate.yaml:12:4: executing "loki/charts/loki/templates/validate.yaml" at <fail "Helm test requires self monitoring to be enabled">: error calling fail: Helm test requires self monitoring to be enabled

Expected behavior

When the version is 3.2.2 or below, it creates a file called all.yaml with the whole manifest of Loki. This can be deployed using kubectl apply -f all.yaml.

Environment:

  • Infrastructure: kubernetes
  • Deployment tool: helm

jasperjonker avatar Nov 08 '22 09:11 jasperjonker

I'm not sure what helm test does (or where to read about it), but if you are disabling selfMonitoring, maybe you should also disable tests?

test:
  enabled: false
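
For the setup in the original report, where the chart is pulled in as a dependency named loki, the override would presumably have to be nested under that loki: key in the parent values.yaml. A minimal sketch, assuming the validation template checks test.enabled against monitoring.selfMonitoring.enabled (the exact key paths are an assumption and may differ between chart versions):

```yaml
# values.yaml of the wrapper chart (sketch, not verified against every chart version)
loki:
  monitoring:
    selfMonitoring:
      enabled: false   # self-monitoring stays off, as in the original report
  test:
    enabled: false     # assumed key: disables the Helm test that needs self-monitoring
```

With both values set this way, the validate.yaml check that raises "Helm test requires self monitoring to be enabled" should no longer have a reason to fire.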

AurimasNav avatar Nov 10 '22 08:11 AurimasNav

We're hitting this as well - the 'solution' is to disable 'test' as @AurimasNav says, but it feels a bit wrong.

If the test relies on:

selfMonitoring:
  enabled: true

Then shouldn't setting that value to false also disable that specific test?

slushysnowman avatar Nov 14 '22 15:11 slushysnowman

I'm not sure what helm test does (or where to read about it), but if you are disabling selfMonitoring, maybe you should also disable tests?

test:
  enabled: false

Disabling validation checks should not be the solution there. The Loki chart providers would need to make the self monitoring more configurable...

I mean why is the chart delivering Prometheus CRDs... srsly

rufreakde avatar Nov 16 '23 15:11 rufreakde

Any update here?

dlahn avatar Feb 22 '24 21:02 dlahn

I ran into this same issue with Loki Helm chart 5.5.2 (Loki version 2.8.2).

The CRDs from the Loki Helm chart conflict with the CRDs installed by kube-prometheus-stack, causing a race condition if they're both applied at the same time.

I've disabled the CRDs from Loki by setting monitoring.selfMonitoring.grafanaAgent.installOperator: false, but with selfMonitoring.enabled: true (default) it fails to apply the chart because these CRDs are required:

monitoring.grafana.com/v1alpha1/PodLogs
monitoring.grafana.com/v1alpha1/GrafanaAgent
monitoring.grafana.com/v1alpha1/LogsInstance

Since Prometheus can monitor Loki, I figured it was safe to set selfMonitoring.enabled: false, but now I receive the error that others have mentioned (loki/templates/validate.yaml:6:4): Helm test requires self monitoring to be enabled. I get this error when using the most recent Loki chart version, 5.47.2.

Edit: It looks like the only helm test implemented is based on the Loki canary which is part of the self-monitoring: https://github.com/grafana/loki/blob/main/production/helm/loki/templates/tests/test-canary.yaml
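
Putting the pieces from this thread together for a direct (non-dependency) install, the combination would look roughly like the following. This is only a sketch: it assumes the test.enabled key suggested earlier in the thread still exists in 5.x, and it has not been tested against 5.47.2.

```yaml
# values.yaml for installing the Loki chart directly (sketch under the assumptions above)
monitoring:
  selfMonitoring:
    enabled: false           # Prometheus monitors Loki instead
    grafanaAgent:
      installOperator: false # avoid the CRD conflict with kube-prometheus-stack
test:
  enabled: false             # the only Helm test is canary-based, so it is disabled too
```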

slyt avatar Apr 03 '24 17:04 slyt