[loki-distributed] Pods fail with /data read-only file system

Open • HighWatersDev opened this issue 2 years ago • 5 comments

I'm deploying the Grafana Loki distributed chart, and it fails with these errors:

msg="error running loki" err="mkdir /data: read-only file system
error creating index client

These errors occur in the querier, ingester, and tableManager pods. I've seen similar issues posted here for the /rules, /wal, and /var/loki directories, but the solutions there (adding an emptyDir volume via extraVolumes) didn't solve it for me.

Here's my values file:

compactor:
  enabled: true
  serviceAccount:
    create: true
loki:
  structuredConfig:
    compactor:
      shared_store: azure
    ingester:
      max_transfer_retries: 0
      chunk_idle_period: 1h
      chunk_target_size: 1536000
      max_chunk_age: 1h
    schema_config:
      configs:
      - from: "2020-12-11"
        index:
          period: 24h
          prefix: index_
        object_store: azure
        schema: v11
        store: boltdb-shipper
    storage_config:
      azure:
        account_key: REDACTED
        account_name: loki
        container_name: logs
        request_timeout: 0
        use_managed_identity: false
      boltdb_shipper:
        active_index_directory: /data/loki/boltdb-shipper-active
        cache_location: /data/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: azure
      filesystem:
        directory: /data/loki/chunks
serviceMonitor:
  enabled: true
prometheusRule:
  enabled: true
  groups:
    - name: loki-rules
      rules:
        - record: job:loki_request_duration_seconds_bucket:sum_rate
          expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job)
        - record: job_route:loki_request_duration_seconds_bucket:sum_rate
          expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route)
        - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
          expr: sum(rate(container_cpu_usage_seconds_total[1m])) by (node, namespace, pod, container)

Chart version: 0.49.0
Loki version: 2.5.0

HighWatersDev, Jul 01 '22 20:07

It seems that the directories configured in storage_config cannot be created, because /data is on a read-only file system in the container. Prefixing the paths with the default directory, /var/loki, seems to solve the problem:

storage_config:
  azure:
    account_key: REDACTED
    account_name: loki
    container_name: logs
    request_timeout: 0
    use_managed_identity: false
  boltdb_shipper:
    active_index_directory: /var/loki/data/loki/boltdb-shipper-active
    cache_location: /var/loki/data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: azure
  filesystem:
    directory: /var/loki/data/loki/chunks

Is it possible to change this behavior or at least document it?
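In the loki-distributed chart, this storage_config fragment belongs under loki.structuredConfig, as in the original values file. A minimal sketch of only the relevant part, assuming (as the working paths in this thread suggest) that /var/loki is writable in the querier, ingester, and tableManager pods; the shorter subdirectory names are illustrative, and the azure block and the rest of structuredConfig are unchanged and omitted here:

```yaml
loki:
  structuredConfig:
    storage_config:
      boltdb_shipper:
        # Loki creates these directories itself (the mkdir in the original error),
        # so their parent must be writable; /var/loki is assumed to be.
        active_index_directory: /var/loki/boltdb-shipper-active
        cache_location: /var/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: azure
      filesystem:
        directory: /var/loki/chunks
```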

HighWatersDev, Jul 05 '22 14:07

I'm not entirely following. Is the request to document which paths are and aren't writable in the Loki container? Or did we provide a bad example or documentation somewhere suggesting /data/loki for this directory?

trevorwhitney, Jul 05 '22 16:07

Sorry for the confusion. Here's the doc I referenced, which suggests using the /data/loki directory.

HighWatersDev, Jul 05 '22 16:07

Ahh, I see. Sorry for the confusion. I don't think that doc is specifically geared towards use with the Helm chart, so that might be part of the issue?

trevorwhitney, Jul 05 '22 16:07

Fair point. It was difficult and confusing to figure out how to make Azure storage work with a Helm chart deployment (I got there thanks to the Loki channel on the Grafana Slack). I wonder which approach would be best:

  • add documentation pointing out that these paths need adjusting when deploying with the Helm chart
  • or make the chart work with the default config found in the doc
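The second option can be approximated without changing the documented /data/loki paths: mount a writable volume at /data in each affected pod. A minimal, untested sketch, assuming the loki-distributed chart's per-component extraVolumes and extraVolumeMounts values (and that tableManager exposes the same keys); the volume name data-dir is hypothetical. This is close to the emptyDir workaround the original report says did not help, so treat it as something to verify rather than a known fix. An emptyDir is also wiped when its pod is deleted, so anything the ingester writes there does not survive a pod restart.

```yaml
# Mount a scratch volume at /data so Loki can create /data/loki/... itself.
# Assumes loki-distributed's per-component extraVolumes/extraVolumeMounts keys.
ingester:
  extraVolumes:
    - name: data-dir          # hypothetical volume name
      emptyDir: {}
  extraVolumeMounts:
    - name: data-dir
      mountPath: /data
querier:
  extraVolumes:
    - name: data-dir
      emptyDir: {}
  extraVolumeMounts:
    - name: data-dir
      mountPath: /data
tableManager:
  extraVolumes:
    - name: data-dir
      emptyDir: {}
  extraVolumeMounts:
    - name: data-dir
      mountPath: /data
```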

HighWatersDev, Jul 05 '22 17:07

I had the same issue when deploying Loki on Azure from the Helm chart.

In case it helps anyone, this is what my values.yaml looked like:

loki:
  schema_config:
    configs:
      - from: 2022-01-11
        store: boltdb
        object_store: azure
        schema: v12
  storage_config:
    azure:
      account_name: ${AZ_STORAGEACCOUNT_NAME}
      account_key: ${AZ_STORAGEACCOUNT_KEY}
      container_name: logs
      request_timeout: 0
    boltdb_shipper:
      active_index_directory: /var/loki/data/loki/boltdb-shipper-active
      cache_location: /var/loki/data/loki/boltdb-shipper-cache
      cache_ttl: 24h
      shared_store: azure
    filesystem:
      directory: /var/loki/data/loki/chunks
read:
  extraArgs:
    - -config.expand-env=true
  extraEnvFrom:
    - secretRef:
        name: loki-secret
write:
  extraArgs:
    - -config.expand-env=true
  extraEnvFrom:
    - secretRef:
        name: loki-secret
backend:
  extraArgs:
    - -config.expand-env=true
  extraEnvFrom:
    - secretRef:
        name: loki-secret
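The extraEnvFrom entries in these values load every key of a Secret named loki-secret as environment variables, and -config.expand-env=true lets Loki substitute them into ${AZ_STORAGEACCOUNT_NAME} and ${AZ_STORAGEACCOUNT_KEY}. A minimal sketch of that Secret; the namespace and values are placeholders, not taken from the original post, and the Secret must live in the same namespace as the Loki pods:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: loki-secret
  namespace: loki            # hypothetical; use the namespace the chart is installed in
type: Opaque
stringData:
  # Each key becomes an environment variable of the same name via envFrom.
  AZ_STORAGEACCOUNT_NAME: "REPLACE_WITH_ACCOUNT_NAME"
  AZ_STORAGEACCOUNT_KEY: "REPLACE_WITH_ACCOUNT_KEY"
```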

tcharetteacerta, Jan 25 '23 16:01

@HighWatersDev Try the latest version of the Helm chart.

patsevanton, Jan 30 '23 05:01