AWS region is set to `dummy`

Open exb-atix opened this issue 2 years ago • 13 comments

Describe the bug The AWS region is not taken into account, at least for the loki-backend pods, when they try to access AWS STS. This continuously throws error messages (see the log in the output section). Apart from these error messages, everything works as expected. After looking into the code (as a non-native Go speaker), the culprit seems to lie around lines 224-232 and 245-247 of s3_storage_client.go, where the region should be set on the s3Config object.

To Reproduce Steps to reproduce the behavior:

  1. Deploy Loki (2.9.3) via Helm chart (5.41.0)
  2. Look at the log of a loki-backend pod

Expected behavior The STS endpoint is built with the configured region instead of the dummy region, so no error is thrown.

Environment:

  • Infrastructure: Kubernetes on AWS
  • Deployment tool: helm via helmfile

Screenshots, Promtail config, or terminal output

Loki log:

level=info ts=2023-12-12T10:44:11.495384847Z caller=loki.go:505 msg="Loki started"
level=error ts=2023-12-12T10:44:11.505356978Z caller=ruler.go:571 msg="unable to list rules" err="WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.dummy.amazonaws.com/\": dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host"
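
The hostname in the error above follows the regional STS endpoint pattern `sts.<region>.amazonaws.com`, which is why a placeholder region of `dummy` yields a host that cannot be resolved. The sketch below only illustrates that URL pattern and how to read the effective region back out of a log line; it is not Loki's or the AWS SDK's code, and the function names `sts_endpoint` and `region_from_sts_url` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def sts_endpoint(region: str) -> str:
    """Build a regional AWS STS endpoint URL for the given region name."""
    return f"https://sts.{region}.amazonaws.com/"


def region_from_sts_url(url: str) -> Optional[str]:
    """Return the region embedded in a regional STS URL, or None if the
    host does not match the sts.<region>.amazonaws.com pattern."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    if len(parts) == 4 and parts[0] == "sts" and parts[2:] == ["amazonaws", "com"]:
        return parts[1]
    return None


if __name__ == "__main__":
    # URL taken from the error message in the log above.
    print(region_from_sts_url("https://sts.dummy.amazonaws.com/"))
    # What the endpoint would look like with a real region configured.
    print(sts_endpoint("eu-central-1"))
```

If the region extracted from the failing URL is `dummy`, the component that logged the error (here the ruler) built its S3/STS client without a configured region, regardless of what other components use.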

Loki helm values:

loki:
  auth_enabled: false
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
  compactor:
    apply_retention_interval: 1h
    compaction_interval: 5m
    retention_delete_worker_count: 500
    retention_enabled: true
    shared_store: s3
  schemaConfig:
    configs:
      - from: 2018-04-15
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: loki_index_
          period: 24h
  server:
    http_listen_port: 3100
  storage_config:
    boltdb_shipper:
      active_index_directory: /var/loki/index
      cache_location: /var/loki/index_cache
      shared_store: s3
    aws:
      bucketnames: {{ .Values.loki.bucket_name }}
      region: {{ .Values.aws.region }}
      s3forcepathstyle: false
serviceAccount:
  create: true
  name: loki
  annotations:
    eks.amazonaws.com/role-arn: {{ .Values.loki.s3_access_role }}

Screenshot of the applied env vars:

[Screenshot: envvars]

exb-atix avatar Dec 12 '23 12:12 exb-atix

I'm seeing this as well, but on 2.9.1.

tehlers320 avatar Dec 14 '23 19:12 tehlers320

Sorry, I'm not using the Loki Helm chart, but I figured it out.

The config must have the region set here when using IRSA:

    common:
      compactor_address: 'loki'
      path_prefix: /var/loki
      replication_factor: 2
      storage:
        s3:
          bucketnames: {{ .Values.s3_bucket }}
          region: {{ .Values.region }}

Mine's working now. The env vars did not matter, for whatever reason.

tehlers320 avatar Dec 14 '23 20:12 tehlers320

Thank you for your response. I tried your suggestion and put the storage block into the commonConfig block in my config, but unfortunately the issue is still the same.

exb-atix avatar Dec 19 '23 14:12 exb-atix

I have the same issue

AntonioDiTuri avatar Jan 04 '24 10:01 AntonioDiTuri

I'm facing the same issue. Any update on this?

abhivaidya07 avatar Jan 04 '24 11:01 abhivaidya07

I had the same error message when deploying via Helm with Loki as singleBinary. After adding the list element "ruler: BUCKET_NAME", it disappeared.

# values.yaml
loki:
  ..
  storage:
    bucketNames:
      chunks: BUCKET_NAME
      ruler: BUCKET_NAME
    type: s3
    s3:
      s3: s3://BUCKET_NAME
      region: "eu-central-1"
      accessKeyId: "${GRAFANA_LOKI_S3_ACCESKEYID}"
      secretAccessKey: "${GRAFANA_LOKI_S3_SECRETACCESSKEY}"
      s3ForcePathStyle: false
      insecure: false

https://grafana.com/docs/loki/latest/setup/install/helm/install-monolithic/

Which makes sense, since the helm chart's _helpers.tpl is looking for $.Values.loki.storage.bucketNames.ruler

https://github.com/grafana/loki/blob/main/production/helm/loki/templates/_helpers.tpl#L342

0xdnL avatar Feb 07 '24 14:02 0xdnL

Hello 0xdnL, thank you for this hint. I have now been able to test your suggestion, but unfortunately the error persists. This is the change I tried (among several variations):

commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
    storage:
      bucketNames:
        ruler: {{ .Values.loki.bucket_name }}
        chunks: {{ .Values.loki.bucket_name }}
      type: s3
      s3:
        s3: {{ .Values.loki.bucket_name }}
        region: {{ .Values.aws.region }}
        s3forcepathstyle: false

exb-atix avatar Feb 20 '24 13:02 exb-atix

Can someone from Grafana add a definitive working example values.yaml to the example directory for distributed Loki with an S3 backend using IRSA?

gitarns avatar Mar 17 '24 17:03 gitarns

Small update: we upgraded to Loki 3.0.0 via Helm chart version 6.3.3, and the error still persists.

exb-atix avatar Apr 29 '24 07:04 exb-atix

Can confirm this is still the case as of Chart v6.6.4:

init compactor: failed to init delete store: failed to get s3 object: WebIdentityErr: failed to retrieve credentials
caused by: RequestError: send request failed
caused by: Post "https://sts.dummy.amazonaws.com/": 3 errors occurred:
	* dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host
	* dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host
	* dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host

Loucool111 avatar Aug 08 '24 09:08 Loucool111