[helm/loki-simple-scalable] Are PVCs really required? / Retention not working?

Open R-Studio opened this issue 3 years ago • 13 comments

My Goal

  • Run Loki only with Object Storage from MinIO, if it is possible.
  • I want to run Loki in as scalable and simple a way as possible, which is why I decided to use loki-simple-scalable (or rather, its new location).

My Question

  • Is it possible to run loki-simple-scalable without PVCs (only S3 from MinIO)?
    • If yes, what does the configuration look like?
    • If no: why are PVCs required, what is stored in my PVCs, and why are my PVCs filling up?
  • Do the retentions configured in the compactor and limits_config work with S3, or do I need to configure retention on my MinIO bucket? As I understand the Loki 2.3 release notes, there is no need for external (non-Loki) retention: "Loki now has the ability to apply custom retention based on stream selectors! This will allow much finer control over log retention all of which is now handled by Loki, no longer requiring the use of object store configs for retention." -> If it works, what does the configuration look like?
  • Just out of interest: what is the difference between configuring retention in the compactor and configuring it in the table manager? The documentation says: "Retention through the Table Manager is achieved by relying on the object store TTL feature, and will work for both boltdb-shipper store and chunk/index store. However retention through the Compactor is supported only with the boltdb-shipper store." But the Table Manager section seems to contradict this: "When using S3 or GCS, the bucket storing the chunks needs to have the expiry policy set correctly. For more details check S3’s documentation or GCS’s documentation."

My current configuration

gateway:
  replicas: 2
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 3
    targetCPUUtilizationPercentage: 60
    targetMemoryUtilizationPercentage: 80
  ingress:
    enabled: true
  resources:
    limits:
      #cpu: 50m
      memory: 30Mi
    requests:
      cpu: 5m
      memory: 15Mi

loki:
  # The value below "config" must be a string instead of a map
  config: |
    auth_enabled: false
    common:
      path_prefix: /var/loki
      storage:
        filesystem: null
        s3:
          insecure: true
          s3: http://USERNAME:PASSWORD@minio.minio.svc.cluster.local:9000/loki
          s3forcepathstyle: true
    storage_config:
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: s3
    schema_config:
      configs:
        - from: "2020-09-07"
          store: boltdb-shipper
          object_store: s3
          schema: v12
          index:
            period: 24h
            prefix: index_
    limits_config:
      retention_period: 1h
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      split_queries_by_interval: 15m
    compactor:
      compaction_interval: 10m
      retention_enabled: true
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
    table_manager:
      retention_deletes_enabled: true
      retention_period: 1h
    chunk_store_config:
      max_look_back_period: 1h
    query_scheduler:
      max_outstanding_requests_per_tenant: 1000
    memberlist:
      join_members:
      - loki-memberlist
    server:
      http_listen_port: 3100
      grpc_listen_port: 9095

write:
  replicas: 2
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 3
    targetCPUUtilizationPercentage: 60
    targetMemoryUtilizationPercentage: 80
  persistence:
    size: 5Gi
    storageClass: longhorn
  resources:
    limits:
      #cpu: 400m
      memory: 1200Mi # Don't decrease this limit. Loki uses enough memory to prevent dataloss (when storage backend unavailable)
    requests:
      cpu: 35m
      memory: 300Mi # Don't decrease this limit. Loki's memory usage can grow quickly, which can cause other pods to be OOM-killed

read:
  replicas: 2
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 3
    targetCPUUtilizationPercentage: 60
    targetMemoryUtilizationPercentage: 80
  persistence:
    size: 1Gi
    storageClass: longhorn
  resources:
    limits:
      #cpu: 400m
      memory: 150Mi
    requests:
      cpu: 50m
      memory: 75Mi
    
serviceMonitor:
  enabled: true


monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  • Environment specific Helm values:
gateway:
  ingress:
    hosts:
    - host: loki.apps-test-1....
      paths: 
      - path: /
        pathType: ImplementationSpecific
    tls:
    - hosts: 
      - loki.apps-test-1....
      secretName: ingress-wildcard-cert

-> I already opened this issue in the previous location of the Loki-Simple-Scalable Helm Chart, but because the Helm chart moved (thanks to @zanhsieh), I opened a new issue here.

R-Studio avatar Sep 06 '22 11:09 R-Studio

There are other users with the same issue.

R-Studio avatar Sep 06 '22 11:09 R-Studio

This problem has been bothering me

walingcc avatar Sep 08 '22 01:09 walingcc

I'm having the same issue, but after reading the docs over and over again I think the retention configured here for the compactor applies only to the index (boltdb_shipper store). So no logs older than the retention period are displayed in Grafana, but the chunks aren't deleted by the compactor.

From https://grafana.com/docs/loki/latest/operations/storage/retention/: "When using S3 or GCS, the bucket storing the chunks needs to have the expiry policy set correctly. For more details check S3’s documentation or GCS’s documentation."

That said, @R-Studio, you are right that the 2.3 release notes sound a bit different.

atze234 avatar Sep 09 '22 09:09 atze234

@atze234 thanks for your reply. It is a little confusing, because the compactor docs say: "The Compactor can deduplicate index entries. It can also apply granular retention. When applying retention with the Compactor, the Table Manager is unnecessary."
-> It would be helpful if the compactor docs also included the following sentence: "When using S3 or GCS, the bucket storing the chunks needs to have the expiry policy set correctly."

Anyway, do you know how to configure Loki in "scalable mode" without persistent volumes (so that Loki only uses S3)?

R-Studio avatar Sep 12 '22 08:09 R-Studio

@atze234 @R-Studio The compactor docs also say: "The chunks will be deleted by the compactor asynchronously when swept." and "Marked chunks will only be deleted after retention_delete_delay configured is expired" (plus an explanation of why that is). So I'd say no expiry policy on the object store is necessary with the current Loki version. (Purely based on reading the docs, I hope to have time soon to try it out)
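
For reference, the settings those doc passages refer to can be sketched roughly as follows. This is a minimal, untested fragment: the keys are the ones that appear in the configurations in this thread, while the working directory path and the 168h period are illustrative assumptions.

```yaml
compactor:
  working_directory: /var/loki/compactor  # assumed local scratch path
  shared_store: s3                        # object store that holds the index
  compaction_interval: 10m
  retention_enabled: true                 # without this, the compactor only compacts tables
  retention_delete_delay: 2h              # marked chunks are deleted only after this delay has expired
  retention_delete_worker_count: 150
limits_config:
  retention_period: 168h                  # illustrative global retention period
```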

davdr avatar Sep 17 '22 08:09 davdr

@R-Studio With the current chart it's not possible to configure the PVC away. You could perhaps use an ephemeral storage class, but I think there would be data loss on restarts, so I won't do that.

@davdr I tried with just the compactor and no chunks were deleted from S3. I tried with a 7-day retention, and even after 4 weeks there were old chunks in S3. That's why I came here :)

atze234 avatar Sep 17 '22 09:09 atze234

@R-Studio The table_manager docs say "The retention period must be a multiple of the index / chunks", the compactor docs say "The minimum retention period is 24h", and the schema_config docs mention that the default period of index and chunks is 168h. Can you check your configuration and try again?

guoew avatar Sep 18 '22 06:09 guoew

@atze234 Thanks for your reply, but why is it possible with the "Single Binary" mode?

@guoew If I have understood you correctly, then the following configuration should work with the retention of 4 days:

loki:
  # The value below "config" must be a string instead of a map
  config: |
    auth_enabled: false
    common:
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        filesystem: null
        s3:
          insecure: true
          s3: http://<USER:PASSWORD>@minio.minio.svc.cluster.local:9000/loki
          s3forcepathstyle: true
    storage_config:
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: s3
    schema_config:
      configs:
        - from: "2020-09-07"
          store: boltdb-shipper
          object_store: s3
          schema: v12
          index:
            period: 24h
            prefix: index_
    limits_config:
      retention_period: 4d
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      split_queries_by_interval: 15m
    compactor:
      compaction_interval: 10m
      retention_enabled: true
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
    table_manager:
      retention_deletes_enabled: true
      retention_period: 4d
    query_scheduler:
      max_outstanding_requests_per_tenant: 1000
    memberlist:
      join_members:
      - loki-memberlist
    server:
      http_listen_port: 3100
      grpc_listen_port: 9095
      grpc_server_max_recv_msg_size: 104857600
      grpc_server_max_send_msg_size: 104857600
    ingester_client:
      grpc_client_config:
        max_recv_msg_size: 104857600
    ruler:
      alertmanager_url: http://alertmanager-operated.cattle-monitoring-system.svc.cluster.local:9093
      enable_alertmanager_v2: true
      enable_api: true
      enable_sharding: true

-> Until now the retention is not working, but maybe I have to wait. (I will get back to you later)

R-Studio avatar Sep 19 '22 07:09 R-Studio

@guoew It looks like the retention is working now.

Before changing the retention to 4 days: (image)

After changing the retention to 4 days: (image)

But on the S3 side (MinIO) I don't see that Loki has deleted anything. Maybe it takes some time for MinIO to really delete the files.

Before changing the retention to 4 days: (image)

After changing the retention to 4 days: (image)

R-Studio avatar Sep 19 '22 08:09 R-Studio

@R-Studio You can change schema_config to the following configuration and try again.

    schema_config:
      configs:
        - from: "2020-09-07"
          store: boltdb-shipper
          object_store: s3
          schema: v12
          index:
            period: 24h
            prefix: index_
          chunks:
            period: 24h

guoew avatar Sep 19 '22 08:09 guoew

@guoew what does this mean/change? And does it work if I set it only now?

I changed it, but no difference: image

R-Studio avatar Sep 19 '22 09:09 R-Studio

Now it works, thanks! 👍😊😉 image

If I have understood Loki retention correctly, you have to set the following configuration to make retention work, correct? (Example for a retention of 7 days, showing only the retention-relevant configuration.)

schema_config:
  configs:
    - index:
        period: 24h
      chunks:
        period: 24h
limits_config:
  retention_period: 7d
compactor:
  retention_enabled: true
table_manager:
  retention_deletes_enabled: true
  retention_period: 7d

-> @atze234 maybe this also helps you.

R-Studio avatar Sep 19 '22 09:09 R-Studio

For me it's not deleting anything from the chunks 👎 aws-cli easily finds files older than 7 days. My config:

    auth_enabled: true
    common:
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        s3:
          bucketnames: <redacted>
          endpoint: s3.eu-central-1.amazonaws.com
          insecure: false
          region: eu-central-1
          s3: <redacted>
          s3forcepathstyle: false
    compactor:
      compaction_interval: 10m
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
      retention_enabled: true
      shared_store: s3
      working_directory: /var/loki/compactor
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      retention_period: 7d
      split_queries_by_interval: 15m
    memberlist:
      join_members:
      - loki-memberlist
    query_range:
      align_queries_with_step: true
    query_scheduler:
      max_outstanding_requests_per_tenant: 2048
    ruler:
      storage:
        s3:
          bucketnames: <redacted>
          endpoint: s3.eu-central-1.amazonaws.com
          insecure: false
          region: eu-central-1
          s3: <redacted>
          s3forcepathstyle: false
    schema_config:
      configs:
      - chunks:
          period: 24h
        from: "2022-01-11"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v12
        store: boltdb-shipper
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
    storage_config:
      aws:
        s3: s3://eu-central-1/<redacted>
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/boltdb-cache
        resync_interval: 5s
        shared_store: s3
      hedging:
        at: 250ms
        max_per_second: 20
        up_to: 3
    table_manager:
      retention_deletes_enabled: true
      retention_period: 7d

atze234 avatar Sep 20 '22 13:09 atze234

loki-mini0 increases every day

walingcc avatar Sep 26 '22 07:09 walingcc

Given the conversation in #7210 is it ok to close this issue?

trevorwhitney avatar Oct 25 '22 22:10 trevorwhitney

@trevorwhitney

Hi, there is still no clear answer about what Loki supports and what it can do. I have the same issue with Loki's S3 integration. Everything works well except retention: I set it to 168h using the compactor, but nothing happens and the logs stay there forever.

  1. Does Loki support automatic deletion of chunks from S3-like APIs (in my case Ceph object storage)?
  2. Does the simple scalable chart include the compactor, or do we need to add it manually somehow? How?
  3. If Loki doesn't support retention on S3, then I believe I need to set retention policies on the bucket?
  4. How can I find the compactor logs? I examined all pods of the simple-scalable deployment and can't find any pod or container for the compactor. Is it part of some other container/pod, or is it just missing?

Looking at the documentation, it is confusing, as mentioned above. Part of the configuration at https://grafana.com/docs/loki/latest/storage/ looks like an answer, namely that it is not supported for S3, but then there is the note "For more information, see the retention configuration documentation."

When I go there, I find nothing more about the S3 case.

...and where https://grafana.com/docs/loki/latest/operations/storage/retention/ says "The chunks will be deleted by the compactor asynchronously when swept", I believe it means they will be deleted only from boltdb-shipper, but not from S3. My suggestion is to add a statement to https://grafana.com/docs/loki/latest/operations/storage/retention/ that clearly says deleting chunks from S3 (and maybe other object stores) is not supported, and that the compactor/table manager only removes chunks from the cache/index (boltdb-shipper or similar).

bbroniewski avatar Nov 25 '22 07:11 bbroniewski

@bbroniewski the compactor is in the read component of the simple scalable deployment. The compactor's logs will be part of the read pod. If you just want to make sure it's running, you could look for caller=compactor.go. You should set retention policies on your bucket (i.e. a TTL), and the compactor will handle removing old chunks from the index so they are no longer returned in queries.

I agree the docs could probably use a bit of clarification regarding retention. Would you mind submitting a PR?

trevorwhitney avatar Dec 12 '22 19:12 trevorwhitney

@trevorwhitney Thank you for the answer! I can prepare an enhancement of the docs, but I will need your support to understand the topic fully. Is there a channel where I can ask you questions and get answers more quickly while I prepare the documentation change?

bbroniewski avatar Dec 12 '22 19:12 bbroniewski

I am in the Grafana community Slack, but I would encourage you to open a PR (even if it's not perfect) and we can collaborate through the PR comment/review process.

trevorwhitney avatar Dec 13 '22 16:12 trevorwhitney

Can I somehow select which logs will be kept and which logs will have a retention policy, or is it a "catch-all" thing?

SnoozeFreddo avatar Aug 02 '23 11:08 SnoozeFreddo

@SnoozeFreddo you can use retention per stream selector: https://grafana.com/docs/loki/latest/operations/storage/retention/#configuring-the-retention-period
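
A minimal sketch of what that can look like, following the `retention_stream` layout described in the linked docs. The selectors and periods here are made-up examples, not values from this thread:

```yaml
limits_config:
  retention_period: 168h              # default for streams that match no selector below
  retention_stream:
    - selector: '{namespace="dev"}'   # hypothetical label selector
      priority: 1
      period: 24h                     # keep these logs for one day only
    - selector: '{app="audit"}'       # hypothetical label selector
      priority: 2                     # consulted first if a stream matches both selectors
      period: 720h                    # keep these logs longer than the default
compactor:
  retention_enabled: true             # per-stream retention is applied by the compactor
```

Streams that match none of the selectors fall back to the global `retention_period`.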

R-Studio avatar Aug 03 '23 11:08 R-Studio

@atze234

Try removing the storage_config.aws configuration.

stringang avatar Aug 05 '23 03:08 stringang

"the compactor is in the read component of the simple scalable deployment. The compactor's logs will be part of the read pod. If you just want to make sure it's running, you could look for caller=compactor.go."

As of grafana/loki:3.0.0 I was able to find caller=compactor.go only in component=backend of simple scalable deployment.
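
For anyone on 3.x, a hedged sketch of a compactor block. It assumes, based on the 3.0 upgrade notes, that `shared_store` was removed and that `delete_request_store` must be set when retention is enabled. Both points are worth checking against the configuration reference for the exact version in use, and the path is only an example.

```yaml
compactor:
  working_directory: /var/loki/compactor  # assumed path
  retention_enabled: true
  delete_request_store: s3                # assumed to be required in 3.x when retention is enabled
  retention_delete_delay: 2h
```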

dyumin avatar Jun 28 '24 11:06 dyumin