
Compactor retention with tsdb_shipper does not work

Open pyo-counting opened this issue 2 years ago • 4 comments

Describe the bug Compactor retention does not work with the TSDB shipper and AWS S3 object storage.

To Reproduce Steps to reproduce the behavior:

  1. Install the Helm chart with a custom values file (ssd-values.yaml)
    helm install loki grafana/loki --version 5.41.8 --namespace loki-ns -f ssd-values.yaml
    
    # ssd-values.yaml
    loki:
      # -- The number of old ReplicaSets to retain to allow rollback
      revisionHistoryLimit: 2
      # -- Config file contents for Loki
      # @default -- See values.yaml
      config: |
        {{- if .Values.enterprise.enabled}}
        {{- tpl .Values.enterprise.config . }}
        {{- else }}
        auth_enabled: {{ .Values.loki.auth_enabled }}
        {{- end }}
    
        {{- with .Values.loki.server }}
        server:
          {{- toYaml . | nindent 2}}
        {{- end}}
    
        memberlist:
        {{- if .Values.loki.memberlistConfig }}
          {{- toYaml .Values.loki.memberlistConfig | nindent 2 }}
        {{- else }}
        {{- if .Values.loki.extraMemberlistConfig}}
        {{- toYaml .Values.loki.extraMemberlistConfig | nindent 2}}
        {{- end }}
          join_members:
            - {{ include "loki.memberlist" . }}
            {{- with .Values.migrate.fromDistributed }}
            {{- if .enabled }}
            - {{ .memberlistService }}
            {{- end }}
            {{- end }}
        {{- end }}
    
        {{- with .Values.loki.ingester }}
        ingester:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- if .Values.loki.commonConfig}}
        common:
        {{- toYaml .Values.loki.commonConfig | nindent 2}}
          storage:
          {{- include "loki.commonStorageConfig" . | nindent 4}}
        {{- end}}
    
        {{- with .Values.loki.limits_config }}
        limits_config:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        runtime_config:
          file: /etc/loki/runtime-config/runtime-config.yaml
    
        {{- with .Values.loki.memcached.chunk_cache }}
        {{- if and .enabled (or .host .addresses) }}
        chunk_store_config:
          chunk_cache_config:
            memcached:
              batch_size: {{ .batch_size }}
              parallelism: {{ .parallelism }}
            memcached_client:
              {{- if .host }}
              host: {{ .host }}
              {{- end }}
              {{- if .addresses }}
              addresses: {{ .addresses }}
              {{- end }}
              service: {{ .service }}
        {{- end }}
        {{- end }}
    
        {{- if .Values.loki.schemaConfig }}
        schema_config:
        {{- toYaml .Values.loki.schemaConfig | nindent 2}}
        {{- else }}
        schema_config:
          configs:
            - from: 2022-01-11
              store: boltdb-shipper
              object_store: {{ .Values.loki.storage.type }}
              schema: v12
              index:
                prefix: loki_index_
                period: 24h
        {{- end }}
    
        {{ include "loki.rulerConfig" . }}
    
        {{- if or .Values.tableManager.retention_deletes_enabled .Values.tableManager.retention_period }}
        table_manager:
          retention_deletes_enabled: {{ .Values.tableManager.retention_deletes_enabled }}
          retention_period: {{ .Values.tableManager.retention_period }}
        {{- end }}
    
        {{- with .Values.loki.memcached.results_cache }}
        query_range:
          align_queries_with_step: true
          {{- if and .enabled (or .host .addresses) }}
          cache_results: {{ .enabled }}
          results_cache:
            cache:
              default_validity: {{ .default_validity }}
              memcached_client:
                {{- if .host }}
                host: {{ .host }}
                {{- end }}
                {{- if .addresses }}
                addresses: {{ .addresses }}
                {{- end }}
                service: {{ .service }}
                timeout: {{ .timeout }}
          {{- end }}
        {{- end }}
    
        {{- with .Values.loki.storage_config }}
        storage_config:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.query_scheduler }}
        query_scheduler:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.compactor }}
        compactor:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.analytics }}
        analytics:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.querier }}
        querier:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.index_gateway }}
        index_gateway:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.frontend }}
        frontend:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.frontend_worker }}
        frontend_worker:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.distributor }}
        distributor:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        tracing:
          enabled: {{ .Values.loki.tracing.enabled }}
      # Should authentication be enabled
      auth_enabled: true
      # -- Check https://grafana.com/docs/loki/latest/configuration/#server for more info on the server configuration.
      server:
        log_format: "logfmt"
        log_level: "info"
        log_source_ips_enabled: true
        log_request_headers: true
        log_request_at_info_level_enabled: true
      # -- Limits config
      limits_config:
        max_line_size: 10KB
        per_stream_rate_limit: 5MB
        per_stream_rate_limit_burst: 20MB
        split_queries_by_interval: 15m
        retention_period: 7d
        retention_stream:
          - selector: '{environment="dev"}'
            priority: 1
            period: 1d
          - selector: '{environment="stg"}'
            priority: 1
            period: 2d
        shard_streams:
          enabled: false
        allow_structured_metadata: true
      # -- Provides a reloadable runtime configuration file for some specific configuration
      runtimeConfig: {}
      # -- Check https://grafana.com/docs/loki/latest/configuration/#common_config for more info on how to provide a common configuration
      commonConfig:
        path_prefix: /var/loki
        replication_factor: 3
        ring:
          kvstore:
            store: "memberlist"
        compactor_address: '{{ include "loki.compactorAddress" . }}'
      # -- Storage config. Providing this will automatically populate all necessary storage configs in the templated config.
      storage:
        bucketNames:
          chunks: kps-shr-tools-s3-loki
          ruler: kps-shr-tools-s3-loki
        type: s3
        s3:
          region: ap-northeast-2
      # -- Configure memcached as an external cache for chunk and results cache. Disabled by default
      # must enable and specify a host for each cache you would like to use.
      memcached:
        chunk_cache:
          enabled: false
        results_cache:
          enabled: false
      # -- Check https://grafana.com/docs/loki/latest/configuration/#schema_config for more info on how to configure schemas
      schemaConfig:
        configs:
          - from: "2024-01-01"
            store: tsdb
            object_store: s3
            schema: v13
            index:
              prefix: tsdb_index_
              period: 24h
      # -- Check https://grafana.com/docs/loki/latest/configuration/#ruler for more info on configuring ruler
      rulerConfig: {}
      # -- Structured loki configuration, takes precedence over `loki.config`, `loki.schemaConfig`, `loki.storageConfig`
      structuredConfig:
        common:
          storage:
            s3:
              storage_class: "STANDARD"
            hedging:
              at: 250ms
              up_to: 3
              max_per_second: 20
        query_range:
          results_cache:
            cache:
              enable_fifocache: false
              embedded_cache:
                enabled: true
                max_size_mb: 150
                ttl: 30m
            compression: "snappy"
          cache_results: true
          cache_index_stats_results: false
      # -- Additional query scheduler config
      query_scheduler:
        max_outstanding_requests_per_tenant: 32768
        querier_forget_delay: 60s
      # -- Additional storage config
      storage_config:
        aws:
          bucketnames: kps-shr-tools-s3-loki
          region: ap-northeast-2
          insecure: false
          storage_class: "STANDARD"
        tsdb_shipper:
          active_index_directory: /var/loki/ingester/tsdb_shipper
          shared_store: "s3"
          shared_store_key_prefix: "tsdb_shipper/"
          cache_location: /var/loki/index_gateway/tsdb_shipper
          index_gateway_client:
            log_gateway_requests: true
      # --  Optional compactor configuration
      compactor:
        working_directory: "/var/loki/compactor"
        shared_store: "s3"
        shared_store_key_prefix: "compoactor/"
        retention_enabled: true
        compactor_ring:
          kvstore:
            store: "memberlist"
      # --  Optional analytics configuration
      analytics:
        reporting_enabled: false
      # --  Optional querier configuration
      querier:
        tail_max_duration: 30m
        max_concurrent: 16
        multi_tenant_queries_enabled: true
      # --  Optional ingester configuration
      ingester:
        lifecycler:
          ring:
            kvstore:
              store: "memberlist"
          final_sleep: 15s
        wal:
          dir: "/var/loki/ingester/wal"
          flush_on_shutdown: true
          replay_memory_ceiling: 1GB
      # --  Optional index gateway configuration
      index_gateway:
        mode: ring
        ring:
          kvstore:
            store: "memberlist"
      frontend:
        scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
        log_queries_longer_than: 5s
        query_stats_enabled: true
        scheduler_dns_lookup_period: 3s
        compress_responses: true
      frontend_worker:
        match_max_concurrent: true
        scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
      # -- Optional distributor configuration
      distributor:
        ring:
          kvstore:
            store: "memberlist"
        rate_store:
          debug: true
        write_failures_logging:
          add_insights_label: true
      # -- Enable tracing
      tracing:
        enabled: true
    enterprise:
      # Enable enterprise features, license must be provided
      enabled: false
    
    # -- Options that may be necessary when performing a migration from another helm chart
    migrate:
      # -- When migrating from a distributed chart like loki-distributed or enterprise-logs
      fromDistributed:
        # -- Set to true if migrating from a distributed helm chart
        enabled: false
    
    serviceAccount:
      # -- Specifies whether a ServiceAccount should be created
      create: true
      # -- The name of the ServiceAccount to use.
      # If not set and create is true, a name is generated using the fullname template
      name: loki-sa
      # -- Annotations for the service account
      annotations:
        eks.amazonaws.com/role-arn: (...skip...)
      # -- Set this toggle to false to opt out of automounting API credentials for the service account
      automountServiceAccountToken: true
    
    # RBAC configuration
    rbac:
      # -- If pspEnabled true, a PodSecurityPolicy is created for K8s that use psp.
      pspEnabled: false
      # -- For OpenShift set pspEnabled to 'false' and sccEnabled to 'true' to use the SecurityContextConstraints.
      sccEnabled: false
    
    # -- Section for configuring optional Helm test
    test:
      enabled: false
    
    # Monitoring section determines which monitoring features to enable
    monitoring:
      # Dashboards for monitoring Loki
      dashboards:
        # -- If enabled, create configmap with dashboards for monitoring Loki
        enabled: false
      # Recording rules for monitoring Loki, required for some dashboards
      rules:
        # -- If enabled, create PrometheusRule resource with Loki recording rules
        enabled: false
        # -- Include alerting rules
        alerting: false
      # ServiceMonitor configuration
      serviceMonitor:
        # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
        enabled: false
      # Self monitoring determines whether Loki should scrape its own logs.
      # This feature currently relies on the Grafana Agent Operator being installed,
      # which is installed by default using the grafana-agent-operator sub-chart.
      # It will create custom resources for GrafanaAgent, LogsInstance, and PodLogs to configure
      # scrape configs to scrape its own logs with the labels expected by the included dashboards.
      selfMonitoring:
        enabled: false
      # The Loki canary pushes logs to and queries from this loki installation to test
      # that it's working correctly
      lokiCanary:
        enabled: false
    
    # Configuration for the write pod(s)
    write:
      # -- Number of replicas for the write
      replicas: 3
      autoscaling:
        # -- Enable autoscaling for the write.
        enabled: false
      # -- Comma-separated list of Loki modules to load for the write
      targetModule: "write"
      # -- Resource requests and limits for the write
      resources:
        limits:
          cpu: 1.5
          memory: 2Gi
        requests:
          cpu: 500m
          memory: 500Mi
      # -- Grace period to allow the write to shutdown before it is killed. Especially for the ingester,
      # this must be increased. It must be long enough so writes can be gracefully shutdown flushing/transferring
      # all data and to successfully leave the member ring on shutdown.
      terminationGracePeriodSeconds: 300
      # -- The default is to deploy all pods in parallel.
      podManagementPolicy: "Parallel"
      persistence:
        # -- Enable volume claims in pod spec
        volumeClaimsEnabled: true
        # -- Enable StatefulSetAutoDeletePVC feature
        enableStatefulSetAutoDeletePVC: false
        # -- Storage class to be used.
        # If defined, storageClassName: <storageClass>.
        # If set to "-", storageClassName: "", which disables dynamic provisioning.
        # If empty or set to null, no storageClassName spec is
        # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
        storageClass: loki-sc
    
    # Configuration for the table-manager
    tableManager:
      # -- Specifies whether the table-manager should be enabled
      enabled: false
    
    # Configuration for the read pod(s)
    read:
      # -- Number of replicas for the read
      replicas: 2
      autoscaling:
        # -- Enable autoscaling for the read, this is only used if `queryIndex.enabled: true`
        enabled: false
      # -- Comma-separated list of Loki modules to load for the read
      targetModule: "read"
      # -- Whether or not to use the 2 target type simple scalable mode (read, write) or the
      # 3 target type (read, write, backend). Legacy refers to the 2 target type, so true will
      # run two targets, false will run 3 targets.
      legacyReadTarget: false
      # -- Resource requests and limits for the read
      resources:
        limits:
          cpu: 1.5
          memory: 2Gi
        requests:
          cpu: 500m
          memory: 500Mi
      # -- Grace period to allow the read to shutdown before it is killed
      terminationGracePeriodSeconds: 30
    
    # Configuration for the backend pod(s)
    backend:
      # -- Number of replicas for the backend
      replicas: 2
      autoscaling:
        # -- Enable autoscaling for the backend.
        enabled: false
      # -- Comma-separated list of Loki modules to load for the read
      targetModule: "backend"
      # -- Resource requests and limits for the backend
      resources:
        limits:
          cpu: 1
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 500Mi
      # -- Grace period to allow the backend to shutdown before it is killed. Especially for the ingester,
      # this must be increased. It must be long enough so backends can be gracefully shutdown flushing/transferring
      # all data and to successfully leave the member ring on shutdown.
      terminationGracePeriodSeconds: 300
      podManagementPolicy: "Parallel"
      persistence:
        # -- Enable volume claims in pod spec
        volumeClaimsEnabled: true
        # -- Enable StatefulSetAutoDeletePVC feature
        enableStatefulSetAutoDeletePVC: true
        # -- Storage class to be used.
        # If defined, storageClassName: <storageClass>.
        # If set to "-", storageClassName: "", which disables dynamic provisioning.
        # If empty or set to null, no storageClassName spec is
        # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
        storageClass: loki-sc
    # Configuration for the single binary node(s)
    singleBinary:
      # -- Number of replicas for the single binary
      replicas: 0
    
    # Use either this ingress or the gateway, but not both at once.
    # If you enable this, make sure to disable the gateway.
    # You'll need to supply authn configuration for your ingress controller.
    ingress:
      enabled: true
      ingressClassName: "alb"
      annotations:
        (...skip...)
      paths:
        (...skip...)
      hosts:
        (...skip...)
    
    # Configuration for the memberlist service
    memberlist:
      service:
        publishNotReadyAddresses: false
    
    # Configuration for the gateway
    gateway:
      # -- Specifies whether the gateway should be enabled
      enabled: false
    
    networkPolicy:
      # -- Specifies whether Network Policies should be created
      enabled: false
    
    # -------------------------------------
    # Configuration for `minio` child chart
    # -------------------------------------
    minio:
      enabled: false
    
    sidecar:
      rules:
        # -- Whether or not to create a sidecar to ingest rule from specific ConfigMaps and/or Secrets.
        enabled: false
    
  2. Push logs to Loki with Promtail (stream: {environment="dev" ...}, tenant ID: kurlypay)
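With the limits_config above, a stream's retention period is chosen by the highest-priority retention_stream rule whose selector matches the stream's labels, falling back to the global retention_period when nothing matches. A minimal Python sketch of that selection logic (an illustration only, not Loki's actual code; the label names and periods are taken from the values file above):

```python
# Sketch of Loki's retention_stream selection: highest-priority matching
# rule wins; otherwise the global retention_period applies.
RETENTION_STREAM = [
    {"selector": {"environment": "dev"}, "priority": 1, "period_days": 1},
    {"selector": {"environment": "stg"}, "priority": 1, "period_days": 2},
]
GLOBAL_RETENTION_DAYS = 7  # retention_period: 7d

def retention_days(stream_labels: dict) -> int:
    """Return the retention period (in days) applied to a stream."""
    matches = [
        rule for rule in RETENTION_STREAM
        if all(stream_labels.get(k) == v for k, v in rule["selector"].items())
    ]
    if not matches:
        return GLOBAL_RETENTION_DAYS
    return max(matches, key=lambda rule: rule["priority"])["period_days"]

print(retention_days({"environment": "dev"}))   # 1
print(retention_days({"environment": "prod"}))  # 7
```

So the logs pushed in step 2 ({environment="dev"}) should fall under the 1d per-stream period, not the 7d global one.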

Expected behavior

  • Logs in the stream matching {environment="dev"} are not marked for deletion after 1d (retention period) + 2h (retention_delete_delay):
    • saved log timestamp (UTC+0900): 2024-01-26 16:08:11.189+0900
    • expected log deletion timestamp: 2024-01-27 18:08:11.189+0900 (retention period + retention delete delay)
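The expected deletion timestamp follows from a quick calculation; this assumes the compactor's retention_delete_delay was left at its 2h default, since it is not overridden in the values file:

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))  # UTC+0900

saved = datetime(2024, 1, 26, 16, 8, 11, 189000, tzinfo=KST)
retention_period = timedelta(days=1)        # retention_stream period for {environment="dev"}
retention_delete_delay = timedelta(hours=2) # compactor default (assumed not overridden)

expected_deletion = saved + retention_period + retention_delete_delay
print(expected_deletion)  # 2024-01-27 18:08:11.189000+09:00
```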

Environment:

  • Infrastructure: EKS (k8s 1.24), S3 (using IRSA for object upload/delete... access)
  • Deployment tool: Helm

Screenshots, Promtail config, or terminal output

  • Loki's k8s ConfigMap(config.yaml)
        analytics:
          reporting_enabled: false
        auth_enabled: true
        common:
          compactor_address: 'loki-backend'
          path_prefix: /var/loki
          replication_factor: 3
          ring:
            kvstore:
              store: memberlist
          storage:
            hedging:
              at: 250ms
              max_per_second: 20
              up_to: 3
            s3:
              bucketnames: kps-shr-tools-s3-loki
              insecure: false
              region: ap-northeast-2
              s3forcepathstyle: false
              storage_class: STANDARD
        compactor:
          compactor_ring:
            kvstore:
              store: memberlist
          retention_enabled: true
          shared_store: s3
          shared_store_key_prefix: compoactor/
          working_directory: /var/loki/compactor
        distributor:
          rate_store:
            debug: true
          ring:
            kvstore:
              store: memberlist
          write_failures_logging:
            add_insights_label: true
        frontend:
          compress_responses: true
          log_queries_longer_than: 5s
          query_stats_enabled: true
          scheduler_address: query-scheduler-discovery.loki-ns.svc.cluster.local.:9095
          scheduler_dns_lookup_period: 3s
        frontend_worker:
          match_max_concurrent: true
          scheduler_address: query-scheduler-discovery.loki-ns.svc.cluster.local.:9095
        index_gateway:
          mode: ring
          ring:
            kvstore:
              store: memberlist
        ingester:
          lifecycler:
            final_sleep: 15s
            ring:
              kvstore:
                store: memberlist
          wal:
            dir: /var/loki/ingester/wal
            flush_on_shutdown: true
            replay_memory_ceiling: 1GB
        limits_config:
          allow_structured_metadata: true
          max_cache_freshness_per_query: 10m
          max_line_size: 10KB
          per_stream_rate_limit: 5MB
          per_stream_rate_limit_burst: 20MB
          reject_old_samples: true
          reject_old_samples_max_age: 168h
          retention_period: 7d
          retention_stream:
          - period: 1d
            priority: 1
            selector: '{environment="dev"}'
          - period: 2d
            priority: 1
            selector: '{environment="stg"}'
          shard_streams:
            enabled: false
          split_queries_by_interval: 15m
        memberlist:
          join_members:
          - loki-memberlist
        querier:
          max_concurrent: 16
          multi_tenant_queries_enabled: true
          tail_max_duration: 30m
        query_range:
          align_queries_with_step: true
          cache_index_stats_results: false
          cache_results: true
          results_cache:
            cache:
              embedded_cache:
                enabled: true
                max_size_mb: 150
                ttl: 30m
              enable_fifocache: false
            compression: snappy
        query_scheduler:
          max_outstanding_requests_per_tenant: 32768
          querier_forget_delay: 60s
        ruler:
          storage:
            s3:
              bucketnames: kps-shr-tools-s3-loki
              insecure: false
              region: ap-northeast-2
              s3forcepathstyle: false
            type: s3
        runtime_config:
          file: /etc/loki/runtime-config/runtime-config.yaml
        schema_config:
          configs:
          - from: "2024-01-01"
            index:
              period: 24h
              prefix: tsdb_index_
            object_store: s3
            schema: v13
            store: tsdb
        server:
          grpc_listen_port: 9095
          http_listen_port: 3100
          log_format: logfmt
          log_level: info
          log_request_at_info_level_enabled: true
          log_request_headers: true
          log_source_ips_enabled: true
        storage_config:
          aws:
            bucketnames: kps-shr-tools-s3-loki
            insecure: false
            region: ap-northeast-2
            storage_class: STANDARD
          hedging:
            at: 250ms
            max_per_second: 20
            up_to: 3
          tsdb_shipper:
            active_index_directory: /var/loki/ingester/tsdb_shipper
            cache_location: /var/loki/index_gateway/tsdb_shipper
            index_gateway_client:
              log_gateway_requests: true
            shared_store: s3
            shared_store_key_prefix: tsdb_shipper/
        tracing:
          enabled: true
    
  • Grafana Explore (queried at 2024-01-28 00:21+0900): screenshot taken 2024-01-28 12:17:24 AM
  • S3: screenshot taken 2024-01-28 12:33:39 AM
  • backend target logs output
    level=info ts=2024-01-27T07:01:14.865219404Z caller=compactor.go:517 msg="applying retention with compaction"
    level=info ts=2024-01-27T07:01:14.865256731Z caller=expiration.go:78 msg="overall smallest retention period 1706252474.865, default smallest retention period 1706252474.865"
    ts=2024-01-27T07:01:14.86529407Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:01:14.946262639Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=80.958298ms
    level=info ts=2024-01-27T07:02:14.808135172Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:03:14.807959275Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:04:14.808085233Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:05:14.808426886Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:06:08.655238728Z caller=table_manager.go:228 index-store=tsdb-2024-01-01 msg="syncing tables"
    ts=2024-01-27T07:06:08.655315103Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.705822649Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=50.499015ms
    ts=2024-01-27T07:06:08.705869075Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.72114278Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=15.268517ms
    ts=2024-01-27T07:06:08.721188867Z caller=spanlogger.go:86 level=info msg="building table cache"
    ts=2024-01-27T07:06:08.741034208Z caller=spanlogger.go:86 level=info msg="table cache built" duration=19.839877ms
    ts=2024-01-27T07:06:08.74110291Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.75919025Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=18.080712ms
    ts=2024-01-27T07:06:08.759236094Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.773159942Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=13.919151ms
    ts=2024-01-27T07:06:08.773199339Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.818762004Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=45.556466ms
    ts=2024-01-27T07:06:08.818802868Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.8361649Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=17.356727ms
    ts=2024-01-27T07:06:08.836208945Z caller=spanlogger.go:86 level=info msg="building table cache"
    ts=2024-01-27T07:06:08.850882986Z caller=spanlogger.go:86 level=info msg="table cache built" duration=14.668228ms
    ts=2024-01-27T07:06:08.8509299Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.866052312Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=15.118183ms
    ts=2024-01-27T07:06:08.866082521Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.882078327Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=15.990817ms
    ts=2024-01-27T07:06:08.882121167Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.900934931Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=18.809387ms
    ts=2024-01-27T07:06:08.900965805Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.917191332Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=16.221127ms
    ts=2024-01-27T07:06:08.917222594Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.931074256Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=13.843814ms
    ts=2024-01-27T07:06:08.931099495Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:06:08.944555033Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=13.451656ms
    level=info ts=2024-01-27T07:06:08.944579799Z caller=table_manager.go:271 index-store=tsdb-2024-01-01 msg="query readiness setup completed" duration=2.952µs distinct_users_len=0 distinct_users=
    level=info ts=2024-01-27T07:06:14.808627191Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:07:14.808759172Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:08:14.807985004Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:09:14.80859647Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:10:14.808345602Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:11:08.655754186Z caller=table_manager.go:228 index-store=tsdb-2024-01-01 msg="syncing tables"
    ts=2024-01-27T07:11:08.655826266Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.708818756Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=52.983678ms
    ts=2024-01-27T07:11:08.708862067Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.725769394Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=16.902595ms
    ts=2024-01-27T07:11:08.725807396Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.74124526Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=15.433144ms
    ts=2024-01-27T07:11:08.741285717Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.756958028Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=15.667322ms
    ts=2024-01-27T07:11:08.756997287Z caller=spanlogger.go:86 level=info msg="building table cache"
    ts=2024-01-27T07:11:08.773056633Z caller=spanlogger.go:86 level=info msg="table cache built" duration=16.054893ms
    ts=2024-01-27T07:11:08.77310801Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.788781492Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=15.669059ms
    ts=2024-01-27T07:11:08.788805885Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.804863018Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=16.05236ms
    ts=2024-01-27T07:11:08.804891816Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.820865821Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=15.969768ms
    ts=2024-01-27T07:11:08.82089105Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.836186464Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=15.29177ms
    ts=2024-01-27T07:11:08.836213075Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.854480054Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=18.263026ms
    ts=2024-01-27T07:11:08.854503319Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.873903565Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=19.395869ms
    ts=2024-01-27T07:11:08.873936021Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.890227641Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=16.287615ms
    ts=2024-01-27T07:11:08.890249099Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:08.905204725Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=14.951348ms
    ts=2024-01-27T07:11:08.905235933Z caller=spanlogger.go:86 level=info msg="building table cache"
    ts=2024-01-27T07:11:08.920680952Z caller=spanlogger.go:86 level=info msg="table cache built" duration=15.43968ms
    level=info ts=2024-01-27T07:11:08.920720759Z caller=table_manager.go:271 index-store=tsdb-2024-01-01 msg="query readiness setup     completed" duration=2.59µs distinct_users_len=0 distinct_users=
    level=info ts=2024-01-27T07:11:14.808141385Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:11:14.86529441Z caller=compactor.go:517 msg="applying retention with compaction"
    level=info ts=2024-01-27T07:11:14.865334949Z caller=expiration.go:78 msg="overall smallest retention period 1706253074.865,     default smallest retention period 1706253074.865"
    ts=2024-01-27T07:11:14.865372091Z caller=spanlogger.go:86 level=info msg="building table names cache"
    ts=2024-01-27T07:11:14.914375642Z caller=spanlogger.go:86 level=info msg="table names cache built" duration=48.995438ms
    level=info ts=2024-01-27T07:12:14.807707634Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:13:14.808146732Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:14:14.808286317Z caller=marker.go:202 msg="no marks file found"
    level=info ts=2024-01-27T07:15:14.808442889Z caller=marker.go:202 msg="no marks file found"
    

pyo-counting avatar Jan 27 '24 15:01 pyo-counting

I tested with the values.yaml file below and confirmed that compaction and retention are working.

loki:
  auth_enabled: false
  limits_config:
    retention_period: 1d
  commonConfig:
    replication_factor: 2
  storage:
    bucketNames:
      chunks: kps-shr-tools-s3-loki-test
      ruler: kps-shr-tools-s3-loki-test
    s3:
      region: ap-northeast-2
  storage_config:
    boltdb_shipper:
        active_index_directory: /var/loki/data/index
        cache_location: /var/loki/data/boltdb-cache
        shared_store: s3
  compactor:
    working_directory: /var/loki/data/retention
    shared_store: s3
    retention_delete_delay: 30m
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_worker_count: 150
serviceAccount:
  name: loki-sa
  imagePullSecrets: []
  annotations:
    eks.amazonaws.com/role-arn: (...skip...)
  rules:
    enabled: false
    alerting: false
  serviceMonitor:
    enabled: false
  lokiCanary:
    enabled: false
write:
  replicas: 2
  persistence:
    storageClass: loki-sc
read:
  replicas: 2
  persistence:
    storageClass: loki-sc
backend:
  replicas: 2
  persistence:
    storageClass: loki-sc
gateway:
  enabled: false
extraObjects:
  - apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: loki-sc
    provisioner: efs.csi.aws.com
    parameters:
      provisioningMode: efs-ap
      fileSystemId: (...skip...)
      directoryPerms: "700"
      uid: '{{ .Values.loki.podSecurityContext.runAsUser }}'
      gid: '{{ .Values.loki.podSecurityContext.runAsGroup }}'

What did I miss? Please let me know.

pyo-counting avatar Feb 05 '24 14:02 pyo-counting

Finally, I found the cause. The problem was caused by setting different values for -tsdb.shipper.shared-store.key-prefix and -compactor.shared-store.key-prefix.

I had assumed the compactor used the -compactor.shared-store.key-prefix flag only for deletion requests, not for compaction and retention. But that was not the case.

I hope this will be added to the official Loki documentation. Since there are separate options for the compactor and the writer, other people might have the same misconception as me.

pyo-counting avatar Feb 06 '24 01:02 pyo-counting

Hi, Can you elaborate on this? Did you set each value separately?

icanhazbeer avatar Feb 22 '24 21:02 icanhazbeer

@icanhazbeer That's right. I set two runtime flag values to different values.

  • -tsdb.shipper.shared-store.key-prefix
  • -compactor.shared-store.key-prefix

And the results of the test are as follows.

  • Log entry deletion requests are saved under -compactor.shared-store.key-prefix by the compactor.
  • The location the compactor reads from for compaction and retention is also -compactor.shared-store.key-prefix (before the test, I thought the compactor referred to -tsdb.shipper.shared-store.key-prefix).

As a result, we can see that the two flags must always have the same value for the compactor to perform compaction and retention properly.
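
For reference, a minimal sketch of a Loki config where both flags resolve to the same object-storage location (the prefix value loki-index/ here is only an illustration, not taken from this thread; option availability depends on your Loki version):

```yaml
# Hypothetical example: the shipper's and the compactor's key prefixes
# must point at the same location in the bucket, otherwise the compactor
# never sees the index files the shipper uploads.
storage_config:
  tsdb_shipper:
    shared_store: s3
    shared_store_key_prefix: loki-index/   # -tsdb.shipper.shared-store.key-prefix

compactor:
  shared_store: s3
  shared_store_key_prefix: loki-index/     # -compactor.shared-store.key-prefix
  retention_enabled: true
```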

pyo-counting avatar Feb 23 '24 00:02 pyo-counting