Old chunks not getting deleted after retention period
Describe the bug
I've configured 168h retention for my logs, but I can see chunks 5 years old filling my disk.
To Reproduce
This is my config:
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  grpc_server_max_recv_msg_size: 8388608
  grpc_server_max_send_msg_size: 8388608

querier:
  engine:
    max_look_back_period: 168h

ingester:
  wal:
    enabled: true
    dir: /tmp/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 24h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 24h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576   # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 5m      # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0      # Chunk transfers disabled

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
    cache_ttl: 24h             # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /tmp/loki/chunks

compactor:
  working_directory: /tmp/loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_streams_per_user: 1000000
  max_entries_limit_per_query: 5000000
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 20

chunk_store_config:
  max_look_back_period: 168h

table_manager:
  retention_deletes_enabled: false
  retention_period: 168h
Expected behavior
Chunks older than 168h should be deleted.
Environment:
- Infrastructure: [e.g., Kubernetes, bare-metal, laptop]
- Deployment tool: [e.g., helm, jsonnet]
Screenshots, Promtail config, or terminal output
We can see 49 days of logs although I've configured 168h

I have the same issue. Logs older than 7 days are deleted and no longer visible in Grafana, but the chunk files themselves are not deleted from the filesystem.
loki, version 2.5.0 (branch: HEAD, revision: 2d9d0ee23)
build user: root@4779f4b48f3a
build date: 2022-04-07T21:50:00Z
go version: go1.17.6
platform: linux/amd64
server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

chunk_store_config:
  max_look_back_period: 168h

table_manager:
  retention_deletes_enabled: true
  retention_period: 168h
Use the compactor, not the table_manager, if you aren't using AWS S3.
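For reference, here is a minimal sketch of what that change might look like with a filesystem store like the configs above (the paths are illustrative, adjust them to your own setup):

compactor:
  working_directory: /var/lib/loki/retention   # where the compactor keeps its retention marker files
  shared_store: filesystem                     # must match the object store holding the chunks
  compaction_interval: 10m
  retention_enabled: true                      # without this the compactor only compacts the index, it never deletes chunks
  retention_delete_delay: 2h

limits_config:
  retention_period: 168h                       # 7 days

The table_manager block can then be dropped; with boltdb-shipper, retention is applied by the compactor, not the table manager.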
Thanks, that did the trick :)
Hey @DeBuXer, could you post what you added to your config file in order to get deletion on S3 working? I am running into the same issue and I have not found a solution.
@Mastedont, I don't use S3, I store my chunks directly on disk. My current configuration:
server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

chunk_store_config:
  max_look_back_period: 168h

compactor:
  working_directory: /var/lib/loki/retention
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

limits_config:
  retention_period: 168h

ruler:
  alertmanager_url: http://127.0.0.1:9093
Thank you, @DeBuXer
One last question, @DeBuXer: how can I tell whether log retention is working or not? What output in the logs will let me know that it's working?
@Mastedont, I'm not 100% sure, but I guess:
Jun 14 15:15:44 loki loki[277929]: level=info ts=2022-06-14T13:15:44.57537489Z caller=index_set.go:280 table-name=index_19150 msg="removing source db files from storage" count=1
Jun 14 15:15:44 loki loki[277929]: level=info ts=2022-06-14T13:15:44.576099223Z caller=compactor.go:495 msg="finished compacting table" table-name=index_19150
Is that log output from the ingester? I can only see output like this, despite having the compactor enabled:
level=info ts=2022-06-14T14:05:17.574831148Z caller=table.go:358 msg="uploading table loki_pre_19157"
level=info ts=2022-06-14T14:05:17.574847901Z caller=table.go:385 msg="finished uploading table loki_pre_19157"
level=info ts=2022-06-14T14:05:17.57485537Z caller=table.go:443 msg="cleaning up unwanted dbs from table loki_pre_19157"
Is that log output from the ingester?
It's from /var/log/syslog, but it should contain the same information. When the compactor is enabled, you should see something like:
level=info ts=2022-06-14T14:24:56.072803949Z caller=compactor.go:324 msg="this instance has been chosen to run the compactor, starting compactor"
@DeBuXer, thanks a lot for your support here. I don't see the chunk files getting rotated, and I also see pretty old index directories. I want my logs to be rotated every 7 days. I am not sure what I am doing wrong here. Could you please help me with it?
auth_enabled: false
chunk_store_config:
  max_look_back_period: 168h
compactor:
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  wal:
    dir: /data/loki/wal
    flush_on_shutdown: true
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  max_transfer_retries: 0
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 32
  ingestion_burst_size_mb: 36
  unordered_writes: true
  retention_period: 168h
schema_config:
  configs:
    - from: 2020-10-24
      index:
        period: 24h
        prefix: index_
      object_store: filesystem
      schema: v11
      store: boltdb-shipper
server:
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: true
  retention_period: 168h
@rickydjohn, I think you need to set retention_enabled: true in the compactor block. See also https://grafana.com/docs/loki/latest/operations/storage/retention/#retention-configuration
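Based on that doc page, the compactor block in the config above would probably need to look something like this (it reuses the paths already in that config; the intervals are just example values):

compactor:
  working_directory: /data/loki/boltdb-shipper-compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true        # the switch that actually turns chunk deletion on
  retention_delete_delay: 2h

The retention_period: 168h already present under limits_config is what the compactor will then enforce; as noted earlier in the thread, the table_manager block is not what deletes boltdb-shipper chunks.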
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a stale label sorted by thumbs up.
We may also:
- Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a keepalive label to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.
Hi @Mastedont, did you manage to get the chunks deleted from S3? I'm having the same problem: I cannot see any logs about the compactor, and in the S3 store there are files older than my configured retention (>7 days). It seems only the index is cleared, because Grafana won't show older log entries.
I have the same problem
Hi, I have this relevant config:
compactor:
  compaction_interval: 10m
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  retention_enabled: true
  shared_store: s3
  working_directory: /var/loki/retention

limits_config:
  enforce_metric_name: false
  max_cache_freshness_per_query: 10m
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 720h
  split_queries_by_interval: 30m
but the chunk files are not deleted from S3; only the index is compacted.
I'm also seeing this on Loki 2.4.0, using MinIO as storage. Even with retention_delete_delay: 5m, no chunks are being deleted.
@Mastedont, I don't use S3, I store my chunks directly on disk. My current configuration:
server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

chunk_store_config:
  max_look_back_period: 168h

compactor:
  working_directory: /var/lib/loki/retention
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

limits_config:
  retention_period: 168h

ruler:
  alertmanager_url: http://127.0.0.1:9093
Hi, will this configuration clean up expired files in the chunks directory?
Any update?
Judging from the discussion in https://github.com/grafana/loki/issues/7068, I don't think the compactor will delete the chunks in the S3 object store; you need a bucket lifecycle policy for that.
It would be nice to have a clear answer on this, though.
For everyone wondering what's going on with retention: I've tested the feature a lot over the past few days, so here is what works.
Minimal Configuration Needed
First of all, you absolutely need the following configuration set up:
limits_config:
  retention_period: 10d                 # Keep 10 days

compactor:
  delete_request_cancel_period: 10m     # don't wait 24h before processing the delete_request
  retention_enabled: true               # actually do the delete
  retention_delete_delay: 2h            # wait 2 hours before actually deleting stuff
You can tweak these settings to delete faster or slower.
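Note that the compactor also needs a working directory and a shared_store matching your object store, as in the configs earlier in this thread. A combined sketch (the paths and the filesystem store are assumptions, swap in s3/gcs/azure as appropriate):

compactor:
  working_directory: /var/loki/retention   # marker files live here; if they are lost, those chunks are never deleted
  shared_store: filesystem                 # or s3 / gcs / azure, matching your chunk store
  compaction_interval: 10m
  delete_request_cancel_period: 10m
  retention_enabled: true
  retention_delete_delay: 2h

limits_config:
  retention_period: 10d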
Check If It's Working
Once you have this config up and running, check that the logs are actually reporting that retention is being applied: msg="applying retention with compaction". The "caller" for this log is compactor.go.
Next, check that the retention manager is actually doing its job in the logs: msg="mark file created" and msg="no marks file found" from the caller marker.go.
The mark file created message means that Loki found some chunks to delete and created a file to keep track of them. The no marks file found message means that while performing the chunk delete routine, there was no marker file that matched its filters, the filters mainly being the delay.
Whenever you see the mark file created logs, you can go into the working directory of the compactor and check for the marker files. The path should be something like /var/loki/compactor/retention/markers. These files are kept there for 2 hours, or whatever is set in retention_delete_delay. After retention_delete_delay has passed, Loki will delete the chunks.
Not seeing any of the logs mentioned above means that the retention process has not started.
Important Notes
Loki will only delete chunks that are indexed. The indexes are actually purged before the chunks are deleted. This means that if you lose files from the compactor's working directory, whatever chunks were marked there will never be deleted, so it is still worth having a lifecycle policy to cover for this OR persistent storage for that particular folder.
@nvanheuverzwijn if I were the CTO of Grafana Labs, I would give you a job offer immediately
@nvanheuverzwijn Thank you a lot! Your explanation makes it clear to me. The Loki documentation had me confused into thinking that the Table Manager also deletes chunks when using the filesystem chunk store.
More info about 'Check If It's Working': with compaction_interval: 10m, and assuming the Loki instance starts at 2023-07-11T12:30:25.060395045Z, the caller=compactor.go logs show up at ts=2023-07-11T12:40:25.047110295Z:
level=info ts=2023-07-11T12:30:25.060441045Z caller=compactor.go:440 msg="waiting 10m0s for ring to stay stable and previous compactions to finish before starting compactor"
level=info ts=2023-07-11T12:40:25.045542628Z caller=compactor.go:445 msg="compactor startup delay completed"
level=info ts=2023-07-11T12:40:25.045568295Z caller=compactor.go:497 msg="compactor started"
level=info ts=2023-07-11T12:40:25.04562367Z caller=compactor.go:454 msg="applying retention with compaction"
level=info ts=2023-07-11T12:40:25.047110295Z caller=compactor.go:609 msg="compacting table" table-name=loki_index_19549
level=info ts=2023-07-11T12:40:25.047208753Z caller=table_compactor.go:325 table-name=loki_index_19549 msg="using compactor-1689078092.gz as seed file"
level=info ts=2023-07-11T12:40:25.048495753Z caller=util.go:85 table-name=loki_index_19549 file-name=compactor-1689078092.gz msg="downloaded file" total_time=1.280041ms
level=info ts=2023-07-11T12:40:25.06665592Z caller=compactor.go:614 msg="finished compacting table" table-name=loki_index_19549
level=info ts=2023-07-11T12:40:25.066668503Z caller=compactor.go:609 msg="compacting table" table-name=loki_index_19548
level=info ts=2023-07-11T12:40:25.067591628Z caller=util.go:85 table-name=loki_index_19548 file-name=compactor-1689041382.gz msg="downloaded file" total_time=863.125µs
level=info ts=2023-07-11T12:40:25.078401878Z caller=compactor.go:614 msg="finished compacting table" table-name=loki_index_19548
@yangmeilly Can you please send your full loki.yaml config? It's not working for me.
In my scenario, I use boltdb-shipper for the indexes and the filesystem for chunks. The relevant parts of my Loki config are as follows; the retention-related settings are the ones that need your attention.
compactor:
  compaction_interval: 10m
  delete_request_cancel_period: 2h
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  retention_enabled: true
  shared_store: filesystem
  working_directory: /var/loki/retention

limits_config:
  enforce_metric_name: false
  max_cache_freshness_per_query: 10m
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  split_queries_by_interval: 15m
  retention_period: 72h
  max_query_lookback: 72h

table_manager:   # this makes no sense for filesystem
  retention_deletes_enabled: false
  retention_period: 0
Whenever you see the mark file created logs, you can go into the working directory of the compactor and check for the marker files. The path should be something like /var/loki/compactor/retention/markers. These files are kept there for 2 hours, or whatever is set in retention_delete_delay. After retention_delete_delay has passed, Loki will delete the chunks. Not seeing any of the logs mentioned above means that the retention process has not started.
@nvanheuverzwijn Thanks for the info. Regarding your statement that Loki will delete the chunks, are you talking about a filesystem backend only, or also an S3/Azure backend? I can't find a definitive answer stating that Loki is able to delete chunks from external storage.
It will also delete on S3/Azure. I did this with Google Cloud Storage, but it should be the same for the other backends.
@nvanheuverzwijn
The compactor did not delete the chunks. Why?
compactor log:
level=info ts=2023-08-03T06:50:12.634846248Z caller=compactor.go:497 msg="compactor started"
level=info ts=2023-08-03T06:50:12.634865722Z caller=compactor.go:454 msg="applying retention with compaction"
level=info ts=2023-08-03T06:50:12.634865349Z caller=marker.go:177 msg="mark processor started" workers=150 delay=2h0m0s
level=info ts=2023-08-03T06:50:12.634955656Z caller=expiration.go:78 msg="overall smallest retention period 1690440612.634, default smallest retention period 1690440612.634"
ts=2023-08-03T06:50:12.635021334Z caller=spanlogger.go:85 level=info msg="building index list cache"
level=info ts=2023-08-03T06:50:12.635046761Z caller=marker.go:202 msg="no marks file found"
config:
storage_config:
  aws:
    access_key_id: xxxxxx
    bucketnames: loki
    endpoint: https://s3.xxxx.com
    s3forcepathstyle: true
    secret_access_key: xxxxx
  boltdb_shipper:
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
    cache_ttl: 24h
    index_gateway_client:
      server_address: dns:///loki-distributed-index-gateway:9095
    shared_store: s3

compactor:
  retention_enabled: true
  shared_store: s3
  working_directory: /var/loki/compactor
  retention_delete_delay: 2h
  delete_request_cancel_period: 10m

limits_config:
  enforce_metric_name: false
  ingestion_burst_size_mb: 1024
  ingestion_rate_mb: 1024
  max_cache_freshness_per_query: 10m
  max_global_streams_per_user: 0
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 1h
  split_queries_by_interval: 15m
Update: it was caused by my incorrect configuration. The storage configuration needs to be placed under the common block:
common:
  compactor_address: http://loki-distributed-compactor:3100
  storage:
    s3:
      access_key_id: xxxxxx
      bucketnames: loki
      endpoint: https://s3.xxxx.com
      s3forcepathstyle: true
      secret_access_key: xxxxxx
@nvanheuverzwijn so beautiful