Growing memory usage in loki-read
Describe the bug
Memory usage grows continuously with regular usage. Similar to #4119.
To Reproduce
Steps to reproduce the behavior:
- Run a simple deployment of Loki 2.4.2
- Push 3k logs/s to Loki
- Add a Loki datasource to Grafana (v8.1.1)
- Use Explore in Grafana with the Loki datasource and the query {pod_name=~"vector-load-tests.*"} (the filter matches logs from 3 pods)
- Add query options: last 1h, 1000-line limit, refresh 5s. Memory consumption will stabilise after a while. After some time, stop querying and start querying again with the options: last 1h, 100-line limit, refresh 5s. (The same read load can also be generated directly against Loki's API; see the sketch after this list.)
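For anyone trying to reproduce this without Grafana, the loop below generates roughly the same read load by hitting Loki's query_range API directly. It is only an illustrative sketch, not part of the original report; the address (localhost:3100) and the X-Scope-OrgID value are assumptions, so adjust them for your deployment.

// queryloop.go - minimal sketch (assumptions: Loki address and tenant header) that replays
// the Explore query against Loki's /loki/api/v1/query_range endpoint every 5 seconds.
package main

import (
    "fmt"
    "io"
    "net/http"
    "net/url"
    "time"
)

func main() {
    base := "http://localhost:3100/loki/api/v1/query_range"
    for {
        end := time.Now()
        start := end.Add(-1 * time.Hour) // "last 1h"

        params := url.Values{}
        params.Set("query", `{pod_name=~"vector-load-tests.*"}`)
        params.Set("limit", "1000") // drop to 100 for the second phase of the repro
        params.Set("start", fmt.Sprintf("%d", start.UnixNano()))
        params.Set("end", fmt.Sprintf("%d", end.UnixNano()))

        req, _ := http.NewRequest("GET", base+"?"+params.Encode(), nil)
        // With auth_enabled: true Loki expects a tenant header; "fake" is a placeholder tenant ID.
        req.Header.Set("X-Scope-OrgID", "fake")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            fmt.Println("query failed:", err)
        } else {
            io.Copy(io.Discard, resp.Body) // drain the response like a client would
            resp.Body.Close()
            fmt.Println("status:", resp.Status)
        }
        time.Sleep(5 * time.Second) // Grafana "refresh 5s"
    }
}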
Expected behavior
Memory usage should be in line with the read throughput, or (relatively) stable if the throughput doesn't increase beyond previous consumption.
Environment:
- Infrastructure: Kubernetes
- Loki config:
auth_enabled: true
server:
  http_listen_port: 3100
  grpc_listen_port: 9095
  grpc_server_max_recv_msg_size: 52428800
  grpc_server_max_send_msg_size: 52428800
distributor:
  ring:
    kvstore:
      store: memberlist
memberlist:
  bind_port: 7946
  join_members:
    - loki-memberlist
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  chunk_idle_period: 1m
  chunk_block_size: 10485760
  chunk_encoding: snappy
  chunk_retain_period: 0s
  max_chunk_age: 2h
  max_transfer_retries: 0
  wal:
    enabled: true
    dir: /var/loki/wal
    replay_memory_ceiling: 1GB
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_cache_freshness_per_query: 10m
  ingestion_rate_mb: 10
schema_config:
  configs:
    - from: 2021-05-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: loki_index_
        period: 24h
storage_config:
  boltdb_shipper:
    shared_store: s3
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
    cache_ttl: 10m
  aws:
    bucketnames: <bucket_name>
    endpoint: s3.eu-west-1.amazonaws.com
    region: eu-west-1
    access_key_id: <access_key_id>
    secret_access_key: <secret_access_key>
    insecure: false
    sse_encryption: false
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
      insecure_skip_verify: false
    s3forcepathstyle: true
querier:
  engine:
    max_look_back_period: 168h
table_manager:
  retention_deletes_enabled: true
  retention_period: 168h
query_range:
  align_queries_with_step: true
  max_retries: 5
  split_queries_by_interval: 15m
  cache_results: false
  results_cache:
    cache:
      enable_fifocache: false
      fifocache:
        max_size_items: 1
        validity: 1m
compactor:
  working_directory: /var/loki/compactor
  shared_store: filesystem
ruler:
  storage:
    type: local
    local:
      directory: /var/loki/ruler
- GOGC=5
- Log volume: 3k/s (k8s logs <- Vector -> Kafka <- Vector -> Loki)
Screenshots, Promtail config, or terminal output
Metrics for the Loki process after running some tests (screenshots; a sketch for sampling these directly from Loki follows the list):
- k8s memory usage
- Go memory usage
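To correlate the dashboards above with what the process itself reports, one option is to periodically sample Loki's Go runtime metrics from its /metrics endpoint. The snippet below is just a sketch, not part of the original report; the address is an assumption, and the metric names are the standard ones exported by the Go Prometheus client.

// memwatch.go - minimal sketch (assumption: Loki's HTTP port is reachable at localhost:3100)
// that prints heap and goroutine metrics every 15 seconds so growth over time is visible.
package main

import (
    "bufio"
    "fmt"
    "net/http"
    "strings"
    "time"
)

func main() {
    for {
        resp, err := http.Get("http://localhost:3100/metrics")
        if err != nil {
            fmt.Println("scrape failed:", err)
            time.Sleep(15 * time.Second)
            continue
        }
        sc := bufio.NewScanner(resp.Body)
        sc.Buffer(make([]byte, 1024*1024), 1024*1024)
        for sc.Scan() {
            line := sc.Text()
            // Standard Go runtime metrics exposed by the Prometheus client library.
            if strings.HasPrefix(line, "go_memstats_heap_inuse_bytes") ||
                strings.HasPrefix(line, "go_memstats_heap_idle_bytes") ||
                strings.HasPrefix(line, "go_goroutines") {
                fmt.Println(time.Now().Format(time.RFC3339), line)
            }
        }
        resp.Body.Close()
        time.Sleep(15 * time.Second)
    }
}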
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded. Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry. We regularly sort closed issues that carry a stale label by thumbs-ups.
We may also:
- Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a keepalive label to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.
keepalive
I have the same issue! I'm using the simple scalable config (Docker container) with an nginx gateway, and syslog-ng is sending logs to Promtail.
The bug: Loki crashes on large queries, for example with a 6-hour time range.
loki-config.yml:
---
server:
  http_listen_port: 3100
  http_server_read_timeout: 10m
  http_server_write_timeout: 10m
  http_server_idle_timeout: 10m
  log_level: debug
  # grpc_server_max_recv_msg_size: 52428800
  # grpc_server_max_send_msg_size: 52428800
distributor:
  ring:
    kvstore:
      store: memberlist
memberlist:
  join_members:
    - loki:7946
limits_config:
  per_stream_rate_limit: 0
  max_query_parallelism: 50 # Maximum number of queries that will be scheduled in parallel by the frontend.
query_range:
  split_queries_by_interval: 10m
  parallelise_shardable_queries: true
  cache_results: true
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_items: 1024
        validity: 24h
querier:
  max_concurrent: 25
ingester:
  lifecycler:
    # address: 127.0.0.1
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
    final_sleep: 0s
  chunk_target_size: 1536000
  chunk_idle_period: 30m
  chunk_retain_period: 1m # How long chunks should be retained in-memory after they've been flushed.
  chunk_block_size: 262144
  chunk_encoding: snappy
  max_chunk_age: 1h
  # max_transfer_retries: 3 # Number of times to try and transfer chunks when leaving before
schema_config:
  configs:
    - from: 2021-08-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h
common:
  path_prefix: /loki
  # replication_factor: 1
  storage:
    s3:
      endpoint: minio:9000
      insecure: true
      bucketnames: loki-data
      access_key_id: loki
      secret_access_key: loki
      s3forcepathstyle: true
  # ring:
  #   kvstore:
  #     store: memberlist
ruler:
  storage:
    s3:
      bucketnames: loki-ruler
# frontend:
#   log_queries_longer_than: 5s
#   # downstream_url: http://loki-1:3100
#   downstream_url: http://gateway:3100
#   compress_responses: true
@community, could you help with this issue, please?
(The stalebot posted the same automated stale notice again.)
keepalive
Hi. I am facing the same issue with Loki version 2.6.1. Any updates?
I had this problem; the root cause was that the compactor was repeatedly failing because it didn't have permission to delete objects in the S3 bucket. Adding DeleteObject permission fixed it.
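If you suspect the same root cause, a quick way to check is to confirm that the credentials the compactor uses can actually delete objects in the bucket. The sketch below is not from this thread; the bucket name, region, and test key are placeholders, and credentials are taken from the usual AWS default chain.

// deletecheck.go - minimal sketch (assumptions: bucket name, region, test key) that writes a
// throwaway object and then deletes it. A failure on the delete (but not the put) points at a
// missing DeleteObject permission, which is what kept the compactor failing.
package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

func main() {
    sess := session.Must(session.NewSession(&aws.Config{
        Region: aws.String("eu-west-1"),
    }))
    svc := s3.New(sess)

    bucket := aws.String("<bucket_name>")
    key := aws.String("permission-check/test-object")

    if _, err := svc.PutObject(&s3.PutObjectInput{
        Bucket: bucket, Key: key, Body: strings.NewReader("x"),
    }); err != nil {
        log.Fatal("put failed: ", err)
    }
    if _, err := svc.DeleteObject(&s3.DeleteObjectInput{Bucket: bucket, Key: key}); err != nil {
        log.Fatal("delete failed (likely missing DeleteObject permission): ", err)
    }
    fmt.Println("DeleteObject works for this bucket and these credentials")
}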
I am facing the same issue with Loki version 2.6.1.
- go_routines (screenshot)
- k8s memory (screenshot)
@ranryl, did you try upgrading the Loki version?
I have now upgraded to 2.7.2; it is not resolved yet. I changed the querier to pull mode and will continue to observe it.
goroutine profile: total 479556
348621 @ 0x43ccd6 0x44dc7e 0x44dc55 0x46a285 0x1ba3293 0x1ba3269 0x1ba63cb 0x1ba1c0f 0x199234f 0x1ba1846 0x1ba9150 0xba72da 0x1ba90a5 0x133fea9 0x1340f5f 0x133f4ad 0x1379899 0x1378ac5 0x1378805 0x13786c9 0x46e481
# 0x46a284 sync.runtime_SemacquireMutex+0x24 /usr/local/go/src/runtime/sema.go:77
# 0x1ba3292 sync.(*RWMutex).RLock+0x72 /usr/local/go/src/sync/rwmutex.go:71
# 0x1ba3268 github.com/grafana/loki/pkg/storage/stores/shipper/index.(*Table).ForEach+0x48 /src/loki/pkg/storage/stores/shipper/index/table.go:176
# 0x1ba63ca github.com/grafana/loki/pkg/storage/stores/shipper/index.(*TableManager).ForEach+0x4a /src/loki/pkg/storage/stores/shipper/index/table_manager.go:109
# 0x1ba1c0e github.com/grafana/loki/pkg/storage/stores/shipper/index.(*querier).QueryPages.func1+0x1ae /src/loki/pkg/storage/stores/shipper/index/querier.go:49
# 0x199234e github.com/grafana/loki/pkg/storage/stores/shipper/util.DoParallelQueries+0x16e /src/loki/pkg/storage/stores/shipper/util/queries.go:39
# 0x1ba1845 github.com/grafana/loki/pkg/storage/stores/shipper/index.(*querier).QueryPages+0x305 /src/loki/pkg/storage/stores/shipper/index/querier.go:46
# 0x1ba914f github.com/grafana/loki/pkg/storage/stores/shipper.(*indexClient).QueryPages.func1+0x4f /src/loki/pkg/storage/stores/shipper/shipper_index_client.go:165
# 0xba72d9 github.com/weaveworks/common/instrument.CollectedRequest+0x279 /src/loki/vendor/github.com/weaveworks/common/instrument/instrument.go:167
# 0x1ba90a4 github.com/grafana/loki/pkg/storage/stores/shipper.(*indexClient).QueryPages+0x124 /src/loki/pkg/storage/stores/shipper/shipper_index_client.go:164
# 0x133fea8 github.com/grafana/loki/pkg/storage/stores/series/index.(*cachingIndexClient).queryPages+0x968 /src/loki/pkg/storage/stores/series/index/caching_index_client.go:176
# 0x1340f5e github.com/grafana/loki/pkg/storage/stores/series/index.(*cachingIndexClient).doBroadQueries+0x7e /src/loki/pkg/storage/stores/series/index/caching_index_client.go:234
# 0x133f4ac github.com/grafana/loki/pkg/storage/stores/series/index.(*cachingIndexClient).QueryPages+0x8c /src/loki/pkg/storage/stores/series/index/caching_index_client.go:103
# 0x1379898 github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupEntriesByQueries+0x178 /src/loki/pkg/storage/stores/series/series_index_store.go:568
# 0x1378ac4 github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupIdsByMetricNameMatcher+0x1c4 /src/loki/pkg/storage/stores/series/series_index_store.go:490
# 0x1378804 github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupSeriesByMetricNameMatcher+0x84 /src/loki/pkg/storage/stores/series/series_index_store.go:464
# 0x13786c8 github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupSeriesByMetricNameMatchers.func1+0x88 /src/loki/pkg/storage/stores/series/series_index_store.go:409
130485 @ 0x43ccd6 0x44cbbc 0x1377e29 0x13737e5 0x11e15cd 0xba72da 0x11e149a 0x136b6f2 0x136a4c8 0x136af86 0x136a37e 0x1c07fb0 0xf1ec78 0x1c72924 0xb1469a 0xbcc905 0x1d46ee7 0xb1469a 0xbcce22 0xb1469a 0xb17b42 0xb1469a 0xbce01b 0xb1469a 0xb1453e 0xf1eb38 0xacc54f 0xad0b8f 0xaca0b8 0x46e481
# 0x1377e28 github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupSeriesByMetricNameMatchers+0x728 /src/loki/pkg/storage/stores/series/series_index_store.go:426
# 0x13737e4 github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).GetChunkRefs+0x4e4 /src/loki/pkg/storage/stores/series/series_index_store.go:160
# 0x11e15cc github.com/grafana/loki/pkg/storage/stores/index.monitoredReaderWriter.GetChunkRefs.func1+0x6c /src/loki/pkg/storage/stores/index/index.go:54
# 0xba72d9 github.com/weaveworks/common/instrument.CollectedRequest+0x279 /src/loki/vendor/github.com/weaveworks/common/instrument/instrument.go:167
# 0x11e1499 github.com/grafana/loki/pkg/storage/stores/index.monitoredReaderWriter.GetChunkRefs+0x1d9 /src/loki/pkg/storage/stores/index/index.go:52
# 0x136b6f1 github.com/grafana/loki/pkg/storage/stores.(*storeEntry).GetChunkRefs+0x631 /src/loki/pkg/storage/stores/composite_store_entry.go:67
# 0x136a4c7 github.com/grafana/loki/pkg/storage/stores.compositeStore.GetChunkRefs.func1+0xa7 /src/loki/pkg/storage/stores/composite_store.go:149
# 0x136af85 github.com/grafana/loki/pkg/storage/stores.compositeStore.forStores+0x265 /src/loki/pkg/storage/stores/composite_store.go:241
# 0x136a37d github.com/grafana/loki/pkg/storage/stores.compositeStore.GetChunkRefs+0xfd /src/loki/pkg/storage/stores/composite_store.go:148
# 0x1c07faf github.com/grafana/loki/pkg/ingester.(*Ingester).GetChunkIDs+0x18f /src/loki/pkg/ingester/ingester.go:800
# 0xf1ec77 github.com/grafana/loki/pkg/logproto._Querier_GetChunkIDs_Handler.func1+0x77 /src/loki/pkg/logproto/logproto.pb.go:5095
# 0x1c72923 github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1+0xc3 /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:33
# 0xb14699 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39 /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xbcc904 github.com/weaveworks/common/middleware.ServerUserHeaderInterceptor+0x64 /src/loki/vendor/github.com/weaveworks/common/middleware/grpc_auth.go:38
# 0x1d46ee6 github.com/grafana/loki/pkg/util/fakeauth.SetupAuthMiddleware.func1+0xa6 /src/loki/pkg/util/fakeauth/fake_auth.go:27
# 0xb14699 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39 /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xbcce21 github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1+0xa1 /src/loki/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:35
# 0xb14699 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39 /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xb17b41 github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x401 /src/loki/vendor/github.com/opentracing-contrib/go-grpc/server.go:57
# 0xb14699 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39 /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xbce01a github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor+0xba /src/loki/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:30
# 0xb14699 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39 /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xb1453d github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xbd /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
# 0xf1eb37 github.com/grafana/loki/pkg/logproto._Querier_GetChunkIDs_Handler+0x137 /src/loki/pkg/logproto/logproto.pb.go:5097
# 0xacc54e google.golang.org/grpc.(*Server).processUnaryRPC+0xcce /src/loki/vendor/google.golang.org/grpc/server.go:1282
# 0xad0b8e google.golang.org/grpc.(*Server).handleStream+0xa2e /src/loki/vendor/google.golang.org/grpc/server.go:1619
# 0xaca0b7 google.golang.org/grpc.(*Server).serveStreams.func1.2+0x97 /src/loki/vendor/google.golang.org/grpc/server.go:921
Same issue on 2.7.5. My setup is a single node in k8s.