
Growing memory usage in loki-read

Open · twix14 opened this issue 3 years ago · 6 comments

Describe the bug: Memory usage grows continuously under regular usage. Similar to #4119.

To Reproduce: steps to reproduce the behavior:

  1. Run a simple-deployment Loki 2.4.2.
  2. Push 3k logs/s to Loki.
  3. Add a Loki datasource to Grafana (v8.1.1); a provisioning sketch is shown after this list.
  4. Use the Explore option in Grafana with the Loki datasource and the query {pod_name=~"vector-load-tests.*"} (the filter matches logs from 3 pods).
  5. Add query options: last 1h, 1000-line limit, refresh every 5s. Memory consumption stabilises after a while. After some time, stop querying and then start querying again with the options: last 1h, 100-line limit, refresh every 5s.
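For reference, step 3 can also be done declaratively. A minimal sketch of a Grafana datasource provisioning file, assuming the read path is exposed as a loki-read service on port 3100 (the names and URL are assumptions, not taken from this report):

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-read:3100   # assumed service name; adjust to your deployment
    jsonData:
      maxLines: 1000             # mirrors the 1000-line limit used in step 5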

Expected behavior: Memory usage should be in line with read throughput, or remain (relatively) stable if the throughput does not increase beyond its previous level.

Environment:

  • Infrastructure: Kubernetes
  • Loki config:
auth_enabled: true

server:
  http_listen_port: 3100
  grpc_listen_port: 9095
  grpc_server_max_recv_msg_size: 52428800
  grpc_server_max_send_msg_size: 52428800

distributor:
  ring:
    kvstore:
      store: memberlist

memberlist:
  bind_port: 7946
  join_members:
    - loki-memberlist

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  chunk_idle_period: 1m
  chunk_block_size: 10485760
  chunk_encoding: snappy
  chunk_retain_period: 0s
  max_chunk_age: 2h
  max_transfer_retries: 0
  wal:
    enabled: true
    dir: /var/loki/wal
    replay_memory_ceiling: 1GB

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_cache_freshness_per_query: 10m
  ingestion_rate_mb: 10

schema_config:
  configs:
    - from: 2021-05-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  boltdb_shipper:
    shared_store: s3
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
    cache_ttl: 10m
  aws:
    bucketnames: <bucket_name>
    endpoint: s3.eu-west-1.amazonaws.com
    region: eu-west-1
    access_key_id: <access_key_id>
    secret_access_key: <secret_access_key>
    insecure: false
    sse_encryption: false
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
      insecure_skip_verify: false
    s3forcepathstyle: true      

querier:
  engine:
    max_look_back_period: 168h

table_manager:
  retention_deletes_enabled: true
  retention_period: 168h

query_range:
  align_queries_with_step: true
  max_retries: 5
  split_queries_by_interval: 15m
  cache_results: false
  results_cache:
    cache:
      enable_fifocache: false
      fifocache:
        max_size_items: 1
        validity: 1m

compactor:
  working_directory: /var/loki/compactor
  shared_store: filesystem

ruler:
  storage:
    type: local
    local:
      directory: /var/loki/ruler
  • GOGC=5 (set as an environment variable on the Loki process; a sketch of how this might be applied follows this list)
  • Log volume: 3k/s (k8s logs <- Vector -> Kafka <- Vector -> Loki)
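A minimal sketch of how the GOGC=5 setting might be applied to the read pods (the Deployment name, image tag, args, and memory limit are assumptions for illustration, not values from this report):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-read
spec:
  selector:
    matchLabels:
      app: loki-read
  template:
    metadata:
      labels:
        app: loki-read
    spec:
      containers:
        - name: loki
          image: grafana/loki:2.4.2
          args:
            - -config.file=/etc/loki/config.yaml
            - -target=read
          env:
            - name: GOGC
              value: "5"        # aggressive GC target: collects more often, trading CPU for a smaller heap
          resources:
            limits:
              memory: 4Gi       # illustrative ceiling only; without one the pod can grow until the node is under pressure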

Screenshots, Promtail config, or terminal output: Metrics for the Loki process after running some tests:

  • [screenshot: k8s memory usage, 2022-01-24 15:35:08]
  • [screenshot: Go memory usage, 2022-01-24 15:39:42]

twix14 avatar Jan 24 '22 15:01 twix14

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly review closed issues that have a stale label, sorted by thumbs-up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.

stale[bot] avatar Mar 02 '22 14:03 stale[bot]

keepalive

dschunack avatar Mar 03 '22 13:03 dschunack

I have the same issue! I'm using the simple scalable config (Docker container) with an nginx gateway, and syslog-ng is sending logs to Promtail.

The bug: Loki crashes on large queries, for example a 6-hour time range.

Loki-config.yml (a note on the query-splitting settings follows the config):

---
server:
  http_listen_port: 3100
  http_server_read_timeout: 10m
  http_server_write_timeout: 10m
  http_server_idle_timeout: 10m
  log_level: debug
#  grpc_server_max_recv_msg_size: 52428800
#  grpc_server_max_send_msg_size: 52428800

distributor:
  ring:
    kvstore:
      store: memberlist
memberlist:
  join_members:
    - loki:7946

limits_config:
  per_stream_rate_limit: 0
  max_query_parallelism: 50  # Maximum number of queries that will be scheduled in parallel by the frontend.
query_range:
  split_queries_by_interval: 10m
  parallelise_shardable_queries: true
  cache_results: true
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_items: 1024
        validity: 24h

querier:
  max_concurrent: 25

ingester:
  lifecycler:
#    address: 127.0.0.1
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
    final_sleep: 0s
  chunk_target_size: 1536000
  chunk_idle_period: 30m
  chunk_retain_period: 1m # How long chunks should be retained in-memory after they've been flushed.
  chunk_block_size: 262144
  chunk_encoding: snappy
  max_chunk_age: 1h
  #max_transfer_retries: 3 # Number of times to try and transfer chunks when leaving before

schema_config:
  configs:
    - from: 2021-08-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h
common:
  path_prefix: /loki
#  replication_factor: 1
  storage:
    s3:
      endpoint: minio:9000
      insecure: true
      bucketnames: loki-data
      access_key_id: loki
      secret_access_key: loki
      s3forcepathstyle: true
#  ring:
#    kvstore:
#      store: memberlist
ruler:
  storage:
    s3:
      bucketnames: loki-ruler

#frontend:
#  log_queries_longer_than: 5s
#  # downstream_url: http://loki-1:3100
#  downstream_url: http://gateway:3100
#  compress_responses: true
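The settings above that most directly bound read-path fan-out, and therefore memory, are split_queries_by_interval, max_query_parallelism, and querier.max_concurrent: a 6-hour range split every 10m produces 36 sub-queries per query, up to 50 of which can be scheduled in parallel, while each querier runs up to 25 at once. A hedged sketch of a more conservative combination (the values are illustrative only, not a recommendation from this thread):

limits_config:
  max_query_parallelism: 16        # fewer split sub-queries scheduled at once per query
query_range:
  split_queries_by_interval: 30m   # larger splits mean fewer sub-queries for a 6-hour range
  cache_results: true
querier:
  max_concurrent: 8                # fewer sub-queries executing concurrently per querier process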

@community, could you help with this issue, please?

saibug avatar Mar 22 '22 17:03 saibug

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly review closed issues that have a stale label, sorted by thumbs-up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.

stale[bot] avatar Apr 25 '22 07:04 stale[bot]

keepalive

parera10 avatar Jun 29 '22 07:06 parera10

Hi. I am facing the same issue with Loki version 2.6.1. Any updates?

LinTechSo avatar Aug 10 '22 11:08 LinTechSo

Had this problem; the root cause was that the compactor was repeatedly failing because it didn't have permission to delete objects in the S3 bucket storage. Adding DeleteObject permission fixed it.
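For anyone hitting the same failure mode, a minimal sketch of the IAM policy statement for the Loki storage bucket (written here as YAML; the equivalent JSON policy document works the same, and <bucket_name> is a placeholder):

Version: "2012-10-17"
Statement:
  - Effect: Allow
    Action:
      - s3:ListBucket
      - s3:GetObject
      - s3:PutObject
      - s3:DeleteObject            # the permission the compactor was missing
    Resource:
      - arn:aws:s3:::<bucket_name>
      - arn:aws:s3:::<bucket_name>/*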

ckw017 avatar Nov 30 '22 20:11 ckw017

I am facing the same issue with Loki version 2.6.1.

  • [screenshot: goroutines]
  • [screenshot: k8s memory]

ranryl avatar Dec 05 '22 08:12 ranryl

@ranryl did you try upgrading the Loki version?

saibug avatar Dec 07 '22 09:12 saibug

@ranryl did you try upgrading the Loki version?

I have now upgraded to 2.7.2; it is not yet resolved. I changed the queriers to pull mode and will continue to observe.
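If "pull mode" here means the queriers pulling work from a query-frontend (or query-scheduler) instead of serving HTTP queries directly, the relevant block is frontend_worker. A minimal sketch, with the address and parallelism as assumptions:

frontend_worker:
  frontend_address: loki-query-frontend:9095   # assumed service name and gRPC port
  parallelism: 4                               # number of sub-queries this querier processes at once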

ranryl avatar Feb 09 '23 08:02 ranryl

goroutine profile: total 479556
348621 @ 0x43ccd6 0x44dc7e 0x44dc55 0x46a285 0x1ba3293 0x1ba3269 0x1ba63cb 0x1ba1c0f 0x199234f 0x1ba1846 0x1ba9150 0xba72da 0x1ba90a5 0x133fea9 0x1340f5f 0x133f4ad 0x1379899 0x1378ac5 0x1378805 0x13786c9 0x46e481
#	0x46a284	sync.runtime_SemacquireMutex+0x24											/usr/local/go/src/runtime/sema.go:77
#	0x1ba3292	sync.(*RWMutex).RLock+0x72												/usr/local/go/src/sync/rwmutex.go:71
#	0x1ba3268	github.com/grafana/loki/pkg/storage/stores/shipper/index.(*Table).ForEach+0x48						/src/loki/pkg/storage/stores/shipper/index/table.go:176
#	0x1ba63ca	github.com/grafana/loki/pkg/storage/stores/shipper/index.(*TableManager).ForEach+0x4a					/src/loki/pkg/storage/stores/shipper/index/table_manager.go:109
#	0x1ba1c0e	github.com/grafana/loki/pkg/storage/stores/shipper/index.(*querier).QueryPages.func1+0x1ae				/src/loki/pkg/storage/stores/shipper/index/querier.go:49
#	0x199234e	github.com/grafana/loki/pkg/storage/stores/shipper/util.DoParallelQueries+0x16e						/src/loki/pkg/storage/stores/shipper/util/queries.go:39
#	0x1ba1845	github.com/grafana/loki/pkg/storage/stores/shipper/index.(*querier).QueryPages+0x305					/src/loki/pkg/storage/stores/shipper/index/querier.go:46
#	0x1ba914f	github.com/grafana/loki/pkg/storage/stores/shipper.(*indexClient).QueryPages.func1+0x4f					/src/loki/pkg/storage/stores/shipper/shipper_index_client.go:165
#	0xba72d9	github.com/weaveworks/common/instrument.CollectedRequest+0x279								/src/loki/vendor/github.com/weaveworks/common/instrument/instrument.go:167
#	0x1ba90a4	github.com/grafana/loki/pkg/storage/stores/shipper.(*indexClient).QueryPages+0x124					/src/loki/pkg/storage/stores/shipper/shipper_index_client.go:164
#	0x133fea8	github.com/grafana/loki/pkg/storage/stores/series/index.(*cachingIndexClient).queryPages+0x968				/src/loki/pkg/storage/stores/series/index/caching_index_client.go:176
#	0x1340f5e	github.com/grafana/loki/pkg/storage/stores/series/index.(*cachingIndexClient).doBroadQueries+0x7e			/src/loki/pkg/storage/stores/series/index/caching_index_client.go:234
#	0x133f4ac	github.com/grafana/loki/pkg/storage/stores/series/index.(*cachingIndexClient).QueryPages+0x8c				/src/loki/pkg/storage/stores/series/index/caching_index_client.go:103
#	0x1379898	github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupEntriesByQueries+0x178			/src/loki/pkg/storage/stores/series/series_index_store.go:568
#	0x1378ac4	github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupIdsByMetricNameMatcher+0x1c4		/src/loki/pkg/storage/stores/series/series_index_store.go:490
#	0x1378804	github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupSeriesByMetricNameMatcher+0x84		/src/loki/pkg/storage/stores/series/series_index_store.go:464
#	0x13786c8	github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupSeriesByMetricNameMatchers.func1+0x88	/src/loki/pkg/storage/stores/series/series_index_store.go:409

130485 @ 0x43ccd6 0x44cbbc 0x1377e29 0x13737e5 0x11e15cd 0xba72da 0x11e149a 0x136b6f2 0x136a4c8 0x136af86 0x136a37e 0x1c07fb0 0xf1ec78 0x1c72924 0xb1469a 0xbcc905 0x1d46ee7 0xb1469a 0xbcce22 0xb1469a 0xb17b42 0xb1469a 0xbce01b 0xb1469a 0xb1453e 0xf1eb38 0xacc54f 0xad0b8f 0xaca0b8 0x46e481
#	0x1377e28	github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).lookupSeriesByMetricNameMatchers+0x728	/src/loki/pkg/storage/stores/series/series_index_store.go:426
#	0x13737e4	github.com/grafana/loki/pkg/storage/stores/series.(*indexReaderWriter).GetChunkRefs+0x4e4			/src/loki/pkg/storage/stores/series/series_index_store.go:160
#	0x11e15cc	github.com/grafana/loki/pkg/storage/stores/index.monitoredReaderWriter.GetChunkRefs.func1+0x6c			/src/loki/pkg/storage/stores/index/index.go:54
#	0xba72d9	github.com/weaveworks/common/instrument.CollectedRequest+0x279							/src/loki/vendor/github.com/weaveworks/common/instrument/instrument.go:167
#	0x11e1499	github.com/grafana/loki/pkg/storage/stores/index.monitoredReaderWriter.GetChunkRefs+0x1d9			/src/loki/pkg/storage/stores/index/index.go:52
#	0x136b6f1	github.com/grafana/loki/pkg/storage/stores.(*storeEntry).GetChunkRefs+0x631					/src/loki/pkg/storage/stores/composite_store_entry.go:67
#	0x136a4c7	github.com/grafana/loki/pkg/storage/stores.compositeStore.GetChunkRefs.func1+0xa7				/src/loki/pkg/storage/stores/composite_store.go:149
#	0x136af85	github.com/grafana/loki/pkg/storage/stores.compositeStore.forStores+0x265					/src/loki/pkg/storage/stores/composite_store.go:241
#	0x136a37d	github.com/grafana/loki/pkg/storage/stores.compositeStore.GetChunkRefs+0xfd					/src/loki/pkg/storage/stores/composite_store.go:148
#	0x1c07faf	github.com/grafana/loki/pkg/ingester.(*Ingester).GetChunkIDs+0x18f						/src/loki/pkg/ingester/ingester.go:800
#	0xf1ec77	github.com/grafana/loki/pkg/logproto._Querier_GetChunkIDs_Handler.func1+0x77					/src/loki/pkg/logproto/logproto.pb.go:5095
#	0x1c72923	github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1+0xc3				/src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:33
#	0xb14699	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39					/src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xbcc904	github.com/weaveworks/common/middleware.ServerUserHeaderInterceptor+0x64					/src/loki/vendor/github.com/weaveworks/common/middleware/grpc_auth.go:38
#	0x1d46ee6	github.com/grafana/loki/pkg/util/fakeauth.SetupAuthMiddleware.func1+0xa6					/src/loki/pkg/util/fakeauth/fake_auth.go:27
#	0xb14699	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39					/src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xbcce21	github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1+0xa1				/src/loki/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:35
#	0xb14699	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39					/src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb17b41	github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x401					/src/loki/vendor/github.com/opentracing-contrib/go-grpc/server.go:57
#	0xb14699	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39					/src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xbce01a	github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor+0xba				/src/loki/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:30
#	0xb14699	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39					/src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb1453d	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xbd					/src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#	0xf1eb37	github.com/grafana/loki/pkg/logproto._Querier_GetChunkIDs_Handler+0x137						/src/loki/pkg/logproto/logproto.pb.go:5097
#	0xacc54e	google.golang.org/grpc.(*Server).processUnaryRPC+0xcce								/src/loki/vendor/google.golang.org/grpc/server.go:1282
#	0xad0b8e	google.golang.org/grpc.(*Server).handleStream+0xa2e								/src/loki/vendor/google.golang.org/grpc/server.go:1619
#	0xaca0b7	google.golang.org/grpc.(*Server).serveStreams.func1.2+0x97							/src/loki/vendor/google.golang.org/grpc/server.go:921

ranryl avatar Feb 09 '23 08:02 ranryl

Same issue on 2.7.5. My setup is a single node in k8s.

alexey107 avatar Jun 19 '23 15:06 alexey107