
Prometheus metrics for ingest wal usage are not working

Open · fredsig opened this issue 1 year ago · 4 comments

Describe the bug

Both quickwit_ingest_wal_disk_used_bytes and quickwit_ingest_wal_memory_used_bytes are not working as expected. quickwit_ingest_wal_disk_used_bytes reports a constant value of 134217728 (max queue disk usage is set to 32GiB and the disk is 250GB in total), and quickwit_ingest_wal_memory_used_bytes always reports 0.
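For reference, a quick way to inspect the raw values is to query the node's metrics endpoint directly (assuming the default REST port 7280; the exact label set on the gauges may differ):

    # Fetch the Prometheus exposition and filter the WAL gauges
    curl -s http://localhost:7280/metrics | grep quickwit_ingest_wal
    # Observed values, as reported above:
    # quickwit_ingest_wal_disk_used_bytes 134217728
    # quickwit_ingest_wal_memory_used_bytes 0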

Steps to reproduce (if applicable): I'm using the default Prometheus scraping configuration provided by the Helm chart. These are my ingest_api values:

  ingest_api:
    max_queue_memory_usage: 4GiB
    max_queue_disk_usage: 32GiB

Expected behavior: both metrics should report the actual WAL usage for disk and memory.

It would also be great to have metrics exposing the max_queue_disk_usage and max_queue_memory_usage config settings.

Configuration: Version: v0.8.1

node.yaml:

data_dir: /quickwit/qwdata
default_index_root_uri: s3://prod-<redacted>-quickwit/indexes
gossip_listen_port: 7282
grpc:
  max_message_size: 80 MiB
indexer:
  enable_otlp_endpoint: true
ingest_api:
  max_queue_disk_usage: 32GiB
  max_queue_memory_usage: 4GiB
listen_address: 0.0.0.0
metastore:
  postgres:
    acquire_connection_timeout: 30s
    idle_connection_timeout: 1h
    max_connection_lifetime: 1d
    max_connections: 50
    min_connections: 10
storage:
  s3:
    region: us-east-1
version: 0.8

fredsig — Nov 14 '24, 11:11

How are you ingesting data? The WAL is only used when ingesting through the API, not when pulling from something like Kafka or Kinesis. So if you don't use the ingest API (or are not currently ingesting anything), it's expected that the in-memory WAL will be empty. The on-disk WAL also records queues it thinks exist, and always works in blocks of 128MiB, so it's never going to be truly empty.
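(Note: 134217728 bytes is exactly 128 × 1024 × 1024, i.e. one 128MiB block, which matches the constant value reported above for quickwit_ingest_wal_disk_used_bytes.)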

trinity-1686a — Nov 14 '24, 16:11

I'm ingesting using both the ingest API and OTLP (no Kafka or Kinesis). This is related to https://github.com/quickwit-oss/quickwit/issues/5548

fredsig — Nov 14 '24, 18:11

(I put this comment on a different issue by mistake.) I suspect the metrics are only wired up for ingest v2, and @fredsig is using ingest v1.

fulmicoton — Nov 18 '24, 08:11

Thanks @fulmicoton. In the meantime, I've created my own WAL watcher to get stats for the queue directory on ingest v1: I just run du -h through a pod exec (ugly, but it works for now).
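A minimal sketch of that kind of workaround, assuming the ingest v1 queue directory lives under the queues/ subdirectory of data_dir (/quickwit/qwdata in the config above) and a hypothetical pod name:

    # Report the on-disk size of the ingest v1 WAL queue directory from inside the pod
    kubectl exec quickwit-indexer-0 -- du -sh /quickwit/qwdata/queues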

(screenshot attached, 2024-11-18 09:12)

fredsig — Nov 18 '24, 09:11