Prometheus metrics for ingest wal usage are not working
Describe the bug
Both quickwit_ingest_wal_disk_used_bytes and quickwit_ingest_wal_memory_used_bytes are not working as expected. quickwit_ingest_wal_disk_used_bytes reports a constant value of 134217728 (max_queue_disk_usage is set to 32 GiB and the total disk size is 250 GB), and quickwit_ingest_wal_memory_used_bytes always reports 0.
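As a quick sanity check, the raw gauge values can be read directly from the node's Prometheus endpoint (a minimal sketch; it assumes the default REST port 7280 and a locally reachable node, so adjust host/port for your deployment):

# Dump the ingest WAL gauges from the node's /metrics endpoint
# (7280 is the default REST listen port; change it if yours differs).
curl -s http://localhost:7280/metrics | grep quickwit_ingest_wal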
Steps to reproduce (if applicable)
I'm using the default Prometheus scraping configuration provided by the Helm chart. These are my ingest_api values:
ingest_api:
  max_queue_memory_usage: 4GiB
  max_queue_disk_usage: 32GiB
Expected behavior
I expect both metrics to report WAL usage, for disk and memory respectively.
It would also be great to have metrics exposing the max_queue_disk_usage and max_queue_memory_usage config settings.
Configuration:
Version: v0.8.1
node.yaml:
data_dir: /quickwit/qwdata
default_index_root_uri: s3://prod-<redacted>-quickwit/indexes
gossip_listen_port: 7282
grpc:
  max_message_size: 80 MiB
indexer:
  enable_otlp_endpoint: true
ingest_api:
  max_queue_disk_usage: 32GiB
  max_queue_memory_usage: 4GiB
listen_address: 0.0.0.0
metastore:
  postgres:
    acquire_connection_timeout: 30s
    idle_connection_timeout: 1h
    max_connection_lifetime: 1d
    max_connections: 50
    min_connections: 10
storage:
  s3:
    region: us-east-1
version: 0.8
How are you ingesting data? The WAL is only used when ingesting through the API, not when pulling from a source like Kafka or Kinesis. So if you don't use the ingest API (or aren't currently ingesting anything), it's expected that the in-memory WAL will be empty. The on-disk WAL also records the queues it thinks exist, and always works in blocks of 128 MiB, so it's never going to be truly empty.
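For reference, the constant 134217728 reported above is exactly one such 128 MiB block, as a quick shell check shows (purely illustrative):

# 128 MiB expressed in bytes matches the constant reading of quickwit_ingest_wal_disk_used_bytes
echo $((128 * 1024 * 1024))   # prints 134217728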
I'm ingesting using both the ingest API and OTLP (no Kafka or Kinesis). This is related to https://github.com/quickwit-oss/quickwit/issues/5548
(I put the comment on a different issue by mistake.) I suspect the metric is only plugged in for ingest v2, and @fredsig is using ingest v1.
Thanks @fulmicoton. In the meantime, I've created my own WAL watcher to give me stats for the queue directory on ingest v1: I just run du -h through a pod exec (ugly, but it works for now).
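Something along these lines (a sketch only; the pod name and namespace are placeholders, and the exact queue directory under data_dir may differ between versions):

# Report the on-disk size of the ingest v1 queue directory from inside the indexer pod.
# <indexer-pod> and <namespace> are placeholders; the queues live under the configured
# data_dir (/quickwit/qwdata in the node.yaml above), typically in a "queues" subdirectory.
kubectl exec -n <namespace> <indexer-pod> -- du -sh /quickwit/qwdata/queues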