prometheus-nats-exporter
nats_stream_total_messages grows, but no other metric follows it
Hello, I was tracking the nats_stream_total_messages metric for one stream, which usually rises and falls together with the nats_consumer_num_pending metric in our system.
Sometimes it happens that nats_stream_total_messages grows a lot (until it fills up the whole stream), but no other metric I could find (except nats_stream_total_bytes) shows similar growth.
I've tried summing all of the metrics I found that seem to show number of messages for a stream:
- nats_consumer_num_pending
- nats_consumer_num_ack_pending
- nats_consumer_num_redelivered
- nats_consumer_num_waiting
but the sum of all these for one stream is below 1k, while nats_stream_total_messages is around 100k.
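The comparison described above can be written down as a small check. This is only a sketch for illustration: it assumes the samples have already been scraped into (metric name, labels, value) tuples, and the `stream_name` label key and the helper name `unexplained_messages` are assumptions, not something confirmed by the exporter.

```python
# Consumer-level metric names taken from the list in this post.
CONSUMER_METRICS = (
    "nats_consumer_num_pending",
    "nats_consumer_num_ack_pending",
    "nats_consumer_num_redelivered",
    "nats_consumer_num_waiting",
)


def unexplained_messages(samples, stream):
    """Return the stream total minus the sum of consumer-level counts.

    `samples` is a list of (metric_name, labels_dict, value) tuples.
    The "stream_name" label key is an assumption; adjust it to whatever
    label the exporter attaches in your setup.
    """
    total = 0.0
    consumer_sum = 0.0
    for name, labels, value in samples:
        if labels.get("stream_name") != stream:
            continue  # sample belongs to a different stream
        if name == "nats_stream_total_messages":
            total += value
        elif name in CONSUMER_METRICS:
            consumer_sum += value
    return total - consumer_sum
```

A large positive result means the messages held in the stream are not accounted for by any of the consumer-level metrics that were summed.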
Is there a metric that I'm missing that could explain what these 100k messages are? This spike seems to be observable in our system only through nats_stream_total_messages and nats_stream_total_bytes.
Any ideas what I'm missing and how I can track these messages?
Thanks a lot for your help!
Hey @hpdobrica, let me know if this issue still persists, and I will investigate further. Thanks.
Hello, I am not sure whether it is related to this issue, but I also see a strange correlation between nats_stream_total_bytes, nats_stream_total_messages and nats_stream_last_seq. I am using the default Grafana dashboard (https://github.com/nats-io/prometheus-nats-exporter/blob/main/walkthrough/grafana-jetstream-dash-helm.json; the panels are slightly rearranged, but the metrics are the same), and here is what I see:
The load and the number of messages sent to NATS are the same for the following streams:
- TPOSS_LOCK_MGR_gx_lock
- TPOSS_LOCK_MGR_gy_lock
The graphs show that messages keep flowing, yet after 5-6 minutes nats_stream_total_bytes and nats_stream_total_messages for these streams go down. I would expect them to stay close to the values at the beginning of the graphs, since the message rate (at least judging by nats_stream_last_seq) is the same.
NATS version: nats:2.10.21-alpine
Are there any ideas what it could be?
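One way to reason about this observation: the last sequence only says how many messages were ever published, while the number of messages currently held also depends on the first sequence still retained. The sketch below is an illustration, not the exporter's own logic. It assumes the first retained sequence is available (for example from a first-sequence metric, if your exporter version exposes one) and that the stream has no interior deletes; the helper name `retained_by_sequence` is hypothetical.

```python
def retained_by_sequence(first_seq, last_seq):
    """Estimate messages currently held in a stream from its sequence range.

    Assumes no interior deletes, so every sequence between first_seq and
    last_seq is still stored. An empty stream reports either last_seq == 0
    (nothing published yet) or first_seq == last_seq + 1 (all removed).
    """
    if last_seq == 0 or last_seq < first_seq:
        return 0
    return last_seq - first_seq + 1
```

Under these assumptions, a steadily growing last sequence combined with a falling message count would mean the first sequence is advancing faster than new messages arrive, for example because messages are being removed by the stream's retention settings.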
We will take a look into this.