Vector buffer metrics report negative values
Problem
I noticed that roughly 50 of about 200 Vector Agent pods report negative values for the vector_buffer_events and vector_buffer_byte_size metrics. The issue appears in only one specific environment (GKE) and is not seen on any of our other GKE clusters, which is confusing. It also causes alerts to misfire, because the reported value is less than or equal to 0.
Configuration
Vector Agent configs of the sink:
sinks:
  vector-aggregator:
    type: vector
    buffer:
      max_events: 50000
Version
0.39.0
Debug Output
No response
Example Data
# HELP vector_buffer_byte_size buffer_byte_size
# TYPE vector_buffer_byte_size gauge
...
vector_buffer_byte_size{buffer_type="memory",component_id="vector-aggregator",component_kind="sink",component_type="vector",host="vector-agent-zr6d8",stage="0"} -149552553 1730792799226
# HELP vector_buffer_events buffer_events
# TYPE vector_buffer_events gauge
...
vector_buffer_events{buffer_type="memory",component_id="vector-aggregator",component_kind="sink",component_type="vector",host="vector-agent-zr6d8",stage="0"} -49014 1730792799226
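For anyone trying to work out which pods are affected, here is a minimal sketch that scans Prometheus text-format output for negative buffer gauges. It assumes the text has already been scraped from each agent's metrics endpoint; the function name `find_negative_buffer_samples` and the regexes are illustrative and are not part of Vector.

```python
import re

# One sample line in the Prometheus text exposition format:
#   name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+'
    r'(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# The two gauges shown in this report.
BUFFER_GAUGES = {"vector_buffer_events", "vector_buffer_byte_size"}


def find_negative_buffer_samples(text):
    """Return (metric, host, component_id, value) for each negative buffer gauge."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, HELP/TYPE comments, and anything that is not a labeled sample.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") not in BUFFER_GAUGES:
            continue
        value = float(match.group("value"))
        if value < 0:
            labels = dict(LABEL_RE.findall(match.group("labels")))
            hits.append(
                (match.group("name"), labels.get("host"),
                 labels.get("component_id"), value)
            )
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): pipe scraped metrics text into this script.
    for metric, host, component, value in find_negative_buffer_samples(sys.stdin.read()):
        print(f"{host} {component} {metric} = {value}")
```

Running this over the scraped output of every agent pod gives a list of hosts to compare against the healthy ones (uptime, node, load), which may help narrow down when the gauges go negative.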
Additional Context
No response
References
No response
Update: The issue disappeared after all the Vector Agent pods were restarted. I still don't know why the buffer metrics were negative in the first place.
Hi @namm2, thank you for reporting this bug.
This sounds like a difficult issue to triage. Do you have any more details to share? Things like the full configuration (with sensitive data redacted), environment details, load, throughput, etc.
This also occurs on 0.43.1 on AWS EKS, after moving from 0.37.1.