Vector buffer metrics report negative values

Open namm2 opened this issue 1 year ago • 3 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I noticed that around 50 of roughly 200 Vector Agent pods report negative values for the vector_buffer_events and vector_buffer_byte_size metrics. The issue appears in only one specific environment (GKE) and is not seen on any of our other GKE clusters, which is confusing. It also causes alerts to misfire, because the reported value is less than or equal to 0.

Configuration

Vector Agent configs of the sink:

  sinks:
    vector-aggregator:
      type: vector
      buffer:
        max_events: 50000

Version

0.39.0

Debug Output

No response

Example Data

# HELP vector_buffer_byte_size buffer_byte_size
# TYPE vector_buffer_byte_size gauge
...
vector_buffer_byte_size{buffer_type="memory",component_id="vector-aggregator",component_kind="sink",component_type="vector",host="vector-agent-zr6d8",stage="0"} -149552553 1730792799226
# HELP vector_buffer_events buffer_events
# TYPE vector_buffer_events gauge
...
vector_buffer_events{buffer_type="memory",component_id="vector-aggregator",component_kind="sink",component_type="vector",host="vector-agent-zr6d8",stage="0"} -49014 1730792799226
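
To check which agents are affected, the exported metrics text can be scanned for buffer gauges below zero. The following is a minimal Python sketch and not part of the original report. The function name `find_negative_buffer_gauges` is hypothetical, and it assumes the text has already been fetched from each agent's Prometheus exporter output in the format shown above.

```python
import re

# One sample line of the Prometheus text exposition format:
#   name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?\s+'
    r'(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?\s*$'
)
# One label pair inside the braces: key="value"
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# The two gauges named in this issue.
BUFFER_GAUGES = {"vector_buffer_events", "vector_buffer_byte_size"}


def find_negative_buffer_gauges(text):
    """Return (metric, host, value) for every buffer gauge sample below zero.

    Hypothetical helper for illustration only; `text` is assumed to be
    raw Prometheus exposition text scraped from one Vector Agent.
    """
    hits = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") not in BUFFER_GAUGES:
            continue
        try:
            value = float(m.group("value"))
        except ValueError:
            continue
        if value < 0:
            labels = dict(LABEL_RE.findall(m.group("labels") or ""))
            hits.append((m.group("name"), labels.get("host"), value))
    return hits
```

Running this over the output of every agent pod and collecting the `host` labels would give the list of pods reporting negative buffer values, which could then be compared against pod age or restart history.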

Additional Context

No response

References

No response

namm2 avatar Nov 05 '24 08:11 namm2

Update: The issue disappeared after all the Vector Agent pods were restarted. I still don't know why the buffer metrics were negative in the first place.

namm2 avatar Nov 08 '24 09:11 namm2

Hi @namm2, thank you for reporting this bug.

This sounds like a difficult issue to triage. Do you have any more details to share? Things like the full configuration (with sensitive data redacted), environment details, load, throughput, etc.

pront avatar Nov 08 '24 20:11 pront

This also occurs on 0.43.1, on AWS EKS, after moving from 0.37.1.

mscanlon72 avatar Jun 12 '25 22:06 mscanlon72