flowpipeline export/prometheus: metrics growing indefinitely, causing timeouts

Open 9er opened this issue 1 year ago • 0 comments

Example

- segment: prometheus
  # collect and export peering statistics
  config:
    endpoint: ":8080"
    labels: "SrcAS,DstAS,SrcIfDesc,DstIfDesc,SamplerAddress,FlowDirection"

Problem

The prometheus segment adds every combination of labels it has seen to its Prometheus exporter and never removes any of them. Over time, the cardinality grows large enough that scrapes (curl or Prometheus jobs) eventually run into timeouts.

In one of our production deployments, this happens every few weeks. The current workaround is to restart flowpipeline, which gets rid of the "old" label sets.

Possible Improvement

Either vacuum/GC/evict only the label sets that haven't been touched for a while (probably a stupid idea, implementation-wise), or vacuum/GC/evict all label sets every once in a while (probably pretty easy to implement).

This could be made configurable, maybe something like:

  config:
    vacuum_interval: 24h

Running the vacuum at a certain time of day might also be an option, but since Prometheus detects counter resets, the vacuum should be hitless anyway.
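For reference, the reset handling relied on here works roughly as follows: PromQL functions such as `rate()` and `increase()` treat any drop in a counter's value as a reset and count the post-reset value as growth from zero. The sketch below is a simplified illustration of that rule, not Prometheus code; it omits the extrapolation PromQL performs, and the sample values are made up.

```go
package main

import "fmt"

// increase sums the growth of a counter across consecutive scraped samples.
// A sample lower than its predecessor is treated as a counter reset, and
// the post-reset value is counted as growth since zero.
// Simplification: no extrapolation to window boundaries, unlike PromQL.
func increase(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	total := 0.0
	prev := samples[0]
	for _, cur := range samples[1:] {
		if cur < prev {
			total += cur // reset detected: counter restarted from zero
		} else {
			total += cur - prev
		}
		prev = cur
	}
	return total
}

func main() {
	// Hypothetical scrapes of one label set; the vacuum happens
	// between the third and the fourth scrape.
	scrapes := []float64{100, 160, 220, 30, 90}
	fmt.Println(increase(scrapes)) // prints 210
}
```

Two caveats follow from this rule: increments recorded between the last scrape and the vacuum are lost, and a reset goes unnoticed if the counter climbs back above its previous value before the next scrape.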

Jun 29 '24 12:06 9er
