export/prometheus: metrics growing indefinitely, causing timeouts

Open 9er opened this issue 1 year ago • 0 comments

Example

- segment: prometheus
  # collect and export peering statistics
  config:
    endpoint: ":8080"
    labels: "SrcAS,DstAS,SrcIfDesc,DstIfDesc,SamplerAddress,FlowDirection"

Problem

The prometheus segment adds every combination of label values it has seen to its Prometheus exporter and never removes any of them. Over time, the cardinality grows large enough that scrapes (curl or Prometheus jobs) eventually run into timeouts.

In one of our production deployments, this happens every few weeks. Currently, the workaround is to restart flowpipeline, which gets rid of the "old" label sets.

Possible Improvement

Either vacuum/GC/evict label sets that haven't been touched for a while (probably a stupid idea, implementation-wise), or vacuum/GC/evict all label sets every once in a while (probably pretty easy to implement).

This could be made configurable, maybe something like:

  config:
    vacuum_interval: 24h
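
To make the two variants concrete, here is a minimal Go sketch using only the standard library. It is not flowpipeline code: `LabelStore`, `Inc`, `VacuumAll`, `EvictIdle`, and the `vacuumInterval` constant are hypothetical names made up for illustration, and a real segment would additionally need locking around the map. If the segment keeps its counters in a `CounterVec` from the Prometheus Go client, that type's `Reset` and `DeleteLabelValues` methods would presumably take the place of the map operations shown here.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// vacuumInterval mirrors the vacuum_interval option proposed above (hypothetical).
const vacuumInterval = 24 * time.Hour

// t0 is an arbitrary fixed start time so the example is deterministic.
var t0 = time.Date(2024, 6, 29, 12, 0, 0, 0, time.UTC)

// series is one label set with its counter value and last update time.
type series struct {
	value    uint64
	lastSeen time.Time
}

// LabelStore is a stand-in for the exporter's per-label-set counters.
type LabelStore struct {
	series map[string]*series
}

// NewLabelStore returns an empty store.
func NewLabelStore() *LabelStore {
	return &LabelStore{series: make(map[string]*series)}
}

// Inc bumps the counter for a label set, creating it on first sight.
func (s *LabelStore) Inc(now time.Time, labelValues ...string) {
	key := strings.Join(labelValues, "\x00")
	e, ok := s.series[key]
	if !ok {
		e = &series{}
		s.series[key] = e
	}
	e.value++
	e.lastSeen = now
}

// Len reports how many label sets are currently held.
func (s *LabelStore) Len() int { return len(s.series) }

// VacuumAll drops every label set (the "easy" variant) and returns how many were dropped.
func (s *LabelStore) VacuumAll() int {
	n := len(s.series)
	s.series = make(map[string]*series)
	return n
}

// EvictIdle drops only label sets that were not updated within maxIdle.
func (s *LabelStore) EvictIdle(now time.Time, maxIdle time.Duration) int {
	n := 0
	for key, e := range s.series {
		if now.Sub(e.lastSeen) > maxIdle {
			delete(s.series, key) // deleting during range is safe in Go
			n++
		}
	}
	return n
}

func main() {
	s := NewLabelStore()
	s.Inc(t0, "64500", "64501")                     // seen once, then never again
	s.Inc(t0.Add(vacuumInterval), "64500", "64502") // seen a day later

	evicted := s.EvictIdle(t0.Add(vacuumInterval), vacuumInterval/2)
	fmt.Println("evicted idle:", evicted, "remaining:", s.Len())
	fmt.Println("vacuumed:", s.VacuumAll(), "remaining:", s.Len())
}
```

The "evict idle" variant needs a timestamp per label set and a scan over all of them, while the "vacuum all" variant is a single reset driven by a ticker, which is why the latter looks easier to implement.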

Restricting the vacuum to a certain time of day might also be an option, but since Prometheus detects counter resets, the vacuum should be hitless anyway.
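
As a rough illustration of why a reset should be tolerable: PromQL functions such as `rate()` and `increase()` treat any drop in a counter's value as a reset and keep counting from the new value. The Go sketch below is a simplified model of that rule, not Prometheus code; `increaseWithResets` is a hypothetical helper, the sample values are invented, and the extrapolation that PromQL applies at window boundaries is left out.

```go
package main

import "fmt"

// increaseWithResets sums the growth of a counter across consecutive scrapes.
// Any drop in value is treated as a counter reset to zero, so the value
// observed after the drop counts as growth since the reset.
func increaseWithResets(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		if samples[i] < samples[i-1] {
			total += samples[i] // reset detected: count from zero
		} else {
			total += samples[i] - samples[i-1]
		}
	}
	return total
}

func main() {
	// Invented scrape values; the vacuum happens between the 3rd and 4th scrape.
	scrapes := []float64{100, 130, 160, 20, 50}
	fmt.Println(increaseWithResets(scrapes))
}
```

Two caveats apply to this model: increments that happen between the last scrape and the vacuum are lost, and a vacuumed label set disappears from the exporter entirely until it is seen again.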

9er, Jun 29 '24 12:06