flowpipeline
export/prometheus: metrics growing indefinitely, causing timeouts
Example
- segment: prometheus
  # collect and export peering statistics
  config:
    endpoint: ":8080"
    labels: "SrcAS,DstAS,SrcIfDesc,DstIfDesc,SamplerAddress,FlowDirection"
Problem
The prometheus segment adds every combination of label values it has ever seen to its Prometheus exporter and never removes any of them. Over time, the cardinality grows large enough that scrapes (curl or Prometheus jobs) eventually run into timeouts.
In one of our production deployments, this happens every few weeks. Currently, the only way to get rid of "old" label sets is to restart flowpipeline.
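To illustrate the growth, here is a minimal sketch in Go using only the standard library. It is a toy model, not flowpipeline's actual code: the `seriesStore` type and its methods are hypothetical, and the AS numbers are documentation values. It only shows that each new label combination creates an entry that is never removed.

```go
package main

import (
	"fmt"
	"strings"
)

// seriesStore is a hypothetical stand-in for a labeled counter:
// it holds one entry per distinct combination of label values.
type seriesStore struct {
	counters map[string]uint64
}

func newSeriesStore() *seriesStore {
	return &seriesStore{counters: make(map[string]uint64)}
}

// add increments the counter for a label combination,
// creating the entry the first time the combination is seen.
func (s *seriesStore) add(n uint64, labelValues ...string) {
	key := strings.Join(labelValues, "\x00")
	s.counters[key] += n
}

// len returns the number of exported series (the cardinality).
func (s *seriesStore) len() int { return len(s.counters) }

func main() {
	store := newSeriesStore()
	// Hypothetical flows with labels (SrcAS, DstAS, FlowDirection).
	store.add(100, "64496", "64497", "ingress")
	store.add(200, "64496", "64497", "ingress") // same label set, no new series
	store.add(300, "64496", "64498", "ingress") // new DstAS, new series
	store.add(50, "64499", "64497", "egress")   // seen once, but kept forever
	fmt.Println(store.len())                    // prints 3
}
```

With six labels as in the example config, the number of possible combinations is bounded only by the product of the distinct values per label, so a long-running process accumulates many entries that no longer receive traffic.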
Possible Improvement
Either vacuum/GC/evict label sets that haven't been touched for a while (probably a stupid idea, implementation-wise), or vacuum/GC/evict all label sets every once in a while (probably pretty easy to implement).
This could be made configurable, maybe something like:
config:
  vacuum_interval: 24h
Running the vacuum at a certain time of day might also be an option, but since Prometheus detects counter resets, the vacuum should be hitless anyway.