fluent-plugin-prometheus

Add "retention" feature allowing idle metrics to expire

Open phihos opened this issue 3 years ago • 9 comments

As proposed in #20, I implemented a way to automatically remove idle metrics at runtime without needing a restart. Example:

<metric>
  name message_foo_counter
  type counter
  desc The total number of foo in message.
  key foo
  retention 3600 # 1h
  retention_check_interval 1800 # 30m
  <labels>
    bar ${bar}
  </labels>
</metric>

If ${bar} was baz one time but no records with that value were processed afterwards, then after one hour the metric message_foo_counter{bar="baz"} becomes eligible for removal. When it is actually removed depends on retention_check_interval (default 60). In the example above, it causes a background thread to check for expired metrics every 30 minutes. So in the worst case a metric is removed 30 minutes after it expires.
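To make the timing concrete, here is a minimal sketch of the expiry bookkeeping described above. It is not the code from this PR: `RetentionTracker`, `touch`, `expire!` and the injectable `clock` are hypothetical names chosen for illustration. The idea is only that each label set remembers when it was last updated, and a periodic check drops the ones that have been idle longer than `retention`.

```ruby
# Hypothetical sketch, not the plugin's actual implementation.
class RetentionTracker
  # retention: idle time (in the clock's units) after which a label set expires.
  # clock: injectable time source so the behaviour can be exercised without sleeping.
  def initialize(retention:, clock: -> { Time.now.to_f })
    @retention = retention
    @clock = clock
    @last_seen = {}
    @mutex = Mutex.new
  end

  # Record that a label set was just updated.
  def touch(labels)
    @mutex.synchronize { @last_seen[labels] = @clock.call }
  end

  # Forget and return every label set that has been idle longer than retention.
  # A background job would call this once per retention_check_interval.
  def expire!
    now = @clock.call
    @mutex.synchronize do
      expired = @last_seen.select { |_labels, seen| now - seen > @retention }.keys
      expired.each { |labels| @last_seen.delete(labels) }
      expired
    end
  end

  # Label sets that are still being tracked.
  def tracked
    @mutex.synchronize { @last_seen.keys }
  end
end
```

With `retention 3600` and `retention_check_interval 1800`, a label set last touched at time t is dropped by the first check that runs after t + 3600, which is at most 1800 seconds later.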

The naming of the config keys was shamelessly ~~stolen from~~ inspired by grok_exporter to make this feature more familiar to people using the grok_exporter.

In addition to the implementation, I had to refactor the Metrics class to directly implement instrument(record, expander) and put subclass-specific logic into value(record), set_value?(value) and set_value(value, labels). That reduces code duplication and was necessary to avoid introducing further duplicates.
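The shape of that refactoring can be sketched roughly as follows. This is an illustration only: the class bodies, the `totals` hash, and the lambda used as `expander` are assumptions made for the example, and only the method names instrument, value, set_value? and set_value come from the description above.

```ruby
# Hypothetical sketch of the template-method layout; not the PR's actual code.
class Metric
  def initialize(key:, labels: {})
    @key = key
    @labels = labels
  end

  # Shared flow for all metric types: extract, check, record.
  def instrument(record, expander)
    v = value(record)
    return unless set_value?(v)

    set_value(v, expander.call(@labels, record))
  end

  # Default extraction: read the configured key from the record.
  def value(record)
    record[@key]
  end

  # Default check: skip records that do not carry the key.
  def set_value?(value)
    !value.nil?
  end

  # Subclasses decide how a value is applied.
  def set_value(_value, _labels)
    raise NotImplementedError
  end
end

class Counter < Metric
  attr_reader :totals

  def initialize(**opts)
    super(**opts)
    @totals = Hash.new(0)
  end

  # Counters only accept non-negative numbers.
  def set_value?(value)
    value.is_a?(Numeric) && value >= 0
  end

  def set_value(value, labels)
    @totals[labels] += value
  end
end
```

Because instrument lives in the base class, a cross-cutting concern such as recording the last-update time for retention only needs to be added in one place.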

I also had to introduce a new data store based on the default data store of prometheus/client_ruby to allow for removal of elements.
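As a rough idea of what such a store looks like, here is a self-contained sketch. It does not depend on prometheus/client_ruby; it only mirrors the general shape of a per-metric store (for_metric, set, increment, get, all_values), and those signatures should be treated as assumptions rather than the library's exact API. `RemovableStore` and `remove` are hypothetical names for this example.

```ruby
require "monitor"

# Hypothetical sketch of a data store that supports removing label sets.
class RemovableStore
  # Returns one store per metric; the arguments are accepted but unused here.
  def for_metric(_metric_name, metric_type:, metric_settings: {})
    MetricStore.new
  end

  class MetricStore
    def initialize
      @values = Hash.new(0.0)
      @lock = Monitor.new # reentrant, so nested synchronize calls are safe
    end

    def synchronize(&block)
      @lock.synchronize(&block)
    end

    def set(labels:, val:)
      synchronize { @values[labels] = val.to_f }
    end

    def increment(labels:, by: 1)
      synchronize { @values[labels] += by }
    end

    def get(labels:)
      synchronize { @values[labels] }
    end

    # The addition needed for retention: forget one label set entirely.
    def remove(labels:)
      synchronize { @values.delete(labels) }
    end

    def all_values
      synchronize { @values.dup }
    end
  end
end
```

Once a label set is removed, it no longer appears in all_values, so it would drop out of the exported output until a new record recreates it.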

The last thing I want to mention is that I needed to use the thread helper to clean up expired metrics in the background. I first tried to use the timer helper, but it caused a test to go into an infinite loop.

If you need more tests or other alterations to the code, please let me know. I am looking forward to your feedback 🙂

phihos, Jul 31 '22 22:07

Sorry I had two very busy weeks. I hope I get to it tomorrow.

phihos, Nov 04 '22 20:11

@phihos any update?

gromnsk, Dec 21 '22 16:12

@phihos is this still on-going?

AlbusLumos, Feb 22 '23 06:02

Hey @phihos 👋🏼 wondering if you still have cycles to work on this? I can try to take over, though I'm not as experienced with Ruby.

It looks like the main concern was whether a dedicated thread is needed, instead of running the check in the plugin thread.

dkulchinsky, Jun 07 '23 18:06

Wonder if you had the bandwidth to pick this up again @phihos

:pray:

Lusitaniae, Feb 12 '24 04:02