Add "retention" feature allowing idle metrics to expire
As proposed in #20, I implemented a way to automatically remove idle metrics at runtime without a restart. Example:
<metric>
name message_foo_counter
type counter
desc The total number of foo in message.
key foo
retention 3600 # 1h
retention_check_interval 1800 # 30m
<labels>
bar ${bar}
</labels>
</metric>
If ${bar} was baz once but no further records with that value were processed, then the metric
message_foo_counter{bar="baz"} becomes eligible for removal after one hour.
When it is actually removed depends on retention_check_interval (default: 60 seconds).
In the example above, a background thread checks for expired metrics every 30 minutes.
So in the worst case a metric is removed 30 minutes after it expires.
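To illustrate the timing, here is a minimal sketch of how retention and the check interval interact. `IdleTracker` and its methods are hypothetical names chosen for this example, not the classes in this PR, and timestamps are passed in as plain seconds to keep it deterministic.

```ruby
# Hypothetical sketch of the expiry logic; not the plugin's actual code.
class IdleTracker
  def initialize(retention)
    @retention = retention # seconds a label set may stay idle
    @last_seen = {}        # label set => time of its last update
  end

  # Record that a label set was updated at time `now` (seconds).
  def touch(labels, now)
    @last_seen[labels] = now
  end

  # Remove every label set idle for at least `retention` seconds.
  # Returns the label sets that were removed.
  def sweep(now)
    expired = @last_seen.select { |_labels, seen| now - seen >= @retention }.keys
    expired.each { |labels| @last_seen.delete(labels) }
    expired
  end

  def tracked
    @last_seen.keys
  end
end

tracker = IdleTracker.new(3600)     # retention 3600
tracker.touch({ bar: "baz" }, 1)    # last record seen at t = 1s
tracker.sweep(1800)                 # first check: nothing removed
tracker.sweep(3600)                 # idle for 3599s: still nothing removed
tracker.sweep(5400)                 # idle for 5399s: label set is removed
```

With a check every 1800 seconds, the label set expires at t = 3601 but is only removed at t = 5400, which is close to the 30 minute worst case described above.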
The names of the config keys were shamelessly ~~stolen from~~ inspired by grok_exporter to make this feature more familiar to people who already use grok_exporter.
In addition to the implementation, I had to refactor the Metrics class to implement instrument(record, expander) directly and to put the subclass-specific logic into value(record), set_value?(value) and set_value(value, labels).
That reduces code duplication and was necessary to avoid introducing further duplication.
I also had to introduce a new data store based on the default data store of prometheus/client_ruby to allow for removal of elements.
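As a rough idea of what such a store needs, here is a minimal standalone sketch. `RemovableStore` and its `remove` method are hypothetical; the other method names are only loosely modelled on a lock-guarded hash store, and the sketch does not reproduce the prometheus/client_ruby interface or the store added in this PR.

```ruby
require "monitor"

# Hypothetical sketch of a lock-guarded store that supports removal.
class RemovableStore
  def initialize
    @values = Hash.new(0.0) # label set => current value
    @lock = Monitor.new
  end

  def increment(labels:, by: 1)
    @lock.synchronize { @values[labels] += by }
  end

  def get(labels:)
    @lock.synchronize { @values[labels] }
  end

  # The extra operation needed for retention: forget a label set entirely.
  def remove(labels:)
    @lock.synchronize { @values.delete(labels) }
  end

  def all_values
    @lock.synchronize { @values.dup }
  end
end

store = RemovableStore.new
store.increment(labels: { bar: "baz" }, by: 2)
store.remove(labels: { bar: "baz" })
store.all_values # the removed label set is no longer exported
```

Every access goes through the same lock, so a background cleanup thread can call `remove` while records are still being processed.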
The last thing I want to mention is that I needed to use the thread helper to clean up expired metrics in the background. I first tried the timer helper, but it caused a test to go into an infinite loop.
If you need more tests or need other alterations to the code please let me know. I am looking forward to your feedback 🙂
Sorry I had two very busy weeks. I hope I get to it tomorrow.
@phihos any update?
@phihos is this still on-going?
Hey @phihos 👋🏼 wondering if you still have cycles to work on this? I can try and take over though not as experienced with Ruby.
Looks like there was mainly a concern whether a dedicated thread is needed, instead of running it in the plugin thread.
Wonder if you had the bandwidth to pick this up again @phihos
:pray: