client_ruby
Counter metric is reset when it has no labels
Hello,
I implemented a custom MongoDB metric data store. While testing it, I noticed that counters with no labels are reset every time I relaunch my application. This does not happen to labeled counters.
After inspecting the code, it looks like this line resets the counters: https://github.com/prometheus/client_ruby/blob/8740786a4b46f46c1c3ad893a699c3ce51d468b9/lib/prometheus/client/metric.rb#L46
Off the top of my head, I can't think of a scenario where this behaviour would be useful for counters or other metric types. Also, based on the tests, the store should handle `nil` cases and initialise values to 0.0 before incrementing:
https://github.com/prometheus/client_ruby/blob/8740786a4b46f46c1c3ad893a699c3ce51d468b9/spec/examples/data_store_example.rb#L33
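To illustrate the behaviour the shared spec expects, here is a minimal in-memory sketch of a store's per-metric object that treats a missing (`nil`) value as 0.0 before incrementing. This is just my own illustration, not the MongoDB store itself, and the internals (the hash, the lock) are assumptions:

```ruby
require 'monitor'

# Hypothetical inner store object, as returned by a custom data store's
# for_metric. A missing value is treated as 0.0, matching the shared spec.
class SketchMetricStore
  def initialize
    @values = {}
    @lock = Monitor.new
  end

  def increment(labels:, by: 1)
    @lock.synchronize do
      @values[labels] = (@values[labels] || 0.0) + by
    end
  end

  def get(labels:)
    @lock.synchronize { @values[labels] || 0.0 }
  end

  def all_values
    @lock.synchronize { @values.dup }
  end
end
```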
Would it be safe to remove it?
Thank you, Alex
Update
After further investigation, I see the reason why it was done: https://github.com/prometheus/client_ruby/commit/feccff3429cac056913ad7414b8744e61db57d85
I think it makes sense, and maybe it would be nice to present default values for labeled metrics too!
Though I think setting default values during the initialisation step breaks persistent metric storage. What if, instead of setting default values during metric creation, we returned default values during presentation? It could work like this:
- Prometheus hits the `/metrics` endpoint
- The exporter builds default values from the metrics registry
- The exporter pulls metrics from the data store using the `all_values` method
- The exporter updates the default values with the values pulled from the data store
- Marshal formats the text output and presents it to Prometheus.
This approach would solve the missing-values issue as well as make persistent storage work? 🤔
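A rough sketch of the merge step I have in mind (entirely hypothetical; `values_for_export` is not part of the library, and I'm assuming `Registry#metrics` and `Metric#values` as the way to enumerate metrics and read the store):

```ruby
require 'prometheus/client'

# Hypothetical helper: build a default (zero) sample for each metric, then
# overlay whatever the data store actually has, at presentation time.
def values_for_export(registry = Prometheus::Client.registry)
  registry.metrics.each_with_object({}) do |metric, result|
    # Default for the label-less case; labelled metrics would need their
    # known label sets here instead.
    defaults = { {} => 0.0 }
    stored   = metric.values # reads the data store via all_values
    result[metric.name] = defaults.merge(stored)
  end
end
```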
Hello Alex,
The reason we initialize metrics that don't have labels is that it's recommended by Prometheus's best practices:

> There SHOULD be a way to initialize a given Child with the default value, usually just calling `labels()`. Metrics without labels MUST always be initialized to avoid problems with missing metrics.
> maybe it would be nice to present default values for labeled metrics too!
The problem here is that while the library knows what the labels will be for your metric (you specify these when declaring it), we can't know what values those labels may have. As a consumer of the library, however, you might know what the values will be in advance (depending on the metric). If you do, it is recommended that you set those using the `init_label_set` method to define them.
If you do that, all your metrics will be constantly reported, instead of appearing and disappearing after restarts.
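For example, something along these lines (a sketch; the metric name, label, and queue values are made up for illustration, and the exact `init_label_set` call style may vary between versions):

```ruby
require 'prometheus/client'

# Declare a counter whose label names are known up front.
counter = Prometheus::Client::Counter.new(
  :jobs_processed_total,
  docstring: 'Jobs processed by the worker',
  labels: [:queue]
)
Prometheus::Client.registry.register(counter)

# Pre-initialise the label sets we know will occur, so they are exported
# as 0 immediately instead of appearing only after the first increment.
%w[default mailers].each do |queue_name|
  counter.init_label_set(queue: queue_name)
end
```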
> counters with no labels are reset every time I relaunch my application.
>
> setting default values during the initialisation step breaks persistent metric storage.
This is by design, and intended. Prometheus metrics should reset after an application restart, and Prometheus is well equipped to deal with that; this is expected.
We do not support “persistent” metric storage. To the extent that your metrics are maintained after a restart, that’s an unfortunate side effect of our internal storage method, which happens to use files on disk to get around multi-process communication.
We would reset these files on restart if we could, but we haven’t figured out how to do so safely, in a way that’d support all possible use cases. Instead, we recommend that you clean these files up yourself on app start, and we provide some sample code on how to do this in our README.
I’m assuming you’re using DirectFileStore, since you mention persistent metrics.
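Something along these lines on boot, before the data store is configured (a sketch, not copied from the README; it assumes metrics are kept in `/tmp/prometheus` and that the store's files end in `.bin`, so adjust both to your setup):

```ruby
require 'prometheus/client'
require 'prometheus/client/data_stores/direct_file_store'

# On application boot, before any metrics are recorded: delete the metric
# files left over from previous processes, then configure the store.
Dir.glob('/tmp/prometheus/*.bin').each { |file_path| File.unlink(file_path) }

Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus')
```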
Thank you for your response, Daniel! This explains a lot about the client and its intended use.
What do you think would be the best approach for the following case? We host an app on Heroku with the following setup:
- 1x web dyno
- 3x worker dynos (all workers work on the same task and have the same metrics)
We need to collect the worker dynos' metrics, and since the workers are separate virtual machines, they cannot share metrics via files. We have two options here:
- Use Pushgateway
- Implement a centralised metric store which is updated by the workers and exposed by the web dyno via the `/metrics` endpoint
We implemented the first option and it works well. At the moment we are exploring the second one. We implemented a custom data store to keep metrics in MongoDB. The problem we are facing now is that whenever we scale our workers, a new dyno is launched and the metrics are reset, therefore we could lose metrics between Prometheus collections:
(Timeline: the workers record metrics, Prometheus performs its first scrape, Heroku then scales the dynos and resets the metrics, the workers record more metrics, and Prometheus performs its second scrape. The metrics recorded between the first scrape and the scaling event are lost, because they are wiped before the second scrape can collect them.)
We have the following options:
- As mentioned before, present Prometheus with default metric values when it collects metrics from the `/metrics` endpoint (i.e. do not reset the metric when the metric object is initialised)
- Add `host_id` + `process_id` labels to the metric, and aggregate the values in the `all_values` datastore method (the problem with this approach is that we don't know when dynos are cycled/scaled and when new `host_id` + `process_id` pairs are introduced and old ones are retired, therefore we don't know which `host_id` + `process_id` metrics to include in the `all_values` aggregation); there is a rough sketch of what I mean below
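To illustrate the second option, here is a very rough sketch of the aggregation I mean. The document shape and the `host_id`/`process_id` label names are assumptions about our hypothetical MongoDB-backed store, not library code:

```ruby
# Hypothetical: each stored sample is a hash of labels (including the
# host_id/process_id we added) plus a value, e.g.
#   { labels: { queue: 'default', host_id: 'worker.2', process_id: '123' }, value: 4.0 }
# all_values should return { labels_without_host_and_pid => summed_value }.
AGGREGATION_LABELS = %i[host_id process_id].freeze

def all_values(stored_samples)
  stored_samples.each_with_object(Hash.new(0.0)) do |sample, result|
    exported_labels = sample[:labels].reject { |key, _| AGGREGATION_LABELS.include?(key) }
    result[exported_labels] += sample[:value]
  end
end
```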
Thank you for your thoughts!
Hi @alex-wearisma, sorry for the silence.
Truth be told, I've skimmed over your situation a few times and I've not had any amazing ideas.
Running Prometheus in environments where you can't individually make HTTP requests to each process that you want to scrape metrics from is always a pain. `DirectFileStore` solves the case where those processes are all forks of a single parent, but in many ways that's one of the easier cases to solve (not to diminish the hard work @dmagliola did to make that work!).
I've not deployed anything beyond a hobby app on Heroku, so correct me if my understanding is wrong (it has been pieced together from a few searches). My understanding is:
- It's possible to talk to individual dynos, though getting their IPs is a pain (it seems like you'd need to do service discovery by running the `heroku` CLI to get the IPs and then feed them into Prometheus's file-based service discovery by writing out a file periodically; rough sketch of that below)
- The same is not true for workers, which aren't addressable over the public internet (less sure if this is true)
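For the file-based service discovery part, the general shape would be a small script run on a schedule that writes Prometheus's `file_sd` JSON format. This is very much a sketch: `fetch_dyno_addresses` is entirely hypothetical, since how you'd actually get the IPs is the part I'm least sure about.

```ruby
require 'json'

# Hypothetical: however you end up resolving them (heroku CLI, API, etc.),
# assume we have a list of "host:port" strings for the dynos to scrape.
def fetch_dyno_addresses
  ['10.0.0.12:9394', '10.0.0.13:9394'] # placeholder values
end

# Write the targets in Prometheus's file_sd JSON format; point a
# file_sd_configs entry in prometheus.yml at this file.
targets = [{ 'targets' => fetch_dyno_addresses, 'labels' => { 'job' => 'heroku-dynos' } }]
File.write('/etc/prometheus/file_sd/heroku.json', JSON.pretty_generate(targets))
```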
Assuming you want to continue with Prometheus, I think this leaves you in heavily custom territory. What I'm about to suggest is very much in "here's what I might try" territory, rather than something I'm sure will work well.
I think your second option might be workable, but you'd probably need to add a per-host/process heartbeat into the storage, and only return metrics for that host/process if its heartbeat has been updated within some period (e.g. the last 5m). You'd want to reserve a label for that purpose which you know nothing else will use.
The other thing you'll need to think about (assuming these are being saved to durable storage) is cleaning out any metrics from old processes that haven't sent a heartbeat in a really long time (e.g. 12-24h).
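To make that a bit more concrete, here's a sketch of the heartbeat idea. Everything in it is hypothetical: the heartbeat document shape, the 5-minute freshness window, and the 24-hour cleanup threshold are just the examples from above.

```ruby
# Hypothetical store-side helpers. Assumes each process periodically upserts
# a heartbeat record like { host_id: 'worker.2', process_id: '123',
# last_seen: Time } alongside its metric samples.
FRESHNESS_WINDOW = 5 * 60        # only export hosts seen in the last 5 minutes
RETENTION_WINDOW = 24 * 60 * 60  # drop data from hosts silent for 24 hours

def live_process_keys(heartbeats, now: Time.now)
  heartbeats
    .select { |hb| now - hb[:last_seen] <= FRESHNESS_WINDOW }
    .map { |hb| [hb[:host_id], hb[:process_id]] }
end

def stale_process_keys(heartbeats, now: Time.now)
  heartbeats
    .select { |hb| now - hb[:last_seen] > RETENTION_WINDOW }
    .map { |hb| [hb[:host_id], hb[:process_id]] }
end

# In all_values: only aggregate samples whose [host_id, process_id] is live.
# In a periodic cleanup job: delete samples whose key is stale.
```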
That's probably not as simple an answer as you were looking for, and it might be too late (I assume you've figured something out in the meantime), but it's all I could really think of after a few rounds of reading through this thread.
Hello @Sinjo. Thank you for your thoughts! And thank you @Sinjo and @dmagliola for your help!
No problem @alex-wearisma. I'm going to close this issue since it's not something we'll be working on in the core library.
You may be interested in reading this discussion that was opened last week. It's not exactly the same use-case as what you're doing, but you might find some inspiration in what @KaoruDev is working on (assuming it ends up being open source).
Thank you @Sinjo !