client_ruby
client_ruby copied to clipboard
Supporting OpenMetrics
We should support OpenMetrics. I've started this work in a temporary branch, will start opening PRs. But this seems to be the master list of things to do.
This is a breaking change that we should release as 3.0. We should have previous minor versions with deprecation warnings about the things we'll be changing.
Todo
-
[ ] Basic Formatting hygiene
- [ ] Format negotiation with OpenMetrics Accept Header
- [ ] Add tests to the OpenMetrics formatter I wrote
- [ ] #EOF
- [ ] #UNIT including validation
- [ ] forcing
_total
on counters. This is a breaking change, released in 3 stages- [ ] Warn on counters that are not suffixed with
_total
- [ ] (next major version) Forcefully add
_total
to counters. Those that come with_total
suffix, remove it and warn, in preparation for next major. - [ ] (next major version) Error on counters suffixed
_total
- [ ] Warn on counters that are not suffixed with
-
[ ] Timestamps Output
- [x] Merge PR #172 which will help with outputting "updated_at" timestamps for time series
- [ ]
series_created
series
-
[ ] Exemplars
-
[ ] New Metric types
- [ ] StateSet
- [ ] Info
-
[ ] Document breaking changes (UPGRADING.md)
- [ ] If you called
Metric#values
orStores#all_values
, you'll get objects with timestamps instead of simply float - [ ] Naming convention things: You must pass the unit and not include it in the name. You must not add _total to your counters. We'll add both automatically
- [ ] If you called
Questions:
-
Is this the most up-to-date documentation we have?
-
The
_created
child series: a. Is it only supported forcounter
,summary
andhistogram
? b. Is this the time the metric got declared / added to the registry? Or is this the time the metric was first observed? c. Do we report one_created
value per metric or per time series? In other words, if I have:http_server_requests_total{code="200"} 10.0 1590078339.214435 http_server_requests_total{code="400"} 1.0 1590078339.214435
Should I have:
http_server_requests_created{code="200"} 1590078336.554018 http_server_requests_created{code="400"} 1590078336.554018
Or just:
http_server_requests_created 1590078336.554018
And if the former, do we have a separate one per Histogram Bucket?Note: in the documentation, all examples happen to not have labels. Moreover, trying to figure this out by experimentation, I think there may be a problem with the Python parser. For Counters, it only accepts the second example (no labels for
_created
). For Histograms, it only accepts "all the labels minusle
". Both kind of make sense independently, but they seem like they should be mutually exclusive. -
Exemplars a. Are they only available for
counter
andhistogram
? b. We are reporting an exemplar, of the many the app could have reported. Does it have to be the latest? Can it be an "unstable" pick between the many? (the reason I ask is that if different scrapes are ok to report different exemplars, even if the metric hasn't changed, then we don't need to share those exemplars between processes, and we can just keep them in the Metric object itself, instead of persisting to the files)
is a breaking change.
Adding UNIT should not be breaking. The only big breaking one is _total for counters.
updated_at timestamp for each data point
I don't see what this has to do with OpenMetrics, and last I checked client_ruby shouldn't need to worry about timestamps at all.
Ok, wow, that's an impressively quick answer :)
Adding UNIT should not be breaking. The only big breaking one is _total for counters.
Correct. I'd introduce both together though, since they both imply changing how you name your metrics
I don't see what this has to do with OpenMetrics, and last I checked client_ruby shouldn't need to worry about timestamps at all.
I interpreted this:
# TYPE foo counter
foo_total 17.0 1520879607.789
To be a counter with a value of 17, and a timestamp on when it got updated to that value. Did I misread that?
@brian-brazil I added a bunch of questions that came up yesterday, trying to implement OpenMetrics in Ruby. These may stem from me not using very up-to-date documentation, that document I linked to was all I could find. Could you let me know what's the answer to those? I'd like to work on preparing for this change late in the next couple of week if possible.
Thank you!
Correct. I'd introduce both together though, since they both imply changing how you name your metrics
Unit can be handled gracefully, see how the Python client does it.
Did I misread that?
You did, it works the exact same as the Prometheus format. Direct instrumentation should never expose timestamps.
You did, it works the exact same as the Prometheus format.
I didn't know about these timestamps on the Prometheus format... What are they for? Do you have any keywords I can google, or links to documentation about them?
Given this:
- Should we report the
_created
series in the Ruby Client, when using OpenMetrics? - Should exemplars have a timestamp? Or is this also not something the Ruby Client should expose?
What are they for?
Federation roughly speaking, I don't believe client_ruby has the features where they'd come up.
Should we report the _created series in the Ruby Client, when using OpenMetrics?
Yes, and in Prometheus format too. The same samples should end up in Prometheus no matter what format is negotiated.
Should exemplars have a timestamp? Or is this also not something the Ruby Client should expose?
If you plan on supporting exemplars, yes.
Amazing, thank you!
@dmagliola I'm gonna try and finish #220 in the next few days, which will leave us with two breaking changes on master for us to release in 3.0. Are we fine with this being in 4.0? If not I think we need to figure out a separate branch which we can cut non-breaking releases from on 2.x.
I'm absolutely fine with this going in later releases. This feature shouldn't block any other progress.
Has there been any movement on this in the last year or two? I could use this feature and I am happy to take over.
Hello @jeremiahishere! Glad to hear you're interested in this feature and would have time to contribute to it!
There hasn't been any movement on this, and there are things we are interested in working on first (mainly, exporting performance), so I don't think this will happen soon.
If you'd like to contribute, we'd love to see that! Depending on how involved this would be, and how much it'd affect the rest of the codebase, it might be worth catching up a bit on a super basic draft PR that'd show the general approach, to make sure we're all in the same page, before putting in a lot of work.
I posted a sketch pr to the discussion tab: https://github.com/prometheus/client_ruby/discussions/290 Please take a look when you have some time.