client_ruby icon indicating copy to clipboard operation
client_ruby copied to clipboard

Supporting OpenMetrics

Open dmagliola opened this issue 4 years ago • 12 comments

We should support OpenMetrics. I've started this work in a temporary branch, will start opening PRs. But this seems to be the master list of things to do.

This is a breaking change that we should release as 3.0. We should have previous minor versions with deprecation warnings about the things we'll be changing.

Todo

  • [ ] Basic Formatting hygiene

    • [ ] Format negotiation with OpenMetrics Accept Header
    • [ ] Add tests to the OpenMetrics formatter I wrote
    • [ ] #EOF
    • [ ] #UNIT including validation
    • [ ] forcing _total on counters. This is a breaking change, released in 3 stages
      • [ ] Warn on counters that are not suffixed with _total
      • [ ] (next major version) Forcefully add _total to counters. Those that come with _total suffix, remove it and warn, in preparation for next major.
      • [ ] (next major version) Error on counters suffixed _total
  • [ ] Timestamps Output

    • [x] Merge PR #172 which will help with outputting "updated_at" timestamps for time series
    • [ ] series_created series
  • [ ] Exemplars

  • [ ] New Metric types

    • [ ] StateSet
    • [ ] Info
  • [ ] Document breaking changes (UPGRADING.md)

    • [ ] If you called Metric#values or Stores#all_values, you'll get objects with timestamps instead of simply float
    • [ ] Naming convention things: You must pass the unit and not include it in the name. You must not add _total to your counters. We'll add both automatically

Questions:

  1. Is this the most up-to-date documentation we have?

  2. The _created child series: a. Is it only supported for counter, summary and histogram? b. Is this the time the metric got declared / added to the registry? Or is this the time the metric was first observed? c. Do we report one _created value per metric or per time series? In other words, if I have:

    http_server_requests_total{code="200"} 10.0 1590078339.214435
    http_server_requests_total{code="400"} 1.0 1590078339.214435
    

    Should I have:

    http_server_requests_created{code="200"} 1590078336.554018
    http_server_requests_created{code="400"} 1590078336.554018
    

    Or just: http_server_requests_created 1590078336.554018 And if the former, do we have a separate one per Histogram Bucket?

    Note: in the documentation, all examples happen to not have labels. Moreover, trying to figure this out by experimentation, I think there may be a problem with the Python parser. For Counters, it only accepts the second example (no labels for _created). For Histograms, it only accepts "all the labels minus le". Both kind of make sense independently, but they seem like they should be mutually exclusive.

  3. Exemplars a. Are they only available for counter and histogram? b. We are reporting an exemplar, of the many the app could have reported. Does it have to be the latest? Can it be an "unstable" pick between the many? (the reason I ask is that if different scrapes are ok to report different exemplars, even if the metric hasn't changed, then we don't need to share those exemplars between processes, and we can just keep them in the Metric object itself, instead of persisting to the files)

dmagliola avatar May 22 '20 15:05 dmagliola

is a breaking change.

Adding UNIT should not be breaking. The only big breaking one is _total for counters.

updated_at timestamp for each data point

I don't see what this has to do with OpenMetrics, and last I checked client_ruby shouldn't need to worry about timestamps at all.

brian-brazil avatar May 22 '20 15:05 brian-brazil

Ok, wow, that's an impressively quick answer :)

Adding UNIT should not be breaking. The only big breaking one is _total for counters.

Correct. I'd introduce both together though, since they both imply changing how you name your metrics

I don't see what this has to do with OpenMetrics, and last I checked client_ruby shouldn't need to worry about timestamps at all.

I interpreted this:

# TYPE foo counter
foo_total 17.0 1520879607.789

To be a counter with a value of 17, and a timestamp on when it got updated to that value. Did I misread that?

dmagliola avatar May 22 '20 15:05 dmagliola

@brian-brazil I added a bunch of questions that came up yesterday, trying to implement OpenMetrics in Ruby. These may stem from me not using very up-to-date documentation, that document I linked to was all I could find. Could you let me know what's the answer to those? I'd like to work on preparing for this change late in the next couple of week if possible.

Thank you!

dmagliola avatar May 22 '20 15:05 dmagliola

Correct. I'd introduce both together though, since they both imply changing how you name your metrics

Unit can be handled gracefully, see how the Python client does it.

Did I misread that?

You did, it works the exact same as the Prometheus format. Direct instrumentation should never expose timestamps.

brian-brazil avatar May 22 '20 16:05 brian-brazil

You did, it works the exact same as the Prometheus format.

I didn't know about these timestamps on the Prometheus format... What are they for? Do you have any keywords I can google, or links to documentation about them?

Given this:

  • Should we report the _created series in the Ruby Client, when using OpenMetrics?
  • Should exemplars have a timestamp? Or is this also not something the Ruby Client should expose?

dmagliola avatar May 22 '20 22:05 dmagliola

What are they for?

Federation roughly speaking, I don't believe client_ruby has the features where they'd come up.

Should we report the _created series in the Ruby Client, when using OpenMetrics?

Yes, and in Prometheus format too. The same samples should end up in Prometheus no matter what format is negotiated.

Should exemplars have a timestamp? Or is this also not something the Ruby Client should expose?

If you plan on supporting exemplars, yes.

brian-brazil avatar May 22 '20 22:05 brian-brazil

Amazing, thank you!

dmagliola avatar May 23 '20 15:05 dmagliola

@dmagliola I'm gonna try and finish #220 in the next few days, which will leave us with two breaking changes on master for us to release in 3.0. Are we fine with this being in 4.0? If not I think we need to figure out a separate branch which we can cut non-breaking releases from on 2.x.

Sinjo avatar Jun 08 '21 21:06 Sinjo

I'm absolutely fine with this going in later releases. This feature shouldn't block any other progress.

dmagliola avatar Jun 08 '21 21:06 dmagliola

Has there been any movement on this in the last year or two? I could use this feature and I am happy to take over.

jeremiahishere avatar May 15 '23 21:05 jeremiahishere

Hello @jeremiahishere! Glad to hear you're interested in this feature and would have time to contribute to it!

There hasn't been any movement on this, and there are things we are interested in working on first (mainly, exporting performance), so I don't think this will happen soon.

If you'd like to contribute, we'd love to see that! Depending on how involved this would be, and how much it'd affect the rest of the codebase, it might be worth catching up a bit on a super basic draft PR that'd show the general approach, to make sure we're all in the same page, before putting in a lot of work.

dmagliola avatar May 16 '23 12:05 dmagliola

I posted a sketch pr to the discussion tab: https://github.com/prometheus/client_ruby/discussions/290 Please take a look when you have some time.

jeremiahishere avatar Jul 31 '23 11:07 jeremiahishere