common expfmt: Implement OpenMetrics generation

To support OpenMetrics in v1 of prometheus/client_golang, we need to add OpenMetrics generation to the expfmt package, effectively a proto-to-OpenMetrics converter.

For this to work, we need to add the required additional fields to metrics.proto. This should be backwards compatible for existing protobuf consumers, while providing an "unofficial" protobuf representation for OpenMetrics as a byproduct. This could create confusion. However, the prometheus/client_model repo is mostly deprecated anyway, so we might as well abuse it for this "hack".

Update: Basic support is done. Here are the remaining TODOs (some of them will require more or less breaking changes, so we might consider bundling them and perhaps even starting to use major version numbers for this repo):

Feature support:
- [ ] # UNIT.
- [ ] _created.
- [ ] info type.
- [ ] stateset type.
- [ ] gaugehistogram type.
- [ ] Figure out how to deal with the _total suffix. (Currently, we require it to be in the metric name and truncate it for TYPE/HELP, which is arguably the opposite of what would be expected but which simplifies the transition from current Prom text format usage.)
- [ ] Reject exemplars that are too long.
- [ ] Reject counters with NaN.
- [ ] More sophisticated timestamp rendering for both samples and exemplars (to not run into precision and overflow issues with float64 and time.UnixNano).
- [ ] Get rid of NegotiateIncludingOpenMetrics (but we have to make sure there remains some way of letting users opt out of OpenMetrics to support legacy setups – which could be on the Prometheus server side by making the Accept header configurable).
- [ ] Add the Close method directly to the Encoder interface.
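
A few of the TODOs above (exemplar length, NaN counters, timestamp rendering) could look roughly like the following. This is only a minimal sketch, not the expfmt implementation: the function names are hypothetical, the 128-rune exemplar limit is assumed from the OpenMetrics draft, and the timestamp is assumed to arrive as separate seconds and nanoseconds (nanos in [0, 1e9), as in a protobuf Timestamp) so that no float64 is involved.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
	"unicode/utf8"
)

// maxExemplarRunes is an assumed limit (from the OpenMetrics draft) on the
// combined length of an exemplar's label names and values.
const maxExemplarRunes = 128

// checkExemplar rejects exemplar label sets that are too long.
func checkExemplar(labels map[string]string) error {
	n := 0
	for name, value := range labels {
		n += utf8.RuneCountInString(name) + utf8.RuneCountInString(value)
	}
	if n > maxExemplarRunes {
		return fmt.Errorf("exemplar labels have %d runes, limit is %d", n, maxExemplarRunes)
	}
	return nil
}

// checkCounter rejects NaN counter values.
func checkCounter(v float64) error {
	if math.IsNaN(v) {
		return errors.New("counter value is NaN")
	}
	return nil
}

// formatTimestamp renders seconds plus nanoseconds as a decimal number of
// seconds using integer arithmetic only, so large values keep full precision.
// It assumes 0 <= nanos < 1e9; for negative sec the value is sec + nanos/1e9.
func formatTimestamp(sec int64, nanos int32) string {
	if nanos == 0 {
		return strconv.FormatInt(sec, 10)
	}
	sign := ""
	if sec < 0 {
		// Convert "floor seconds plus positive nanos" into sign and magnitude.
		sign = "-"
		sec = -(sec + 1)
		nanos = 1000000000 - nanos
	}
	frac := strings.TrimRight(fmt.Sprintf("%09d", nanos), "0")
	return sign + strconv.FormatInt(sec, 10) + "." + frac
}

func main() {
	fmt.Println(formatTimestamp(1574943060, 500000000))
	fmt.Println(checkCounter(math.NaN()))
	fmt.Println(checkExemplar(map[string]string{"trace_id": strings.Repeat("x", 200)}))
}
```

Splitting the timestamp into two integers sidesteps both the float64 precision loss and the time.UnixNano overflow mentioned in the list, at the cost of a slightly more awkward negative case.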

Nov 28 '19 12:11 beorn7

This has to take into account special float formatting. (An integer number that is actually typed as a float always has to end in .0 or contain an e. Note that histogram buckets are typed as int.)
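
To make the formatting rule concrete, here is a minimal sketch of how a generator might render float-typed values so that they always carry a ".0" or an exponent, while bucket counts stay plain integers. The function names are hypothetical and this is not necessarily what expfmt does; it only relies on `strconv.FormatFloat` with the shortest-representation precision.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// formatFloat renders a float-typed value so that it is never mistaken for an
// integer: the result contains a '.' or an 'e', or is NaN, +Inf, or -Inf.
func formatFloat(f float64) string {
	switch {
	case math.IsNaN(f):
		return "NaN"
	case math.IsInf(f, +1):
		return "+Inf"
	case math.IsInf(f, -1):
		return "-Inf"
	}
	s := strconv.FormatFloat(f, 'g', -1, 64)
	if !strings.ContainsAny(s, ".e") {
		// Integral value without exponent, e.g. "42": mark it as a float.
		s += ".0"
	}
	return s
}

// formatBucketCount renders an int-typed value (such as a histogram bucket
// count) without any float marker.
func formatBucketCount(n uint64) string {
	return strconv.FormatUint(n, 10)
}

func main() {
	for _, f := range []float64{1, 1.5, 1e21, math.Inf(1)} {
		fmt.Println(formatFloat(f))
	}
	fmt.Println(formatBucketCount(42))
}
```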

Also, OpenMetrics suffers from name collisions (due to additional "magic" suffixes of the metric name: _created, _total). The generator has to detect those collisions to avoid creating invalid output.
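
A rough sketch of what such collision detection could look like: expand every metric family into the sample names it would emit and flag any name that two families produce. The `family` type, the suffix table, and the use of base names without `_total` are assumptions made for illustration; the real generator works on the protobuf types and may carry the `_total` suffix in the metric name instead.

```go
package main

import "fmt"

// family is a hypothetical, simplified stand-in for a metric family.
type family struct {
	name string
	typ  string // "counter", "gauge", "histogram", or "summary"
}

// suffixes lists the sample-name suffixes assumed for each type.
// Families with an unknown type produce no sample names in this sketch.
var suffixes = map[string][]string{
	"counter":   {"_total", "_created"},
	"gauge":     {""},
	"histogram": {"_bucket", "_count", "_sum", "_created"},
	"summary":   {"", "_count", "_sum", "_created"},
}

// findCollisions returns every sample name claimed by more than one family,
// in the order in which the collisions are encountered.
func findCollisions(families []family) []string {
	seen := map[string]bool{}
	var collisions []string
	for _, f := range families {
		for _, suffix := range suffixes[f.typ] {
			sample := f.name + suffix
			if seen[sample] {
				collisions = append(collisions, sample)
				continue
			}
			seen[sample] = true
		}
	}
	return collisions
}

func main() {
	fmt.Println(findCollisions([]family{
		{name: "foo", typ: "counter"},
		{name: "foo_created", typ: "gauge"},
	}))
}
```

Here the counter `foo` implicitly owns `foo_created`, so a separate gauge of that name has to be rejected before any output is written.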

Nov 28 '19 13:11 beorn7

> An integer number that is actually typed as a float always has to end in .0 or contain an e. Note that histogram buckets are typed as int.

As currently drafted this only strictly applies to le and quantile labels - and even that is not finalised as to whether it'll be a SHOULD or MUST. I'd hope the Go client would be a good example that others can follow though, considering it'll likely be the most used.

Nov 28 '19 13:11 brian-brazil

Hmm, I would have thought it's also quite important for sample values. Following line of thought:

- The unambiguous marking of int vs float is meant for consumers that, unlike Prometheus, distinguish between int and float as a sample value.
- Those consumers will register a newly appearing time series as one or the other, depending on the first sample. Let's say it's 1.234, i.e. float.
- If then suddenly a sample comes in that reads 1 (instead of 1.0), they will be confused.
- The other way round is probably worse: Get an int first, filing it as an int series, and then suddenly floats come in.
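
The scenario can be sketched with a purely hypothetical consumer that fixes a series' type on its first sample. Nothing here corresponds to a real system; `kindOf` and `registry` are made-up names for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

type kind string

const (
	kindInt   kind = "int"
	kindFloat kind = "float"
)

// kindOf classifies a sample value by its textual form alone.
func kindOf(token string) kind {
	if strings.ContainsAny(token, ".eE") || token == "NaN" || strings.HasSuffix(token, "Inf") {
		return kindFloat
	}
	return kindInt
}

// registry remembers the kind each series was first seen with.
type registry map[string]kind

// observe registers a series on its first sample and reports an error if a
// later sample has a different kind.
func (r registry) observe(series, token string) error {
	k := kindOf(token)
	if prev, ok := r[series]; ok && prev != k {
		return fmt.Errorf("series %s registered as %s, got %s sample %q", series, prev, k, token)
	}
	r[series] = k
	return nil
}

func main() {
	r := registry{}
	fmt.Println(r.observe("temperature", "1.234")) // registers as float
	fmt.Println(r.observe("temperature", "1"))     // int-looking sample conflicts
	fmt.Println(r.observe("temperature", "1.0"))   // unambiguous float is fine
}
```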

In contrast, I would see le and quantile labels as unproblematic because they are always floats. And since we have no proven way of unambiguous float formatting (see https://github.com/OpenObservability/OpenMetrics/issues/129 ), those labels have to be sanitized by the consumer anyway before series identity is determined. (In different news, a more advanced consumer, including possibly a future version of Prometheus, will not treat the quantiles and buckets as separate series anyway, so those labels aren't really labels anymore, and the whole problem disappears.)
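
Consumer-side sanitizing of this kind could be as simple as parsing the label value and re-rendering it in one canonical form. A minimal sketch, assuming a hypothetical helper `normalizeBound` and Go's shortest float representation as the canonical form:

```go
package main

import (
	"fmt"
	"strconv"
)

// normalizeBound parses an le or quantile label value and re-renders it in a
// single canonical form, so that "1", "1.0", and "1e0" all map to the same
// series identity.
func normalizeBound(label string) (string, error) {
	f, err := strconv.ParseFloat(label, 64)
	if err != nil {
		return "", fmt.Errorf("invalid bound %q: %w", label, err)
	}
	return strconv.FormatFloat(f, 'g', -1, 64), nil
}

func main() {
	for _, in := range []string{"1", "1.0", "1e0", "0.50", "+Inf"} {
		out, err := normalizeBound(in)
		fmt.Println(in, out, err)
	}
}
```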

Nov 28 '19 14:11 beorn7

> If then suddenly a sample comes in that reads 1 (instead of 1.0), they will be confused.

But they must handle it: while it wouldn't be recommended, each output is still individually valid, and this can happen for good reasons, such as a new binary release.

> The other way round is probably worse:

Yeah, that's more of a problem. Same considerations apply though.

I'd treat it as a compression hint more than anything. We should be nice to consumers where practical though - especially given the prominence of client_golang.

> so those labels aren't really labels anymore, and the whole problem disappears

That's the dream, once we manage to get most other monitoring system vendors able to understand our style of histograms.

Nov 28 '19 14:11 brian-brazil

Basic OpenMetrics generation is implemented now.

I'll update the issue description to include the remaining TODOs. (We could break them into separate issues and close this one if that's preferred.)

Jan 20 '20 11:01 beorn7

This issue is fine, the OP is nicely organised.

We also need to drop _sum when there are negative buckets.

Jan 20 '20 11:01 brian-brazil

> We also need to drop _sum when there are negative buckets.

It would be a pity if OM really went down that road. See https://github.com/OpenObservability/OpenMetrics/issues/143 for a better solution.

Jan 20 '20 12:01 beorn7