Survey existing metrics definitions across existing libraries
From the meeting notes where this action item was created:
- Lower level Telemetry.Metrics interface in Erlang
- Currently using Structs and Protocols, so hard to convert to Erlang
- Docs might not be as good
- Intention with Phoenix 1.5 is to include this by default, so might not be as seamless
- The API needs to be really good because end-user developers are going to interact with it; not just library authors
- Main issue is with reporters, because if the internal data structures are different, they’d need to support both - need some kind of abstraction that both can handle (like maps)
- How will this interact with OpenTelemetry’s metrics feature set? Probably a lot of overlap, so we need to make sure that it’s not too confusing for people.
- Action: Arkadiusz to Survey existing metrics definitions across existing libraries (Prometheus, OpenCensus, Statix, Telemetry.Metrics) before next meeting
> Currently using Structs and Protocols, so hard to convert to Erlang
I haven't found any usage of protocols in telemetry_metrics. Have I missed something? As for structs, Elixir provides quite easy support for records (though without support for protocols), so I think it shouldn't be much of a problem.
> Intention with Phoenix 1.5 is to include this by default, so might not be as seamless
An Erlang implementation can still provide an Elixir-like API. BTW the same should be done for telemetry itself to provide more seamless migration for consumers.
> How will this interact with OpenTelemetry’s metrics feature set?
I would suggest that we ignore the direct API in OT and instead "force" users to always use telemetry for sending data to OT, which should only be a consumer. That way we would sacrifice some part of the OT spec for a better user experience.
About existing metrics types, most common I am aware of are:
- counter/sum - these two are equivalent
- histogram + sometimes more specialized versions of it like timing
- gauge/value - single value at the measurement time
Some other tools also provide metrics like meter, which works like taking the derivative of a gauge, but I think that is out of scope for telemetry_metrics.
> BTW the same should be done for telemetry itself to provide more seamless migration for consumers.
Do you mean creating an Elixir module delegating to the Erlang one?
@arkgil yes. It could even be written in Erlang, but in general it should be made easy for consumers to "migrate" to newer versions.
@hauleth I'm not sure what you mean, or maybe I don't see the problem we're trying to solve here 😄
Regarding use of records, I would vote against it, because IMO they are problematic when they show up in stacktraces. I would say that if we aim to have a common structure for both Erlang and Elixir, then maps are the way to go (they might be structs on the Elixir side, although that too might confuse folks when debugging from Erlang).
As Łukasz wrote in a comment above, the metric types supported by existing libraries fall into the following buckets:
- metric counting the number of measurements. AFAIK this kind of counter is supported only by OpenCensus and Telemetry.Metrics, i.e. other libraries allow incrementing/decrementing the counter by an arbitrary value
- metric for summing up recorded measurements
- metric keeping track of the last recorded measurement
- metric building a histogram of recorded values
- metric exposing a set of basic statistics about recorded values, like minimum, maximum, mean, chosen percentiles etc. The set of statistics varies depending on the library/system
- other, more sophisticated time-series analyses, like moving weighted averages or derivatives
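To make the first five buckets concrete, here is a rough, language-agnostic sketch in Python of what each aggregation computes over the same stream of measurements. The variable names, the sample data and the bucket boundaries are all made up for illustration; this is not the API of any of the surveyed libraries.

```python
from bisect import bisect_left
from statistics import mean

# Made-up measurements, e.g. request durations in milliseconds.
measurements = [12, 7, 30, 7, 18]

# Counter: number of measurements, regardless of their values.
count = len(measurements)

# Sum: total of the recorded values.
total = sum(measurements)

# Last value: the most recently recorded measurement.
last_value = measurements[-1]

# Histogram: number of values per bucket. Boundaries are inclusive upper
# bounds (an assumption of this sketch), plus one overflow bucket at the end.
boundaries = [10, 20]
histogram = [0] * (len(boundaries) + 1)
for value in measurements:
    histogram[bisect_left(boundaries, value)] += 1

# Summary: a small set of basic statistics about the recorded values.
summary = {
    "min": min(measurements),
    "max": max(measurements),
    "mean": mean(measurements),
}
```

The point is only that all five can be derived from the same raw measurements, which is why the definition of the metric can stay separate from the code that emits the measurement.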
When it comes to defining metrics, most of the libraries use a "registry" approach. You call a function, the metric is registered somewhere globally, and the registry is queried whenever the metric is updated or needs to be exported. I haven't found a library other than Telemetry.Metrics which uses plain data structures for defining metrics and passing them around.
> I haven't found a library other than Telemetry.Metrics which uses plain data structures for defining metrics and passing them around.
How many of those are attempting to interact with multiple implementations without the use of an agent though? I see one of the benefits of using data structures to define metrics is the flexibility they provide for simple migrations via reporters. OpenCensus is the only one I'm aware of that attempts abstracting the destination but moves that abstraction to the agent.
exometer, folsom and metrics (which uses the first two as backends) are all quite popular (judging by the number of downloads on Hex) and allow exporting metrics to multiple external systems.
The idea is that reporters subscribe to metric updates and are notified every x seconds that they should export the metric.
To me, the difference between using a registry and data structures boils down to these two things:
- With data structures, we need to tell the reporter which metrics it shall export. With the registry we can register metrics earlier and either tell it which ones it should use or which ones it should ignore.
- With data structures it's not possible for libraries to register metrics, only emit events using Telemetry, which gives more control to the user. With a registry, libraries could register metrics directly so that the user can export them.
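A minimal sketch of the two approaches, written in Python only to keep it language-agnostic. Every name here (MetricDef, DataReporter, REGISTRY, register, RegistryReporter) is hypothetical and does not correspond to the API of Telemetry.Metrics, exometer or folsom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDef:
    """A metric definition as plain data (hypothetical shape)."""
    kind: str   # e.g. "counter", "sum", "last_value"
    event: str  # name of the event the metric is computed from


class DataReporter:
    """Data-structure approach: exports exactly the definitions it was given."""

    def __init__(self, metrics):
        self.metrics = list(metrics)

    def exported(self):
        return [m.event for m in self.metrics]


# Registry approach: definitions live in global state that anyone can add to.
REGISTRY = {}


def register(kind, event):
    REGISTRY[event] = MetricDef(kind, event)


class RegistryReporter:
    """Registry approach: exports everything registered, minus an ignore list."""

    def __init__(self, ignore=()):
        self.ignore = set(ignore)

    def exported(self):
        return [name for name in REGISTRY if name not in self.ignore]


# With data structures, the user decides up front what the reporter sees.
data_reporter = DataReporter([MetricDef("counter", "http.request")])

# With a registry, both the application and a library can register metrics,
# and the user filters afterwards.
register("counter", "http.request")   # registered by the application
register("sum", "db.query.duration")  # registered by a library
registry_reporter = RegistryReporter(ignore=["db.query.duration"])
```

In the first case the library-defined metric never reaches the reporter unless the user adds it; in the second it is exported by default unless the user ignores it.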