[Proposal] Detailed metric mode: include the value of a descriptor KEY_VALUE pair in metrics when the value is not explicitly defined/matched in config
To gain greater observability into the operation of the rate limit service, I propose a "detailed" metric mode that exposes metrics on descriptor KEY_VALUE pairs in all cases, not only when the value is explicitly configured in the config. These values would allow substantially more granular alerts and downstream reporting based on metrics emitted by the process when a descriptor key is matched but its value is not.
While the concern over metric cardinality when including values in the metrics is understandable, the possible values are often well understood and limited, so including them would pose little concern.
A practical compromise might be a fixed-size buffer for each config that holds a sample of real-world key/value pairs matched during particular time buckets, or what OpenCensus would call "exemplars": https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/Exemplars.md (also demonstrated in https://youtu.be/U72b4Nl0Ftw?t=1300).
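To make the fixed-size buffer idea concrete, here is a minimal sketch in Go of a per-rule ring buffer that keeps only the most recent key/value pairs. The names `Exemplar`, `ExemplarBuffer`, `Record`, and `Snapshot` are hypothetical and not part of ratelimit or OpenCensus. Resetting the buffer at the end of each time bucket is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Exemplar is one observed descriptor entry (hypothetical type).
type Exemplar struct {
	Key   string
	Value string
}

// ExemplarBuffer keeps at most a fixed number of recent exemplars for one
// rule, so memory stays bounded regardless of value cardinality.
type ExemplarBuffer struct {
	mu    sync.Mutex
	items []Exemplar
	next  int  // index of the slot that will be overwritten next
	full  bool // true once every slot has been written at least once
}

// NewExemplarBuffer allocates a buffer with the given capacity.
func NewExemplarBuffer(capacity int) *ExemplarBuffer {
	if capacity < 0 {
		capacity = 0
	}
	return &ExemplarBuffer{items: make([]Exemplar, capacity)}
}

// Record stores an exemplar, overwriting the oldest one when the buffer is full.
func (b *ExemplarBuffer) Record(key, value string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.items) == 0 {
		return
	}
	b.items[b.next] = Exemplar{Key: key, Value: value}
	b.next = (b.next + 1) % len(b.items)
	if b.next == 0 {
		b.full = true
	}
}

// Snapshot returns a copy of the stored exemplars, oldest first.
func (b *ExemplarBuffer) Snapshot() []Exemplar {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !b.full {
		return append([]Exemplar(nil), b.items[:b.next]...)
	}
	out := make([]Exemplar, 0, len(b.items))
	out = append(out, b.items[b.next:]...)
	out = append(out, b.items[:b.next]...)
	return out
}

func main() {
	buf := NewExemplarBuffer(2)
	buf.Record("user", "alice")
	buf.Record("user", "bob")
	buf.Record("user", "carol") // evicts the oldest entry
	fmt.Println(buf.Snapshot())
}
```

Because the buffer never grows past its capacity, the number of distinct values seen in traffic does not affect memory use or the number of emitted metric series. The trade-off is that only a sample of the values is visible.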
The exemplars approach sounds interesting. I would also be fine with an "all-metrics" mode as long as it is opt-in.
This might be an unrelated question, but how do people track things like latency, QPS, and error rate at the API (ShouldRateLimit) level?
I also have an interest in this and created an issue quite a while ago that went stale: https://github.com/envoyproxy/ratelimit/issues/311. I have now created an implementation of this that would at least meet our needs.
I added a key to the configuration called "include_value_in_metric_when_not_specified", which overrides the default behavior and adds the values to the metrics.
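As a rough illustration of what such a flag changes, here is a sketch in Go of building the metric name for one descriptor entry. It is not the code from the PR. The function names `statKey` and `sanitize`, the DOMAIN.KEY_VALUE naming shape, and the choice to replace separator characters in request values are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize replaces characters that commonly act as separators in metric
// names. This step is an assumption, not something the proposal specifies.
func sanitize(v string) string {
	return strings.NewReplacer(".", "_", ":", "_", " ", "_").Replace(v)
}

// statKey builds a metric name for one descriptor entry (hypothetical helper).
//   - configuredValue is the value written in the config rule, "" if the rule
//     matches on the key alone.
//   - requestValue is the value that arrived in the request.
//   - includeValue mirrors the proposed opt-in flag.
func statKey(domain, key, configuredValue, requestValue string, includeValue bool) string {
	base := domain + "." + key
	switch {
	case configuredValue != "":
		// Value explicitly configured: it is always part of the name.
		return base + "_" + configuredValue
	case includeValue && requestValue != "":
		// Opt-in mode: fall back to the value seen in the request.
		return base + "_" + sanitize(requestValue)
	default:
		// Default behavior: key only, which keeps cardinality bounded.
		return base
	}
}

func main() {
	fmt.Println(statKey("mongo_cps", "database", "", "users", false))
	fmt.Println(statKey("mongo_cps", "database", "", "users", true))
	fmt.Println(statKey("mongo_cps", "database", "default", "default", false))
}
```

With the flag off, every unmatched value for a key shares one metric. With it on, each distinct request value produces its own metric, which is why the mode should stay opt-in.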
I can create a PR attached to this issue for you to have a look at. PR submitted: https://github.com/envoyproxy/ratelimit/pull/389