eventing
How to handle multiple request schemes in metrics
Problem: Sometimes an event dispatched by the IMC (InMemoryChannel) dispatcher is delivered with mixed schemes. Subscribers with https URLs receive the event over https, while subscribers with http URLs receive it over http. This causes issues when recording metrics: if we record each metric once for http and once for https, the dispatcher's event count will be wrong, because an event delivered over both schemes is recorded twice. Some ideas proposed by @maschmid to fix this are:
- Rather than having `event_scheme="http"` or `event_scheme="https"`, tag the metrics with `event_scheme_http=true/false` and `event_scheme_https=true/false`.
- Report `event_scheme="https"` only if ALL subscribers are https; otherwise report `http`.
Persona: Administrators trying to observe the system.
Exit Criteria: The metric counts are always correct, even in TLS permissive mode.
Time Estimate (optional): 1 developer-day.
My personal vote here is to go with option 1 from above, as that way:
- the event count will be correct
- I can create other aggregate views of information to see if e.g. https dispatches fail more frequently than http dispatches
cc @pierDipi @creydr @Leo6Leo: any thoughts on which approach we should take here? One of the options above, or something else?
To me, the problem is not how we tag metrics but rather how the counting is implemented. I don't think the tags proposed in option 1 are idiomatic. Also, queries like the one in
> I can create other aggregate views of information to see if e.g. https dispatches fail more frequently than http dispatches

seem hard to express in PromQL, since we couldn't easily aggregate using `sum by (event_scheme)`.
Question: where is the metrics problem, on ingress or on egress?
On ingress, when an event comes in, `event_scheme` is based on the channel's ingress scheme. If the problem is on egress, we need to count and emit metrics for each subscriber separately rather than as an aggregate.
@pierDipi, this metrics problem is on egress for the IMC. So I guess the fix is to change how we count and emit metrics in the imc-dispatcher?
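A minimal Go sketch of per-subscriber egress counting as suggested above, assuming a hypothetical in-memory `egressMetrics` store in place of a real metrics backend. Every name here is illustrative and not taken from the imc-dispatcher. Each (event, subscriber) delivery produces exactly one sample labelled with that subscriber's own scheme, so a single `event_scheme` label can be grouped on directly.

```go
package main

import "fmt"

// Subscriber is a hypothetical, simplified view of an IMC subscription.
type Subscriber struct {
	Name   string
	Scheme string // "http" or "https"
}

// dispatchLabels is a hypothetical label set for one egress metric series.
type dispatchLabels struct {
	Subscriber  string
	EventScheme string
	Success     bool
}

// egressMetrics is an in-memory stand-in for a real metrics backend.
type egressMetrics struct {
	dispatches map[dispatchLabels]int
}

func newEgressMetrics() *egressMetrics {
	return &egressMetrics{dispatches: map[dispatchLabels]int{}}
}

// recordDispatch adds exactly one sample per (event, subscriber) delivery,
// labelled with that subscriber's own scheme.
func (m *egressMetrics) recordDispatch(sub Subscriber, success bool) {
	m.dispatches[dispatchLabels{sub.Name, sub.Scheme, success}]++
}

// countByScheme sums all series grouped by scheme, similar in spirit to a
// PromQL `sum by (event_scheme)` aggregation.
func (m *egressMetrics) countByScheme() map[string]int {
	out := map[string]int{}
	for l, n := range m.dispatches {
		out[l.EventScheme] += n
	}
	return out
}

// failureRate returns the fraction of failed dispatches for one scheme.
func (m *egressMetrics) failureRate(scheme string) float64 {
	total, failed := 0, 0
	for l, n := range m.dispatches {
		if l.EventScheme != scheme {
			continue
		}
		total += n
		if !l.Success {
			failed += n
		}
	}
	if total == 0 {
		return 0
	}
	return float64(failed) / float64(total)
}

func main() {
	subs := []Subscriber{{"a", "http"}, {"b", "https"}, {"c", "https"}}
	m := newEgressMetrics()

	// Fan one event out to every subscriber; pretend delivery to "c" failed.
	for _, s := range subs {
		m.recordDispatch(s, s.Name != "c")
	}

	fmt.Println(m.countByScheme())      // map[http:1 https:2]
	fmt.Println(m.failureRate("https")) // 0.5
}
```

With this design, egress totals count deliveries (events × subscribers) rather than events, so no delivery is counted twice, and the per-event count stays with the ingress metrics. Comparing http and https failure rates then becomes a plain group-by over the existing label.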
/assign