Provide operational insights on # of triggers per trigger type
Proposal
Based on the current docs, we provide the following metrics today:
- keda_metrics_adapter_scaler_error_totals - The total number of errors encountered for all scalers.
- keda_metrics_adapter_scaled_object_error_totals - The number of errors that have occurred for each scaled object.
- keda_metrics_adapter_scaler_errors - The number of errors that have occurred for each scaler.
- keda_metrics_adapter_scaler_metrics_value - The current value for each scaler's metric that would be used by the HPA in computing the target average.
Based on these, I don't see a good fit for extending them with additional labels. Another reason is that ScaledObjects/ScaledJobs with multiple triggers of the same type would not be reflected correctly.
That's why I propose introducing a keda_metrics_adapter_trigger_totals metric with a type label that holds the type of trigger being used, for example cron.
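For illustration, a minimal sketch of what such a gauge could look like using the prometheus client_golang library. The metric and label names follow the proposal above; the package name and option values are just assumptions, not a final implementation:

```go
package prommetrics

import "github.com/prometheus/client_golang/prometheus"

// triggerTotalsGaugeVec counts how many triggers of each type are currently
// defined across all ScaledObjects/ScaledJobs. The "type" label holds the
// trigger type, e.g. "cron" or "prometheus".
var triggerTotalsGaugeVec = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "keda_metrics_adapter_trigger_totals",
		Help: "Total number of triggers per trigger type",
	},
	[]string{"type"},
)
```

A scrape would then expose one sample per trigger type in use, for example keda_metrics_adapter_trigger_totals{type="cron"} 3.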
Use-Case
Gain insights on how many triggers are using a given trigger type.
This helps you better understand your autoscaling landscape and its dependencies.
Anything else?
No response
Would like to hear some thoughts on this from @JorTurFer @zroubalik
I think that we can add these metrics about adoption. It's true that from a monitoring point of view they may not give a lot of information, but from an adoption point of view they do. I mean, as a cluster operator I could want to know the aggregated counts of trigger types to apply some improvements, e.g. if I only have 1 Azure scaler, maybe it doesn't make sense to use managed identities. This also gives useful information about what that cluster is used for.
In general, I don't have any problem with improving the metrics we offer, because they are private inside your cluster and we don't collect them, so each admin can decide whether to collect them or not.
I can work on implementing this.
Reposting a query from Slack -
The value for this metric might change every time a ScaledObject/ScaledJob is reconciled, as the triggers for them might be changed by the user. Considering the reconciliation happens in the operator, I am not sure how best to expose the Prometheus metrics through the metrics adapter. Need some guidance on this.
@zroubalik @tomkerkhove @JorTurFer
I'd expose them on the operator instead of the metrics adapter, as it's not related to the metrics adapter at all. So I'd introduce a new endpoint there.
Well, if we're okay with exposing a new endpoint on the operator, then it shouldn't be a big task (I think?). Should the metric be named keda_operator_trigger_totals then?
Don't expose another endpoint in the operator please, we are already exposing a single endpoint with the runtime-metrics (operator-sdk does it). Recently we have talked about unifying them in the metrics server, so let's do it directly in this case
Basically, instead of starting another server, you should register the metrics in the already existing server by prometheus global registry https://book.kubebuilder.io/reference/metrics.html#publishing-additional-metrics
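For reference, a minimal sketch of that pattern based on the linked kubebuilder docs: the gauge is registered with controller-runtime's global registry, so it is served by the operator's existing metrics endpoint, and the reconcile loop keeps it in sync. The variable and helper names here are illustrative, not the actual KEDA code, and the keda_operator_trigger_totals name simply follows the suggestion above:

```go
package prommetrics

import (
	"github.com/prometheus/client_golang/prometheus"
	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Gauge tracking the number of triggers per trigger type.
var triggerTotals = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "keda_operator_trigger_totals",
		Help: "Total number of triggers per trigger type",
	},
	[]string{"type"},
)

func init() {
	// ctrlmetrics.Registry is the registry already served by the operator's
	// metrics endpoint, so no additional server or endpoint is started.
	ctrlmetrics.Registry.MustRegister(triggerTotals)
}

// Hypothetical helpers the reconcile loop could call whenever a
// ScaledObject/ScaledJob is created, updated, or deleted.
func IncrementTriggerTotal(triggerType string) {
	if triggerType != "" {
		triggerTotals.WithLabelValues(triggerType).Inc()
	}
}

func DecrementTriggerTotal(triggerType string) {
	if triggerType != "" {
		triggerTotals.WithLabelValues(triggerType).Dec()
	}
}
```

On reconciliation the operator would then increment the gauge for triggers that were added and decrement it for triggers that were removed, so the value stays correct even when users change the triggers of a ScaledObject/ScaledJob.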
Basically, instead of starting another server, you should register the metrics in the already existing server by prometheus global registry https://book.kubebuilder.io/reference/metrics.html#publishing-additional-metrics
The link mentions this -
You may then record metrics to those collectors from any part of your reconcile loop. These metrics can be evaluated from anywhere in the operator code.
But the reconcile loop is running within a separate pod, so this won't work in our case? Or am I misunderstanding something from the docs?
You asked about adding this metric to the operator, and there you have the reconciliation loops xD In the metrics server we already have 2 Prometheus metric servers (which we should unify).
Oh, so you're suggesting that we should reuse the Prometheus endpoint used for the runtime metrics in the operator pod, instead of exposing another one.
I just got confused and thought you wanted to expose the metrics in the adapter itself rather than in the operator.
Don't expose another endpoint in the operator please, we are already exposing a single endpoint with the runtime-metrics (operator-sdk does it). Recently we have talked about unifying them in the metrics server, so let's do it directly in this case
Agreed, sorry I was more referring to exposing them on the operator rather than introducing a new endpoint.
Lost in translation xD I meant, let's reuse the current Prometheus endpoint in the operator and avoid duplications like in the metrics server.
Each one of us misunderstood the others 😄