azure-eventhub scaler always reports metric value equal to unprocessedEventThreshold
Report
When using the azure-eventhub scaler, we found that the external metric keda_scaler_metrics_value always equals the value of unprocessedEventThreshold, even though the actual lag is clearly higher than that.
Expected Behavior
We expected the deployment to scale out when the actual lag exceeds the defined threshold. The metric keda_scaler_metrics_value should reflect the real unprocessed event lag (e.g., 5,000 to 6,000), calculated as the difference between Event Hub's last_enqueued_sequence_number and the latest checkpoint recorded in blob metadata.
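The expected per-partition lag calculation can be sketched in Go as follows. This is a simplified illustration, not KEDA's implementation: the `partitionState` type and the `partitionLag`/`totalLag` functions are hypothetical, and the sketch clamps negative differences to zero instead of handling cases such as a partition with no checkpoint yet or sequence-number wraparound. The sample values are the partition 0 numbers reported later in this issue.

```go
package main

import "fmt"

// partitionState is a hypothetical type holding the two sequence numbers
// compared for one Event Hub partition.
type partitionState struct {
	ID                         string
	LastEnqueuedSequenceNumber int64 // from the Event Hub partition properties
	CheckpointSequenceNumber   int64 // from the checkpoint blob metadata
}

// partitionLag returns the number of events enqueued after the checkpoint.
// Simplification: a negative difference is clamped to zero.
func partitionLag(p partitionState) int64 {
	lag := p.LastEnqueuedSequenceNumber - p.CheckpointSequenceNumber
	if lag < 0 {
		return 0
	}
	return lag
}

// totalLag sums the lag over all partitions.
func totalLag(parts []partitionState) int64 {
	var sum int64
	for _, p := range parts {
		sum += partitionLag(p)
	}
	return sum
}

func main() {
	parts := []partitionState{
		{ID: "0", LastEnqueuedSequenceNumber: 45729, CheckpointSequenceNumber: 43108},
	}
	for _, p := range parts {
		fmt.Printf("partition %s lag: %d\n", p.ID, partitionLag(p))
	}
	fmt.Println("total lag:", totalLag(parts))
}
```

With these inputs the program prints a lag of 2621 for partition 0, which is the value the HPA would need to see for the deployment to scale out.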
Actual Behavior
Regardless of how large the actual lag grows, the external metric value returned by:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/skynet/s0-azure-eventhub-%24Default?labelSelector=scaledobject.keda.sh/name=usage-counter-scaler"
is always "value": "3", which is the same as our configured threshold, unprocessedEventThreshold: 3.
However, our custom monitoring script shows that the actual lag per partition (last enqueued sequence number minus checkpoint) can exceed 6000.
Steps to Reproduce the Problem
- Set up a ScaledObject with an azure-eventhub trigger using unprocessedEventThreshold: 3.
- Ensure the actual lag is well beyond 3 (e.g., 6000+ unprocessed events across partitions).
- Query the external metric:
  kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/<ns>/<metricName>?labelSelector=scaledobject.keda.sh/name=<scaledObjectName>"
- Observe that the returned value is exactly 3 instead of the actual lag.
Logs from KEDA operator
2025-04-20T03:43:06Z INFO setup maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
2025-04-20T03:43:06Z INFO setup Starting manager
2025-04-20T03:43:06Z INFO setup KEDA Version: 2.17.0
2025-04-20T03:43:06Z INFO setup Git Commit: dafd9a883acc8ec2d2506e0d106b8605f2c156e9
2025-04-20T03:43:06Z INFO setup Go Version: go1.23.8
2025-04-20T03:43:06Z INFO setup Go OS/Arch: linux/amd64
2025-04-20T03:43:06Z INFO setup Running on Kubernetes 1.30 {"version": "v1.30.0"}
2025-04-20T03:43:06Z INFO controller-runtime.metrics Starting metrics server
2025-04-20T03:43:06Z INFO controller-runtime.metrics Serving metrics server {"bindAddress": ":8080", "secure": false}
2025-04-20T03:43:06Z INFO starting server {"name": "health probe", "addr": "[::]:8081"}
I0420 03:43:06.693574 1 leaderelection.go:254] attempting to acquire leader lease keda/operator.keda.sh...
I0420 03:43:48.555524 1 leaderelection.go:268] successfully acquired lease keda/operator.keda.sh
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2025-04-20T03:43:48Z INFO Starting Controller {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2025-04-20T03:43:48Z INFO Starting Controller {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2025-04-20T03:43:48Z INFO Starting Controller {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "cloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "CloudEventSource", "source": "kind source: *v1alpha1.CloudEventSource"}
2025-04-20T03:43:48Z INFO Starting Controller {"controller": "cloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "CloudEventSource"}
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2025-04-20T03:43:48Z INFO Starting Controller {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "clustercloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "ClusterCloudEventSource", "source": "kind source: *v1alpha1.ClusterCloudEventSource"}
2025-04-20T03:43:48Z INFO Starting Controller {"controller": "clustercloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "ClusterCloudEventSource"}
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *v1.Secret"}
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2025-04-20T03:43:48Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2025-04-20T03:43:48Z INFO Starting Controller {"controller": "cert-rotator"}
2025-04-20T03:43:48Z INFO cert-rotation starting cert rotator controller
2025-04-20T03:43:48Z INFO cert-rotation no cert refresh needed
2025-04-20T03:43:48Z INFO cert-rotation certs are ready in /certs
2025-04-20T03:43:48Z INFO Starting workers {"controller": "cert-rotator", "worker count": 1}
2025-04-20T03:43:48Z INFO cert-rotation no cert refresh needed
2025-04-20T03:43:48Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2025-04-20T03:43:48Z INFO Starting workers {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2025-04-20T03:43:48Z INFO Starting workers {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2025-04-20T03:43:48Z INFO Starting workers {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2025-04-20T03:43:48Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "ae4b9ff9-7c62-402d-abdf-e40d045cccb3"}
2025-04-20T03:43:48Z INFO cert-rotation Ensuring CA cert {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2025-04-20T03:43:48Z INFO Starting workers {"controller": "clustercloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "ClusterCloudEventSource", "worker count": 1}
2025-04-20T03:43:48Z INFO Starting workers {"controller": "cloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "CloudEventSource", "worker count": 1}
2025-04-20T03:43:48Z INFO Starting workers {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2025-04-20T03:43:48Z INFO cert-rotation no cert refresh needed
2025-04-20T03:43:48Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2025-04-20T03:43:48Z INFO cert-rotation Ensuring CA cert {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2025-04-20T03:43:48Z INFO Detected resource targeted for scaling {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "ae4b9ff9-7c62-402d-abdf-e40d045cccb3", "resource": "apps/v1.Deployment", "name": "usage-counter"}
2025-04-20T03:43:48Z INFO Initializing Scaling logic according to ScaledObject Specification {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "ae4b9ff9-7c62-402d-abdf-e40d045cccb3"}
2025-04-20T03:43:48Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "14e98427-69ea-44fe-bf67-8293416f1779"}
2025-04-20T03:43:48Z INFO Detected resource targeted for scaling {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "14e98427-69ea-44fe-bf67-8293416f1779", "resource": "apps/v1.Deployment", "name": "usage-counter"}
2025-04-20T03:43:49Z INFO cert-rotation CA certs are injected to webhooks
2025-04-20T03:43:49Z INFO grpc_server Starting Metrics Service gRPC Server {"address": ":9666"}
2025-04-20T04:36:28Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "b1b5bea2-70a1-4506-af78-d971df08c706"}
2025-04-20T04:36:28Z INFO Detected resource targeted for scaling {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "b1b5bea2-70a1-4506-af78-d971df08c706", "resource": "apps/v1.Deployment", "name": "usage-counter"}
2025-04-20T04:36:28Z INFO Updated HPA according to ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "b1b5bea2-70a1-4506-af78-d971df08c706", "HPA.Namespace": "skynet", "HPA.Name": "keda-hpa-usage-counter-scaler"}
2025-04-20T04:36:28Z INFO Initializing Scaling logic according to ScaledObject Specification {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "b1b5bea2-70a1-4506-af78-d971df08c706"}
2025-04-20T04:36:28Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "c034cf1b-1639-4c31-abdb-5111bd0a1280"}
2025-04-20T04:36:28Z INFO Detected resource targeted for scaling {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"usage-counter-scaler","namespace":"skynet"}, "namespace": "skynet", "name": "usage-counter-scaler", "reconcileID": "c034cf1b-1639-4c31-abdb-5111bd0a1280", "resource": "apps/v1.Deployment", "name": "usage-counter"}
KEDA Version
2.17.0
Kubernetes Version
1.32
Platform
None
Scaler Details
azure-eventhub
Anything else?
metadata:
  eventHubNamespace: wigptEventDev
  eventHubName: llmusageevent
  consumerGroup: $default
  storageAccountName: wigptdatadev
  checkpointStrategy: blobMetadata
  blobContainer: agentstreaming-checkpoint
  unprocessedEventThreshold: "3"
authenticationRef:
  name: usage-counter-trigger-auth
Following up with visual confirmation that the actual event lag is well above the threshold, yet KEDA's external metric remains stuck at the unprocessedEventThreshold value (3).
- Event Hub, partition 0: latest enqueued sequence number = 45729 (source: Azure Event Hub Data Explorer UI)
- KEDA checkpoint (blob container agentstreaming-checkpoint, blob path $Default/0): sequence number = 43108, which gives a real lag of 45729 - 43108 = 2621
- HPA external metric remains at 3/3: despite the actual lag of 2621, the current value received by the HPA is still 3, exactly matching unprocessedEventThreshold.
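The "3/3" reading is consistent with how the HPA handles an external metric with an `AverageValue` target, which is KEDA's default metricType; here the target is unprocessedEventThreshold. The desired replica count is roughly ceil(metricValue / target), so a metric that always equals the target always yields a single replica. The Go sketch below models only that core formula under this assumption: it omits the HPA tolerance band, min/max replica bounds and stabilization windows, and `desiredReplicas` is a hypothetical name, not a Kubernetes API.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas models the core HPA formula for an external metric with an
// AverageValue target: ceil(total metric value / target value per pod).
// Tolerance, min/max replica bounds and stabilization are deliberately left out.
func desiredReplicas(metricValue, targetAverageValue float64) int {
	return int(math.Ceil(metricValue / targetAverageValue))
}

func main() {
	// Metric reported by KEDA in this issue (3) against the threshold (3).
	fmt.Println(desiredReplicas(3, 3))
	// What the real lag of 2621 would request with the same threshold.
	fmt.Println(desiredReplicas(2621, 3))
}
```

The first call returns 1 and the second returns 874, which illustrates why a metric pinned at the threshold keeps the deployment at one replica.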
@JorTurFer Could you please take a look at this older issue? Thanks a lot.
Could you set --zap-log-level=debug in the keda-operator to enable debug logging and see the retrieved metric value?
I have collected the following logs:
2025-06-05T13:09:09Z DEBUG grpc_server Providing metrics {"scaledObjectName": "usage-counter-scaler", "scaledObjectNamespace": "skynet", "metrics": "&ExternalMetricValueList{ListMeta:{ <nil>},Items:[]ExternalMetricValue{ExternalMetricValue{MetricName:s0-azure-eventhub-test,MetricLabels:map[string]string{},Timestamp:2025-06-05 13:09:09.278550712 +0000 UTC m=+3118.542187599,WindowSeconds:nil,Value:{{64000 -3} {<nil>} DecimalSI},},},}"}
2025-06-05T13:09:13Z DEBUG azure_eventhub_scaler Partition ID: 0, Last SequenceNumber: 59692, Checkpoint SequenceNumber: 969, Total new events in partition: 58723 {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:13Z DEBUG azure_eventhub_scaler Unprocessed events in event hub total: 58723, scaling for a lag of 64 related to 1 partitions {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:13Z DEBUG scale_handler Getting metrics and activity from scaler {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler", "scaler": "azureEventHubScaler", "metricName": "s0-azure-eventhub-test", "metrics": [{"metricName":"s0-azure-eventhub-test","metricLabels":null,"timestamp":"2025-06-05T13:09:13Z","value":"64"}], "activity": true, "scalerError": null}
2025-06-05T13:09:13Z DEBUG scale_handler Scaler for scaledObject is active {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler", "scaler": "azureEventHubScaler", "metricName": "s0-azure-eventhub-test"}
2025-06-05T13:09:18Z DEBUG azure_eventhub_scaler Partition ID: 0, Last SequenceNumber: 59692, Checkpoint SequenceNumber: 969, Total new events in partition: 58723 {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:18Z DEBUG azure_eventhub_scaler Unprocessed events in event hub total: 58723, scaling for a lag of 64 related to 1 partitions {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:18Z DEBUG scale_handler Getting metrics and activity from scaler {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler", "scaler": "azureEventHubScaler", "metricName": "s0-azure-eventhub-test", "metrics": [{"metricName":"s0-azure-eventhub-test","metricLabels":null,"timestamp":"2025-06-05T13:09:18Z","value":"64"}], "activity": true, "scalerError": null}
2025-06-05T13:09:18Z DEBUG scale_handler Scaler for scaledObject is active {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler", "scaler": "azureEventHubScaler", "metricName": "s0-azure-eventhub-test"}
2025-06-05T13:09:23Z DEBUG azure_eventhub_scaler Partition ID: 0, Last SequenceNumber: 59692, Checkpoint SequenceNumber: 969, Total new events in partition: 58723 {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:23Z DEBUG azure_eventhub_scaler Unprocessed events in event hub total: 58723, scaling for a lag of 64 related to 1 partitions {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:23Z DEBUG scale_handler Getting metrics and activity from scaler {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler", "scaler": "azureEventHubScaler", "metricName": "s0-azure-eventhub-test", "metrics": [{"metricName":"s0-azure-eventhub-test","metricLabels":null,"timestamp":"2025-06-05T13:09:23Z","value":"64"}], "activity": true, "scalerError": null}
2025-06-05T13:09:23Z DEBUG scale_handler Scaler for scaledObject is active {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler", "scaler": "azureEventHubScaler", "metricName": "s0-azure-eventhub-test"}
2025-06-05T13:09:24Z DEBUG azure_eventhub_scaler Partition ID: 0, Last SequenceNumber: 59692, Checkpoint SequenceNumber: 969, Total new events in partition: 58723 {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:24Z DEBUG azure_eventhub_scaler Unprocessed events in event hub total: 58723, scaling for a lag of 64 related to 1 partitions {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:24Z DEBUG scale_handler Getting metrics from trigger {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler", "trigger": "azureEventHubScaler", "metricName": "s0-azure-eventhub-test", "metrics": [{"metricName":"s0-azure-eventhub-test","metricLabels":null,"timestamp":"2025-06-05T13:09:24Z","value":"64"}], "scalerError": null}
2025-06-05T13:09:24Z DEBUG fallback Fallback is not enabled, hence skipping the health update to the scaledobject {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler"}
2025-06-05T13:09:24Z DEBUG grpc_server Providing metrics {"scaledObjectName": "usage-counter-scaler", "scaledObjectNamespace": "skynet", "metrics": "&ExternalMetricValueList{ListMeta:{ <nil>},Items:[]ExternalMetricValue{ExternalMetricValue{MetricName:s0-azure-eventhub-test,MetricLabels:map[string]string{},Timestamp:2025-06-05 13:09:24.506450452 +0000 UTC m=+3133.771067687,WindowSeconds:nil,Value:{{64000 -3} {<nil>} DecimalSI},},},}"}
2025-06-05T13:09:28Z DEBUG azure_eventhub_scaler Partition ID: 0, Last SequenceNumber: 59692, Checkpoint SequenceNumber: 969, Total new events in partition: 58723 {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:28Z DEBUG azure_eventhub_scaler Unprocessed events in event hub total: 58723, scaling for a lag of 64 related to 1 partitions {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
2025-06-05T13:09:28Z DEBUG scale_handler Getting metrics and activity from scaler {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler", "scaler": "azureEventHubScaler", "metricName": "s0-azure-eventhub-test", "metrics": [{"metricName":"s0-azure-eventhub-test","metricLabels":null,"timestamp":"2025-06-05T13:09:28Z","value":"64"}], "activity": true, "scalerError": null}
2025-06-05T13:09:28Z DEBUG scale_handler Scaler for scaledObject is active {"scaledObject.Namespace": "skynet", "scaledObject.Name": "usage-counter-scaler", "scaler": "azureEventHubScaler", "metricName": "s0-azure-eventhub-test"}
2025-06-05T13:09:33Z DEBUG azure_eventhub_scaler Partition ID: 0, Last SequenceNumber: 59692, Checkpoint SequenceNumber: 969, Total new events in partition: 58723 {"type": "ScaledObject", "namespace": "skynet", "name": "usage-counter-scaler"}
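In the grpc_server lines, `Value:{{64000 -3} ...}` is the printed form of a Kubernetes `resource.Quantity` holding an unscaled integer and a decimal scale, so the value is 64000 × 10^-3 = 64, the same value the scale_handler lines report. A stdlib-only sketch of that conversion follows; `quantityValue` is a hypothetical helper for illustration, not an apimachinery function.

```go
package main

import (
	"fmt"
	"math"
)

// quantityValue converts an (unscaled, scale) pair, as printed inside a
// resource.Quantity, to a float: unscaled * 10^scale.
func quantityValue(unscaled int64, scale int) float64 {
	if scale < 0 {
		// Divide by a whole power of ten instead of multiplying by a
		// fractional one, so exact results such as 64000 / 1000 stay exact.
		return float64(unscaled) / math.Pow10(-scale)
	}
	return float64(unscaled) * math.Pow10(scale)
}

func main() {
	// Value from the grpc_server "Providing metrics" log line.
	fmt.Println(quantityValue(64000, -3))
}
```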
The logs show that the total number of new events in the partition is 58,723, far greater than the unprocessedEventThreshold, yet the deployment still does not scale out.
@SpiritZhou Could you please check the update above and advise further?
From the code, it looks like scaling is limited by the total partition count instead of being based on the total event count. I think we should add a property like the Kafka scaler's limitToPartitionsWithLag to distinguish between these two scaling methods.
https://github.com/kedacore/keda/blob/a18596513f7328374c17cdd0e481b7b7c59ca779/pkg/scalers/azure_eventhub_scaler.go#L340C1-L346C2
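A simplified sketch of the capping behavior suggested by the debug line "Unprocessed events in event hub total: 58723, scaling for a lag of 64 related to 1 partitions": once the total lag exceeds threshold × partition count, the reported metric is clamped to that product, so the HPA never requests more replicas than there are partitions. This is reconstructed from the observed behavior, and the linked code may differ in details such as integer rounding; `cappedLag` is a hypothetical name. The example calls assume a single partition and, for the debug run, a threshold of 64, which the logs imply but do not state.

```go
package main

import "fmt"

// cappedLag returns the metric value reported to the HPA: the total lag,
// clamped to threshold * partitionCount. Because the HPA divides the metric
// by the threshold, the clamp limits the replica count to partitionCount.
func cappedLag(totalLag, partitionCount, threshold int64) int64 {
	limit := partitionCount * threshold
	if totalLag > limit {
		return limit
	}
	return totalLag
}

func main() {
	fmt.Println(cappedLag(58723, 1, 64)) // debug run: reported as 64
	fmt.Println(cappedLag(2621, 1, 3))   // original report: reported as 3
}
```

Under this model, a hub with one partition can never drive the deployment past one replica, however large the lag grows.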
@JorTurFer Do you know why the Azure Event Hub scaler is designed like this and does not consider scaling based on the total event count?
Let me check whether I have understood the issue (I'm not an expert in Event Hubs): KEDA checks all the partitions, regardless of whether they are used. For some reason, the publisher may publish to only some of the partitions rather than all of them, right? I thought Event Hubs spread messages across partitions automatically, but if messages can be placed in only some partitions, I think this extra parameter makes sense.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Regarding why the Azure Event Hub scaler is designed like this: I don't know, but you're right, and we can enable something like in Kafka and take only the active partitions into account.
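The options discussed in this thread can be compared with a small Go sketch. Everything here is hypothetical: `capMode`, its three values and `reportedMetric` are illustrative names, and no such setting exists in the azure-eventhub scaler. The modes are: the current cap based on all partitions, a cap based only on partitions that have lag (the Kafka-style idea above), and no cap (scaling on the raw total event count).

```go
package main

import "fmt"

// capMode is a hypothetical option; it does not exist in the scaler today.
type capMode int

const (
	capToAllPartitions     capMode = iota // cap at threshold * total partitions
	capToPartitionsWithLag                // cap at threshold * partitions with lag > 0
	noCap                                 // report the raw total lag
)

// reportedMetric computes the metric value from per-partition lags.
func reportedMetric(lags []int64, threshold int64, mode capMode) int64 {
	var total, withLag int64
	for _, l := range lags {
		total += l
		if l > 0 {
			withLag++
		}
	}
	var limit int64
	switch mode {
	case capToAllPartitions:
		limit = threshold * int64(len(lags))
	case capToPartitionsWithLag:
		limit = threshold * withLag
	default:
		return total
	}
	if total > limit {
		return limit
	}
	return total
}

func main() {
	// Hypothetical hub with 4 partitions, only two of which receive events.
	lags := []int64{500, 0, 300, 0}
	for _, m := range []capMode{capToAllPartitions, capToPartitionsWithLag, noCap} {
		fmt.Println(reportedMetric(lags, 3, m))
	}
}
```

For this input the three modes report 12, 6 and 800. Note that with a single partition, as in the debug logs above, both capped modes report the threshold itself, so only the uncapped mode would request more than one replica.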
This issue has been automatically closed due to inactivity.