java-operator-sdk

Micrometer metrics enhancement requests

Open tapwaterbuffalo opened this issue 3 months ago • 2 comments

A few enhancement requests for the Micrometer metrics:

  1. Currently, reconciler metrics use operator.sdk.reconciliations.executions.<reconciler name> as the meter name. However, a generic dashboard for Java operator metrics with static panel expressions needs well-known metric names, so I would suggest using the reconciler name as another tag instead. (The same applies to operator.sdk.<map name>.size.) Then you could perhaps provide a generic Grafana dashboard to evaluate operator health for a few different flavors, like prometheus.
  2. The reconciliations_retries_last and reconciliations_retries_number tags on the operator_sdk_reconciliations_* metrics would be better exposed as two separate gauges, as they are metrics in their own right. With tags, each change in reconciliations_retries_last or reconciliations_retries_number appears as a new series because the tags/labels have changed. When detecting problems with a reconciler, I would prefer to trigger on metric values for these rather than on label values.
  3. Histogram generation between Micrometer and Prometheus produces Prometheus Native histogram output; it would be nice if we could control whether it uses Native or Classic histograms. The buckets for native histograms are automatically generated and make dashboards difficult to build. Being able to configure buckets like 0.1, 0.5, 1, 2, 5, 10, 20, 30, Inf seconds would be nice, as generally we want to tell whether we are within a particular SLA.
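
To illustrate points 1 and 2, here is a minimal sketch in plain Java (no Micrometer or SDK dependency) of how a Prometheus-style backend identifies a series by metric name plus the full label set. The class, the method names, and the `example_*` metric names are hypothetical and exist only for this illustration. It shows that a retry count carried as a label yields one series per distinct value, while the same count carried as a gauge value keeps a single stable series.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative model only: a series is identified by metric name plus its full label set.
// The example_* metric names are hypothetical, not the SDK's actual meters.
final class SeriesCardinalitySketch {

    // Canonical series key, e.g. name{a="1",b="2"}, with labels sorted by key.
    static String seriesKey(String name, Map<String, String> labels) {
        StringBuilder sb = new StringBuilder(name).append('{');
        String sep = "";
        for (Map.Entry<String, String> e : new TreeMap<>(labels).entrySet()) {
            sb.append(sep).append(e.getKey()).append("=\"").append(e.getValue()).append('"');
            sep = ",";
        }
        return sb.append('}').toString();
    }

    // Retry count carried as a label: every new count produces a new series key.
    static Set<String> retriesAsLabel(String controller, int lastRetry) {
        Set<String> series = new LinkedHashSet<>();
        for (int retry = 0; retry <= lastRetry; retry++) {
            series.add(seriesKey("example_reconciliations_total", Map.of(
                "controller", controller,
                "reconciliations_retries_number", Integer.toString(retry))));
        }
        return series;
    }

    // Retry count carried as a gauge value: one stable series whose value changes.
    static Map<String, Double> retriesAsGauge(String controller, int lastRetry) {
        Map<String, Double> series = new LinkedHashMap<>();
        String key = seriesKey("example_reconciliations_retries_number",
            Map.of("controller", controller));
        for (int retry = 0; retry <= lastRetry; retry++) {
            series.put(key, (double) retry);
        }
        return series;
    }

    public static void main(String[] args) {
        System.out.println(retriesAsLabel("tomcatreconciler", 3).size() + " series with the label");
        System.out.println(retriesAsGauge("tomcatreconciler", 3).size() + " series with the gauge");
    }
}
```

The same key model applies to point 1: with a fixed metric name and the reconciler in a `controller` label, one static panel expression can select every reconciler, whereas a name that embeds the reconciler needs a separate expression per reconciler.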

See the default output below:

# HELP operator_sdk_controllers_execution_reconcile_seconds  
# TYPE operator_sdk_controllers_execution_reconcile_seconds histogram
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.001"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.001048576"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.001398101"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.001747626"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.002097151"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.002446676"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.002796201"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.003145726"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.003495251"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.003844776"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.004194304"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.005592405"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.006990506"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.008388607"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.009786708"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.011184809"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.01258291"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.013981011"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.015379112"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.016777216"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.022369621"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.027962026"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.033554431"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.039146836"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.044739241"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.050331646"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.055924051"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.061516456"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.067108864"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.089478485"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.111848106"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.134217727"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.156587348"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.178956969"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.20132659"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.223696211"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.246065832"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.268435456"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.357913941"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.447392426"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.536870911"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.626349396"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.715827881"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.805306366"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.894784851"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.984263336"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="1.073741824"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="1.431655765"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="1.789569706"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="2.147483647"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="2.505397588"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="2.863311529"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="3.22122547"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="3.579139411"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="3.937053352"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="4.294967296"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="5.726623061"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="7.158278826"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="8.589934591"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="10.021590356"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="11.453246121"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="12.884901886"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="14.316557651"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="15.748213416"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="17.179869184"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="22.906492245"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="28.633115306"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="30.0"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="+Inf"} 2
operator_sdk_controllers_execution_reconcile_seconds_count{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1"} 2
operator_sdk_controllers_execution_reconcile_seconds_sum{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1"} 0.051458916
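
For comparison with the output above, here is a small sketch of what the requested fixed boundaries would give. It is plain Java, not SDK or Micrometer code, and the class name, method names, and sample durations are made up for illustration. It computes cumulative `le`-style bucket counts for the bounds 0.1, 0.5, 1, 2, 5, 10, 20, 30 and +Inf seconds, then reads an SLA ratio directly from one bucket. As far as I know, Micrometer's `serviceLevelObjectives` option is the usual way to request explicit boundaries, while `publishPercentileHistogram` generates dense preset buckets like the ones shown; whether and how the SDK should expose that is an open question here.

```java
import java.util.Arrays;

// Illustrative only: cumulative histogram buckets with fixed, SLA-oriented bounds.
final class FixedBucketHistogramSketch {

    // Upper bounds in seconds from the request; +Inf is implied as the final bucket.
    static final double[] SLA_BOUNDS_SECONDS = {0.1, 0.5, 1, 2, 5, 10, 20, 30};

    // Returns cumulative counts: entry i counts observations <= bounds[i];
    // the last entry is the +Inf bucket and equals the total observation count.
    static long[] cumulativeCounts(double[] bounds, double[] observationsSeconds) {
        long[] counts = new long[bounds.length + 1];
        for (double value : observationsSeconds) {
            for (int i = 0; i < bounds.length; i++) {
                if (value <= bounds[i]) {
                    counts[i]++;
                }
            }
            counts[bounds.length]++;
        }
        return counts;
    }

    // Fraction of observations within an SLA threshold that is one of the sorted bounds.
    // Yields NaN when there are no observations.
    static double fractionWithin(double[] bounds, long[] counts, double slaSeconds) {
        int i = Arrays.binarySearch(bounds, slaSeconds);
        if (i < 0) {
            throw new IllegalArgumentException("SLA threshold must match a configured bound");
        }
        return (double) counts[i] / counts[counts.length - 1];
    }

    public static void main(String[] args) {
        // Hypothetical reconcile durations in seconds, not taken from the output above.
        double[] samples = {0.005, 0.046, 0.7, 12.0};
        long[] counts = cumulativeCounts(SLA_BOUNDS_SECONDS, samples);
        System.out.println(Arrays.toString(counts));
        System.out.println(fractionWithin(SLA_BOUNDS_SECONDS, counts, 1.0));
    }
}
```

With nine buckets per series instead of several dozen, a panel can divide one bucket by the count to answer "what share of reconciliations finished within 1 second" without interpolating between generated boundaries.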

tapwaterbuffalo, Sep 29 '25 16:09

Hi @tapwaterbuffalo, thank you for your input.

> Then you could perhaps provide a generic Grafana dashboard to evaluate operator health for a few different flavors, like prometheus.

Yes, I've had this in mind for a long time. I'll track it in a separate issue: https://github.com/operator-framework/java-operator-sdk/issues/2975

Do you plan to prepare a PR for the other proposals?

csviri, Sep 30 '25 06:09

> Histogram generation between Micrometer and Prometheus produces Prometheus Native histogram output; it would be nice if we could control whether it uses Native or Classic histograms. The buckets for native histograms are automatically generated and make dashboards difficult to build. Being able to configure buckets like 0.1, 0.5, 1, 2, 5, 10, 20, 30, Inf seconds would be nice, as generally we want to tell whether we are within a particular SLA.

I agree that we should make this customizable.

csviri, Sep 30 '25 07:09