Micrometer metrics enhancement requests
A few enhancement requests for the Micrometer metrics:
- Currently, reconciler metrics use `operator.sdk.reconciliations.executions.<reconciler name>` as the meter name. However, when building a generic dashboard for Java operator metrics with static panel expressions, the metric names must be well known, so I would suggest using the reconciler name as another tag instead (the same applies to `operator.sdk.<map name>.size`). Then you could perhaps provide a generic Grafana dashboard to evaluate operator health for a few different flavors, like Prometheus.
- The `reconciliations_retries_last` and `reconciliations_retries_number` tags on the `operator_sdk_reconciliations_*` metrics would be better modeled as two additional gauges, as they are metrics in their own right. When working with metric series, every change in `reconciliations_retries_last` or `reconciliations_retries_number` shows up as a new series, because the tags/labels have changed. When detecting problems with a reconciler, I would prefer to trigger on the metric values for retries number and last rather than on label values.
- Histogram generation between Micrometer and Prometheus produces Prometheus native histogram output; it would be nice to be able to control whether it uses native or classic histograms. The buckets for native histograms are generated automatically, which makes them difficult to dashboard. Being able to configure buckets such as 0.1, 0.5, 1, 2, 5, 10, 20, 30 and +Inf seconds would be useful, since we generally want to tell whether we are within a particular SLA.
See the default output below:
# HELP operator_sdk_controllers_execution_reconcile_seconds
# TYPE operator_sdk_controllers_execution_reconcile_seconds histogram
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.001"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.001048576"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.001398101"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.001747626"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.002097151"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.002446676"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.002796201"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.003145726"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.003495251"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.003844776"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.004194304"} 0
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.005592405"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.006990506"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.008388607"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.009786708"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.011184809"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.01258291"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.013981011"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.015379112"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.016777216"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.022369621"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.027962026"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.033554431"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.039146836"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.044739241"} 1
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.050331646"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.055924051"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.061516456"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.067108864"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.089478485"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.111848106"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.134217727"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.156587348"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.178956969"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.20132659"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.223696211"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.246065832"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.268435456"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.357913941"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.447392426"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.536870911"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.626349396"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.715827881"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.805306366"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.894784851"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="0.984263336"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="1.073741824"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="1.431655765"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="1.789569706"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="2.147483647"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="2.505397588"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="2.863311529"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="3.22122547"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="3.579139411"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="3.937053352"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="4.294967296"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="5.726623061"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="7.158278826"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="8.589934591"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="10.021590356"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="11.453246121"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="12.884901886"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="14.316557651"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="15.748213416"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="17.179869184"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="22.906492245"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="28.633115306"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="30.0"} 2
operator_sdk_controllers_execution_reconcile_seconds_bucket{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1",le="+Inf"} 2
operator_sdk_controllers_execution_reconcile_seconds_count{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1"} 2
operator_sdk_controllers_execution_reconcile_seconds_sum{controller="tomcatreconciler",resource_group="tomcatoperator.io",resource_kind="Tomcat",resource_name="test-tomcat1",resource_namespace="default",resource_scope="namespace",resource_version="v1"} 0.051458916
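To make the first two suggestions concrete, here is a minimal sketch against the Micrometer API (`micrometer-core`). It is not the SDK's actual metrics implementation: the meter names, the `controller` tag key (borrowed from the histogram labels above) and the class `ReconcilerMetricsSketch` are hypothetical. The reconciler name becomes a tag on a fixed meter name, and the retry information is exposed as gauge values keyed only by controller, so a new retry attempt updates a value instead of creating a new series.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: fixed meter names with the reconciler name as a tag,
// and retry information published as gauge values instead of tag values.
final class ReconcilerMetricsSketch {
    static final String EXECUTIONS = "operator.sdk.reconciliations.executions";
    static final String RETRY_NUMBER = "operator.sdk.reconciliations.retries.number";
    static final String RETRY_LAST = "operator.sdk.reconciliations.retries.last";
    static final String CONTROLLER_TAG = "controller"; // assumed tag key

    private final MeterRegistry registry;
    // Strong references to the gauge state: Micrometer gauges hold only a weak
    // reference to the observed object by default, so it must be kept alive here.
    private final Map<String, AtomicInteger> retryNumbers = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> lastAttempts = new ConcurrentHashMap<>();

    ReconcilerMetricsSketch(MeterRegistry registry) {
        this.registry = registry;
    }

    void recordExecution(String controller) {
        // register() returns the existing counter when name and tags match.
        Counter.builder(EXECUTIONS)
            .tag(CONTROLLER_TAG, controller)
            .register(registry)
            .increment();
    }

    void recordRetry(String controller, int attempt, boolean lastAttempt) {
        // Updating the gauge values keeps exactly one series per controller.
        gaugeState(retryNumbers, RETRY_NUMBER, controller).set(attempt);
        gaugeState(lastAttempts, RETRY_LAST, controller).set(lastAttempt ? 1 : 0);
    }

    private AtomicInteger gaugeState(Map<String, AtomicInteger> states, String name, String controller) {
        // The gauge is registered once per controller, on first use.
        return states.computeIfAbsent(controller, c -> {
            AtomicInteger state = new AtomicInteger();
            Gauge.builder(name, state, AtomicInteger::doubleValue)
                .tag(CONTROLLER_TAG, c)
                .register(registry);
            return state;
        });
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        ReconcilerMetricsSketch metrics = new ReconcilerMetricsSketch(registry);
        metrics.recordExecution("tomcatreconciler");
        metrics.recordExecution("tomcatreconciler");
        metrics.recordRetry("tomcatreconciler", 1, false);
        metrics.recordRetry("tomcatreconciler", 2, true);
        System.out.println(registry.get(EXECUTIONS).tag(CONTROLLER_TAG, "tomcatreconciler").counter().count());
        // One gauge per controller, regardless of how many retries happened.
        System.out.println(registry.get(RETRY_NUMBER).gauges().size());
    }
}
```

With a Prometheus registry, this naming would surface as series such as `operator_sdk_reconciliations_executions_total{controller="tomcatreconciler"}` and `operator_sdk_reconciliations_retries_number{controller="tomcatreconciler"}`, so a static dashboard panel can use an expression like `sum by (controller) (rate(operator_sdk_reconciliations_executions_total[5m]))` without knowing reconciler names in advance.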
Hi @tapwaterbuffalo, thank you for your input.
> Then you could perhaps provide a generic Grafana dashboard to evaluate operator health for a few different flavors, like Prometheus.
Yes, this has been on my mind for a long time. I'll track it in a separate issue: https://github.com/operator-framework/java-operator-sdk/issues/2975
Do you plan to prepare a PR for the other proposals?
> Histogram generation between Micrometer and Prometheus produces Prometheus native histogram output; it would be nice to be able to control whether it uses native or classic histograms. The buckets for native histograms are generated automatically, which makes them difficult to dashboard. Being able to configure buckets such as 0.1, 0.5, 1, 2, 5, 10, 20, 30 and +Inf seconds would be useful, since we generally want to tell whether we are within a particular SLA.
I agree that we should make this customizable.
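Until a dedicated option exists, one possible application-side workaround is a Micrometer `MeterFilter` that overrides the distribution config of the SDK's timers. This is a minimal sketch under two assumptions: that the SDK registers its timers with percentile histograms enabled (the boundaries in the output above appear to match Micrometer's generated timer buckets between 1 ms and 30 s, rendered in the classic `le` bucket format), and that all of them share the `operator.sdk` name prefix. `SlaBucketsSketch` and `slaBuckets` are hypothetical names, and whether the SDK should expose this as its own configuration option is a separate design question.

```java
import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.distribution.CountAtBucket;
import io.micrometer.core.instrument.distribution.DistributionStatisticConfig;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.time.Duration;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: replace auto-generated histogram buckets with fixed SLA
// buckets for every timer whose name starts with the given prefix.
final class SlaBucketsSketch {
    // Bucket boundaries from the issue; +Inf is always part of Prometheus histogram output.
    static final Duration[] SLA = {
        Duration.ofMillis(100), Duration.ofMillis(500), Duration.ofSeconds(1), Duration.ofSeconds(2),
        Duration.ofSeconds(5), Duration.ofSeconds(10), Duration.ofSeconds(20), Duration.ofSeconds(30)
    };

    static MeterFilter slaBuckets(String namePrefix) {
        // Timer distribution configs express service level objectives in nanoseconds.
        double[] sloNanos = Arrays.stream(SLA).mapToDouble(Duration::toNanos).toArray();
        return new MeterFilter() {
            @Override
            public DistributionStatisticConfig configure(Meter.Id id, DistributionStatisticConfig config) {
                if (id.getType() != Meter.Type.TIMER || !id.getName().startsWith(namePrefix)) {
                    return config;
                }
                // Non-null values set here take precedence over the meter's own config in merge().
                return DistributionStatisticConfig.builder()
                    .percentilesHistogram(false)
                    .serviceLevelObjectives(sloNanos)
                    .build()
                    .merge(config);
            }
        };
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        registry.config().meterFilter(slaBuckets("operator.sdk"));
        // Simulates a timer registered with percentile histograms enabled.
        Timer timer = Timer.builder("operator.sdk.controllers.execution.reconcile")
            .tag("controller", "tomcatreconciler")
            .publishPercentileHistogram()
            .register(registry);
        timer.record(Duration.ofMillis(5));
        timer.record(Duration.ofMillis(47));
        for (CountAtBucket bucket : timer.takeSnapshot().histogramCounts()) {
            System.out.println("le=" + bucket.bucket(TimeUnit.SECONDS) + " count=" + bucket.count());
        }
    }
}
```

The filter has to be added to the registry before the operator creates its meters, because Micrometer applies meter filters when a meter is registered.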