kepler
OpenTelemetry deployment or API integration
For deployment integration, evaluate the architecture of a metrics -> telemetry adapter. For API integration, evaluate the scalability of a telemetry client in Kepler.
For API migration, we may need to double-check whether OpenTelemetry supports all kinds of Kepler metrics today, because I found that the Summary metric type is marked as legacy in OpenTelemetry, with no migration guide.
Meeting 30: implement an OTel API client in Kepler and emit telemetry directly. Hopefully there is a way to convert metrics to telemetry. @husky-parul will make the first attempt.
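A minimal sketch of what a metrics -> telemetry adapter has to decide, assuming simplified stand-in types (`PromSample`, `OtelPoint`, and `adapt` are hypothetical, not the OTLP protobuf or any SDK API). Prometheus counters become cumulative monotonic sums, gauges stay gauges, and summaries are rejected because OpenTelemetry keeps that type only for legacy compatibility.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PromSample:
    """Simplified Prometheus-style sample (hypothetical type for illustration)."""
    name: str
    kind: str  # "counter", "gauge", or "summary"
    labels: Dict[str, str]
    value: float


@dataclass
class OtelPoint:
    """Simplified OTel-style data point (hypothetical, not the OTLP protobuf)."""
    name: str
    data_type: str  # "sum" or "gauge"
    is_monotonic: bool
    temporality: str  # "cumulative" for sums, "" for gauges
    attributes: Dict[str, str]
    value: float


def adapt(samples: List[PromSample]) -> List[OtelPoint]:
    """Map counters to cumulative monotonic sums and gauges to gauges."""
    points = []
    for s in samples:
        if s.kind == "counter":
            # Prometheus counters are cumulative and only ever increase.
            points.append(OtelPoint(s.name, "sum", True, "cumulative", dict(s.labels), s.value))
        elif s.kind == "gauge":
            points.append(OtelPoint(s.name, "gauge", False, "", dict(s.labels), s.value))
        else:
            # Summaries (and anything unknown) need an explicit migration decision.
            raise ValueError(f"unsupported metric kind {s.kind!r} for {s.name}")
    return points
```

Labels carry over unchanged as OTel attributes; the metric and label names used in any example input are illustrative only.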
Previous discussion is here: https://github.com/sustainable-computing-io/kepler/issues/97
Just verified: we can export OpenTelemetry metrics and then, using the OpenTelemetry Collector, also expose them to Prometheus.
Therefore, if the user has the OpenTelemetry Collector deployed in the cluster, Kepler does not need to export Prometheus metrics.
So we need to make this configurable and avoid duplication: if OpenTelemetry metrics are enabled, Prometheus metrics should be disabled, and vice versa.
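One way to make the two paths mutually exclusive is a single setting with one value per path instead of two independent booleans, so "both enabled" cannot be expressed at all. A minimal sketch; `MetricsExporter`, `parse_exporter`, and `enabled_paths` are hypothetical names, not existing Kepler flags.

```python
from enum import Enum


class MetricsExporter(Enum):
    """Which metrics path Kepler would expose (hypothetical setting)."""
    PROMETHEUS = "prometheus"
    OTLP = "otlp"


def parse_exporter(value: str) -> MetricsExporter:
    """Parse one exporter choice; an unknown value is an error, never 'both'."""
    normalized = value.strip().lower()
    try:
        return MetricsExporter(normalized)
    except ValueError:
        choices = ", ".join(e.value for e in MetricsExporter)
        raise ValueError(f"unknown metrics exporter {value!r}; expected one of: {choices}") from None


def enabled_paths(exporter: MetricsExporter) -> dict:
    """Return which endpoints to start; exactly one entry is True."""
    return {
        "prometheus_endpoint": exporter is MetricsExporter.PROMETHEUS,
        "otlp_push": exporter is MetricsExporter.OTLP,
    }
```

With this shape, switching from Prometheus scraping to OTLP push is a single value change, and the duplicate-export case simply has no representation.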
Recap
Toward our migration from Prometheus metrics to OpenTelemetry metrics, which allows vendor- and tool-agnostic observability, I did an initial POC: I instrumented an exporter using the OTel SDK, collected its metrics with the OTel Collector, and built a dashboard in Grafana (POC example).
Before starting the migration, I looked through the Kepler code to identify the metric types in use. So far, Kepler uses only Counters and Gauges:
https://github.com/sustainable-computing-io/kepler/blob/main/pkg/collector/prometheus_process_collector.go#L30
The OTel SDK supports a synchronous Counter and an asynchronous GaugeObserver. The documentation highlights one point about GaugeObserver:
For GaugeObserver timeseries, backends usually display the last value and don't allow to sum different timeseries together.
It should not affect our implementation though. @rootfs @marceloamaral @sunya-ch @SamYuan1990 @bertysentry
This is awesome @husky-parul! WRT metric types, make sure to use Gauge only for metrics that are usually not summable (additive), like temperature, ratios, etc. For other metrics that move "up and down", like measured electrical power, you should use UpDownCounter. See OpenTelemetry Supplementary Guidelines about this.
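The rule of thumb above can be written as a small decision function: non-additive values (temperature, ratios) become Gauges, additive values that only grow (cumulative energy) become Counters, and additive values that move up and down (measured power) become UpDownCounters. A sketch only; `MetricTraits` and `choose_instrument` are hypothetical, and the returned strings are instrument-kind labels, not SDK calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricTraits:
    """Hypothetical description of what a metric measures."""
    monotonic: bool  # value only ever increases, e.g. cumulative energy in joules
    additive: bool   # summing across series is meaningful, e.g. power in watts


def choose_instrument(traits: MetricTraits, asynchronous: bool = False) -> str:
    """Pick an OpenTelemetry instrument kind from a metric's semantics."""
    if not traits.additive:
        # Non-additive values: backends show the last value and do not sum series.
        kind = "Gauge"
    elif traits.monotonic:
        kind = "Counter"
    else:
        # Additive values that move up and down, such as measured power.
        kind = "UpDownCounter"
    # Values read inside a callback map to the asynchronous ("Observable") variants.
    return f"Observable{kind}" if asynchronous else kind
```

For example, node power read periodically from a sensor would be `MetricTraits(monotonic=False, additive=True)` with `asynchronous=True`, giving an ObservableUpDownCounter.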
I am proposing the following. @sustainable-computing-io/maintainer please TAL. Let me know if you have any questions.

Components
Instrumentation: Kepler is instrumented with the OTel SDK to collect metrics.
OTel Collector: The OTel Collector receives the exported metrics data from the instrumented applications. The collector acts as an intermediary that processes and routes the telemetry data to the appropriate destinations. For Kepler, we are going to support the OpenTelemetry protocol (OTLP) to receive data from the instrumented applications.
Exporters: The OTel Collector uses OTel exporters to send metrics data to backends. We currently use Prometheus as the backend, but other options include InfluxDB and Elasticsearch. We will use the OTel Prometheus exporter, which converts the collected metrics into the Prometheus format so that Grafana can consume them through Prometheus.
Data Storage: The exported metrics data is stored in Prometheus.
Grafana Data Source: Grafana is configured to connect to the Prometheus backend where the metrics data is stored, through the Prometheus data source within Grafana.
Visualization in Grafana: Grafana queries the metrics data from the storage backend and creates visualizations based on the collected metrics.
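The Collector part of this architecture boils down to one metrics pipeline: an OTLP receiver in, a Prometheus exporter out. The sketch below mirrors the layout of a Collector YAML config as a Python dict and checks that every component a pipeline references is defined. The endpoints (4317 for OTLP/gRPC, 8889 for the Prometheus exporter) are example assumptions, and `validate_pipelines` is a hypothetical helper; the real Collector performs its own validation at startup.

```python
from typing import Dict, List

# Mirrors the Collector's YAML structure: receivers, processors, exporters,
# and service.pipelines wiring them together (endpoints are example values).
COLLECTOR_CONFIG = {
    "receivers": {"otlp": {"protocols": {"grpc": {"endpoint": "0.0.0.0:4317"}}}},
    "processors": {"batch": {}},
    "exporters": {"prometheus": {"endpoint": "0.0.0.0:8889"}},
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["prometheus"],
            }
        }
    },
}


def validate_pipelines(config: Dict) -> List[str]:
    """Return one error per pipeline entry that references an undefined component."""
    errors = []
    for pipeline_name, pipeline in config.get("service", {}).get("pipelines", {}).items():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section, {})
            for component in pipeline.get(section, []):
                if component not in defined:
                    # section[:-1] turns "exporters" into "exporter" for the message.
                    errors.append(f"{pipeline_name}: {section[:-1]} {component!r} is not defined")
    return errors
```

Prometheus would then scrape the exporter's endpoint, and Grafana reads from Prometheus through its data source, matching the storage and visualization steps above.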
When using the Prometheus exporter, I recommend enabling metric-name normalization with this flag: --feature-gates=pkg.translator.prometheus.NormalizeName. OTel metric names will then be normalized as described here.
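To give a feel for what normalization does, here is a simplified approximation: invalid characters become underscores, a unit suffix is appended, and monotonic counters get `_total`. This is not the translator's actual code; `normalize_name` is hypothetical and `UNIT_SUFFIXES` is an illustrative subset, so check the translator's own unit table for the exact mappings.

```python
import re

# Illustrative subset of unit -> suffix mappings (assumption, not the real table).
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes", "J": "joules", "W": "watts"}


def normalize_name(name: str, unit: str, monotonic_counter: bool) -> str:
    """Approximate OTel -> Prometheus metric-name normalization (simplified)."""
    # Replace characters that Prometheus does not allow in metric names.
    prom = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    # Drop an existing _total so the unit suffix lands before it.
    if monotonic_counter and prom.endswith("_total"):
        prom = prom[: -len("_total")]
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not prom.endswith("_" + suffix):
        prom = f"{prom}_{suffix}"
    if monotonic_counter:
        prom = f"{prom}_total"
    return prom
```

For example, an energy counter named `kepler.node.energy` with unit `J` would come out as `kepler_node_energy_joules_total`, and a name that is already normalized passes through unchanged.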
@husky-parul this looks great! Look forward to this happening!
Looks good to me! Thanks for working on this!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I don't think this is stale. This issue should get more attention as OTel is quickly becoming the de-facto standard to export telemetry everywhere.
It is not stale. I am working on this and a demo/PR is WIP.
Also good news: future versions of Prometheus will be able to ingest OTel metrics, and the mechanism for translating OTel metrics into Prometheus metrics is the one I mentioned earlier.
thank you @bertysentry for the info! We are going to make this happen in the next release milestone. Stay tuned!
@husky-parul that https://github.com/sustainable-computing-io/kepler/issues/659#issuecomment-1638153081 is great. Do you have any updates yet?
@rootfs Just out of curiosity, is there a timeline for the next milestone?
Thanks for sharing the information.
@frzifus OTel integration will be part of our next release, 0.7 in this case. We release every six months, so it will happen in Q1 2024.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Are we done with this ticket, @rootfs?
Is there any documentation on using OTel to collect metrics from Kepler? Thanks.
https://github.com/husky-parul/otel-observability
Please try this. We haven't merged this doc into the Kepler website yet. Thanks.