
Provide an OpenTelemetry scaler

Open tomkerkhove opened this issue 3 years ago • 31 comments

Proposal

OpenTelemetry allows applications/vendors to push metrics to a collector or to integrate their own exporters directly in the app.

KEDA should provide an OpenTelemetry scaler that can be used as an exporter target, so we can pull in the metrics and scale accordingly.

Scaler Source

OpenTelemetry Metrics

Scaling Mechanics

Scale based on returned metrics.

Authentication Source

TBD

Anything else?

OpenTelemetry Metrics is still in beta but is expected to go GA by the end of the year.

Go SDK: https://github.com/open-telemetry/opentelemetry-go

tomkerkhove avatar Nov 26 '21 08:11 tomkerkhove

It's a really good improvement; I will have a look to see if I can help with this topic 🙂

mknet3 avatar Nov 26 '21 14:11 mknet3

Awesome, thank you!

tomkerkhove avatar Nov 26 '21 14:11 tomkerkhove

@mknet3, ping me if you need help ;)

JorTurFer avatar Nov 27 '21 00:11 JorTurFer

Just to confirm: I'm on it and I will help with this scaler.

mknet3 avatar Dec 12 '21 19:12 mknet3

Great, thanks!

tomkerkhove avatar Dec 13 '21 06:12 tomkerkhove

Hi @tomkerkhove, I have had a look at this issue and I would like to clarify some things. AFAIK the goal of this issue is to provide a scaler based on metrics exported by an exporter configured in the collector. This exporter will expose metrics in a KEDA format to be read by the scaler. Quick question: does the exporter already exist or is there a plan to develop it? (I suppose it will be in opentelemetry-collector-contrib.) This question is to figure out what the format of the exposed data will be so the scaler can pull it.

mknet3 avatar Dec 19 '21 18:12 mknet3

That would be part of the investigation, but I think we'll either need to build our own exporter to get the metrics in, or use the gRPC OTEL exporter / HTTP OTEL exporter as a starting point to push them to KEDA.

I'd prefer the latter approach to get started as we don't have a preference on the metric format, so OTEL is fine.

tomkerkhove avatar Dec 20 '21 08:12 tomkerkhove

@mknet3 prefers to keep it free for the moment because it's his first task with Golang.

JorTurFer avatar Feb 01 '22 16:02 JorTurFer

Working on this

sushmithavangala avatar May 13 '22 09:05 sushmithavangala

Before we go all in, it might be good to post a proposal here @SushmithaVReddy to avoid having to redo things, but I think relying on the OTEL exporter is best.

tomkerkhove avatar May 13 '22 11:05 tomkerkhove

@tomkerkhove, sure. I'll put a proposal here before we start the implementation.

Quick question: is the idea here to scale based on the metrics obtained from the go.opentelemetry.io/otel/exporters/otlp/otlpmetrics package?

sushmithavangala avatar May 19 '22 09:05 sushmithavangala

@tomkerkhove Will KEDA be acting as a collector that gets metrics data from an exporter? Is the idea to create metrics using https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#instrument and observe them through the HPA to scale accordingly? I'm slightly confused by the terms exporter and collector w.r.t. KEDA. A plausible solution looks like one where the user has an exporter that exports metrics, and KEDA connects to this exporter (as a collector?) and gets the metrics mentioned in the ScaledObject to make the scaling decision.

sushmithavangala avatar May 20 '22 10:05 sushmithavangala

The idea is to use the OTEL exporter (https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/README.md) from which KEDA fetches metrics to make scaling decisions on.

This is similar to how we integrate with Prometheus, where we pull the metrics from Prometheus and move on; however, here the metrics are in OTEL format, coming from the OTEL exporter that end-users have to add to their own OTEL collector (so that part is not up to KEDA).

From an end-user perspective, they should give us:

  1. URI of the OTEL endpoint to talk to on the collector (but they add the following to their collector: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/README.md#getting-started; see the sketch after this list)
  2. Optional parameter to use gRPC or HTTP (but we can just start with gRPC for now as well)
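
For illustration, a minimal sketch of the collector-side configuration this implies, per the getting-started link above. The keda-otel-receiver endpoint is purely hypothetical (the actual KEDA-side integration point was still to be decided), as are the names and ports:

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    # hypothetical KEDA-side OTLP endpoint; not something KEDA exposes today
    endpoint: keda-otel-receiver.keda.svc:4317
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]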

Hope that helps?

tomkerkhove avatar May 23 '22 08:05 tomkerkhove

This helps Tom. Thanks!

sushmithavangala avatar May 23 '22 10:05 sushmithavangala

@tomkerkhove any thoughts on the ScaledObject here (see below)? The idea is to use OTEL (https://pkg.go.dev/go.opentelemetry.io/otel), connect to the endpoint mentioned in the ScaledObject, pull the metric value, and compare it to the threshold to scale.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: opentelemetry-scaledobject
  namespace: keda
  labels:
    deploymentName: dummy
spec:
  maxReplicaCount: 12
  scaleTargetRef:
    name: dummy
  triggers:
    - type: opentelemetry
      metadata:
        exporter: http://otel-collector:4317
        metrics:
          - metricName: http_requests_total
            threshold: '100'
      authenticationRef:
        name: authdata

I was also wondering about scenarios where users want to pull multiple metrics from their application and scale based on conditions on those metrics, e.g. as below:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: opentelemetry-scaledobject
  namespace: keda
  labels:
    deploymentName: dummy
spec:
  maxReplicaCount: 12
  scaleTargetRef:
    name: dummy
  triggers:
    - type: opentelemetry
      metadata:
        exporter: http://otel-collector:4317
        metrics:
          - metricName: http_requests_total
            threshold: '100'
            operator: greaterthan
          - metricName: http_timeouts
            threshold: '5'
            operator: lesserthan
        query: http_requests_total and http_timeouts
      authenticationRef:
        name: authdata

Any ideas on the scope of the scaler we'll be building in terms of multiple metrics?

sushmithavangala avatar Jun 06 '22 09:06 sushmithavangala

It's ok for me to use that package since that's the official SDK - Thanks for checking.

I don't see a difference between the two proposals other than one vs. multiple metrics, though. Can you elaborate on it?

In terms of supporting multiple metrics - I'd argue that given we support multiple triggers it might be more aligned with other scalers to only support 1 metric per trigger to keep a consistent approach in KEDA. The only consideration I would have here is performance but I think we can manage that in the implementation. Thoughts @zroubalik @JorTurFer?

Based on that we'll need to review the YAML spec but in general I think it's ok; however if we use multiple levels then I would use exporter.url instead of exporter given we might need auth in the future or similar settings.

tomkerkhove avatar Jun 07 '22 06:06 tomkerkhove

Yes @tomkerkhove, the proposals differ in the multiple-metrics usage, as you understood. I agree with keeping consistency with the other scalers we have, but I'm concerned about how much value our scaling will add if it can only scale on a single metric, whereas OpenTelemetry is mostly used to emit a lot of metrics.

Nitpick: if we have one metric per ScaledObject and a user who wants to scale based on multiple metrics goes ahead and creates that many ScaledObjects, I wonder how we handle concurrent scenarios where multiple metrics result in scaling (over-scaling, because the scaled-up instances could have been reused?).

sushmithavangala avatar Jun 07 '22 07:06 sushmithavangala

It would be nice to also make the protocol configurable, given OTEL supports both HTTP and gRPC.

tomkerkhove avatar Jun 07 '22 07:06 tomkerkhove

Ha, but we have this covered already today.

Customers should only create 1 SO per scale target (for which we will provide validation soon). However, 1 SO can have 1 or more triggers and starts scaling as soon as one of them meets the criteria. You can learn more about that in our concepts documentation.

tomkerkhove avatar Jun 07 '22 07:06 tomkerkhove

Right, does the one below make sense?

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: opentelemetry-scaledobject
  namespace: keda
  labels:
    deploymentName: dummy
spec:
  maxReplicaCount: 12
  scaleTargetRef:
    name: dummy
  triggers:
    - type: opentelemetry
      metadata:
        exporter:
          protocol: grpc
          url: http://otel-collector:4317
        metric:
          name: http_requests_total
          threshold: '100'
      authenticationRef:
        name: authdata
    - type: opentelemetry
      metadata:
        exporter:
          protocol: grpc
          url: http://otel-collector:4317
        metric:
          name: http_errors
          threshold: '10'
      authenticationRef:
        name: authdata

sushmithavangala avatar Jun 07 '22 08:06 sushmithavangala

Yeah, this is correct, you can define multiple triggers per SO. Just one thing: is the metric.name related to OTEL?

zroubalik avatar Jun 07 '22 08:06 zroubalik

metric.name will be used to pull the metric. It should match the name the user has in their instrumented application.

sushmithavangala avatar Jun 07 '22 08:06 sushmithavangala

Okay, and let's make the trigger metadata flat, to be in sync with other scalers:

Something like this; feel free to rename/update the fields to follow OTEL conventions.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: opentelemetry-scaledobject
  namespace: keda
  labels:
    deploymentName: dummy
spec:
  maxReplicaCount: 12
  scaleTargetRef:
    name: dummy
  triggers:
    - type: opentelemetry
      metadata:
        protocol: grpc
        exporter: http://otel-collector:4317
        metric: http_requests_total
        threshold: '100'
      authenticationRef:
        name: authdata

zroubalik avatar Jun 07 '22 08:06 zroubalik

Sounds good to me.

tomkerkhove avatar Jun 07 '22 09:06 tomkerkhove

Maybe I misunderstand something about this conversation, but given all pods in a ReplicaSet will contain an OTEL collector, which one would the KEDA autoscaler talk to in order to make its decisions?

Also, how would you apply aggregates across metric labels?

markallanson avatar Jun 24 '22 14:06 markallanson

KEDA will not manage the OTEL collector; it is something you'd need to run separately next to KEDA in your cluster.

Does that clarify it?

tomkerkhove avatar Jun 27 '22 05:06 tomkerkhove

Sorry, maybe my question was not clear enough.

If you have 10 pods, all of which have OTEL sidecars running, which one will KEDA talk to? If just one, it won't have enough information to base scaling decisions on. If it talks to all of them, then how will it generate aggregates of the data across all of them?

markallanson avatar Jun 27 '22 13:06 markallanson

There is no sidecar involved; there will be a separate deployment that KEDA integrates with through a Kubernetes Service. End-users will have to bring their own OpenTelemetry Collector: https://opentelemetry.io/docs/collector/deployment/
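
For illustration, a minimal sketch of such a Kubernetes Service in front of a user-managed collector Deployment; the otel-collector name, keda namespace, and app label are assumptions matching the earlier examples, not resources KEDA creates:

apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: keda
spec:
  selector:
    app: otel-collector   # assumes the collector Deployment pods carry this label
  ports:
    - name: otlp-grpc
      port: 4317          # default OTLP gRPC port
      targetPort: 4317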

tomkerkhove avatar Jun 28 '22 07:06 tomkerkhove

Any update on this @SushmithaVReddy ?

tomkerkhove avatar Aug 07 '22 08:08 tomkerkhove

The priorities of @SushmithaVReddy have changed and she no longer has time to complete the task, so I'm unassigning her.

tomkerkhove avatar Aug 17 '22 20:08 tomkerkhove