
Questions regarding otel collector

LeonD9 opened this issue 2 years ago · 9 comments

Hey, I want to start using the OTel Collector in our production environment, and I have a few questions:

  • I deploy the OTel Collector as a Kubernetes Deployment and want to use an HPA for additional scalability. Is that recommended, or is it better to over-provision the pods to support a large number of spans per second? I want to support around 1M spans per second.
  • How can I avoid downtime caused by a spike in span volume, and find the application responsible for it? Is there a metric for spans received from each application?
  • My applications use OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: http://opentelemetry-collector.kube-system.svc.cluster.local:4317 to connect to the collectors. How can I balance traffic between the collectors? Right now some pods are loaded more than others.
  • Is there a way to add rate limiting by request count rather than by memory usage? I saw this processor: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor but will it work with multiple pods?

Thanks.

LeonD9 avatar Aug 28 '22 13:08 LeonD9

I deploy the OTel Collector as a Kubernetes Deployment and want to use an HPA for additional scalability. Is that recommended, or is it better to over-provision the pods to support a large number of spans per second? I want to support around 1M spans per second.

I'm not sure about the current status of HPA support, but I would suggest a combination of HPA and a bit of over-provisioning (around 20%) to absorb bursts until the HPA triggers.
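For illustration, a minimal sketch of such an HPA, assuming a Deployment named `opentelemetry-collector` in `kube-system` and hypothetical replica counts (the real numbers would come from your own load testing):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: opentelemetry-collector
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: opentelemetry-collector
  minReplicas: 6          # baseline sized for steady load plus ~20% headroom
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out before pods saturate
```

Keeping `minReplicas` above the bare minimum is what provides the over-provisioning buffer while the HPA reacts to a burst.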

How can I avoid downtime caused by a spike in span volume, and find the application responsible for it? Is there a metric for spans received from each application?

We don't have that today, but we could probably add it; please file an issue to document the request.

My applications use OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: http://opentelemetry-collector.kube-system.svc.cluster.local:4317 to connect to the collectors. How can I balance traffic between the collectors? Right now some pods are loaded more than others.

This is how I would do it: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types.
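For the headless-service approach discussed later in this thread, a minimal sketch might look like the following (the service name and label selector are assumptions; adjust them to match your Deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: opentelemetry-collector-headless
  namespace: kube-system
spec:
  clusterIP: None   # headless: DNS resolves to every pod IP instead of one virtual IP
  selector:
    app: opentelemetry-collector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
```

With `clusterIP: None`, a DNS lookup of the service name returns all backing pod IPs, which lets gRPC clients doing client-side load balancing spread connections across collectors instead of pinning to whichever pod kube-proxy picked for a single long-lived connection.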

Is there a way to add rate limiting by request count rather than by memory usage? I saw this processor: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor but will it work with multiple pods?

Can you elaborate on this?

bogdandrutu avatar Sep 02 '22 16:09 bogdandrutu

@bogdandrutu For balancing traffic, do you mean using a headless service? Regarding rate limiting, I would like to be able to drop spans from specific clients that send more than, for example, 1k spans per second. Is that possible with any current processor?

LeonD9 avatar Sep 11 '22 10:09 LeonD9

When using gRPC (port 4317), you can use headless services plus client-side load balancing to achieve decent load balancing if you have enough clients. Note that gRPC uses long-lived HTTP connections, so adding more instances won't immediately help unless you have a good churn of clients. I think we added a setting to automatically close connections periodically to alleviate that, but doing more than that would incur performance inefficiencies for the regular case.

jpkrohling avatar Sep 12 '22 20:09 jpkrohling

When using gRPC (port 4317), you can use headless services plus client-side load balancing to achieve decent load balancing if you have enough clients. Note that gRPC uses long-lived HTTP connections, so adding more instances won't immediately help unless you have a good churn of clients. I think we added a setting to automatically close connections periodically to alleviate that, but doing more than that would incur performance inefficiencies for the regular case.

@jpkrohling By "setting", do you mean https://github.com/open-telemetry/opentelemetry-collector/blob/f64389d15f8b4dbddd807a16aabd84a57ce7826b/exporter/otlpexporter/testdata/config.yaml#L21-L24 ? Is there something similar for HTTP connections?

Also, is there anything regarding the rate limiting?

LeonD9 avatar Sep 13 '22 08:09 LeonD9

No, it would be max_connection_age, as seen here:

https://github.com/open-telemetry/opentelemetry-collector/tree/main/config/configgrpc#server-configuration
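As a sketch of where that setting lives, the OTLP receiver's gRPC server configuration accepts keepalive server parameters; the one-minute value here is only an example, not a recommendation:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        keepalive:
          server_parameters:
            max_connection_age: 1m        # periodically close connections so clients reconnect
            max_connection_age_grace: 10s # let in-flight RPCs finish before the close
```

Forcing clients to reconnect periodically gives DNS-based client-side load balancing a chance to redistribute connections onto newly added collector pods.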

jpkrohling avatar Sep 13 '22 13:09 jpkrohling

Is there something similar for HTTP connections as well, or is it only for gRPC?

LeonD9 avatar Sep 13 '22 13:09 LeonD9

No; most HTTP connections are not long-lived anyway (or are not expected to be), except perhaps for h2 (HTTP/2) connections under ideal conditions.

jpkrohling avatar Sep 13 '22 13:09 jpkrohling

@jpkrohling Regarding rate limiting, I would like to be able to drop spans from specific clients that send more than, for example, 1k spans per second. Is that possible with any current processor?

LeonD9 avatar Sep 15 '22 09:09 LeonD9

No, there's no such processor as of now.

jpkrohling avatar Sep 15 '22 11:09 jpkrohling

Closing as inactive. Please reopen if further work is required.

atoulme avatar Dec 14 '23 07:12 atoulme