opentelemetry-collector
Questions regarding otel collector
Hey, I want to start using the OTel Collector in our production environment and I have a few questions:
- I use a Deployment in Kubernetes for the OTel Collector. I want to use an HPA for additional scalability; is that recommended, or is it better to over-provision the pods to support a large number of spans per second? I want to support around 1M spans per second.
- How can I avoid downtime caused by a spike in the number of spans, and find the application responsible for it? Is there a metric for spans received from each application?
- My applications use
  OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: http://opentelemetry-collector.kube-system.svc.cluster.local:4317
  to connect to the collectors. How can I balance traffic between the collectors? Right now some pods are more loaded than others.
- Is there a way to add rate limiting by request count rather than by memory usage? I saw this processor: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor but will it work with multiple pods?
Thanks.
> I use a Deployment in Kubernetes for the OTel Collector. I want to use an HPA for additional scalability; is that recommended, or is it better to over-provision the pods to support a large number of spans per second? I want to support around 1M spans per second.
I'm not sure about the status of HPA support today, but I would suggest a combination of HPA and a bit of over-provisioning (around 20%) to absorb bursts until the HPA triggers.
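For reference, a minimal sketch of such an HPA. The Deployment name, namespace, replica counts, and CPU target are all assumptions to adjust for your environment:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: opentelemetry-collector   # assumed name, match your Deployment
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: opentelemetry-collector
  # keep a floor of replicas as the ~20% over-provisioning buffer
  minReplicas: 5
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out before pods saturate
```

Note that CPU is only a proxy for span throughput; with long-lived gRPC connections, new replicas only help once clients reconnect (see the load-balancing discussion below).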
> How can I avoid downtime caused by a spike in the number of spans, and find the application responsible for it? Is there a metric for spans received from each application?
We don't have one today, but we can probably add it. Please file an issue to document the request.
> My applications use OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: http://opentelemetry-collector.kube-system.svc.cluster.local:4317 to connect to the collectors. How can I balance traffic between the collectors? Right now some pods are more loaded than others.
This is how I would do it: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types.
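One option along those lines is a headless Service, so DNS returns the individual pod IPs instead of a single ClusterIP and clients can balance across them. A sketch, assuming the pod label and namespace from the endpoint above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: opentelemetry-collector
  namespace: kube-system
spec:
  clusterIP: None        # headless: DNS resolves to the pod IPs directly
  selector:
    app: opentelemetry-collector   # assumed pod label
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
```

For this to help with gRPC, the clients also need client-side load balancing (e.g. a `dns:///...` target with the `round_robin` policy); otherwise each client still pins a single long-lived connection to one pod.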
> Is there a way to add rate limiting by request count rather than by memory usage? I saw this processor: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor but will it work with multiple pods?
Can you elaborate on this?
@bogdandrutu For balancing traffic, do you mean using a headless service? Regarding rate limiting, I would like to be able to drop spans from specific clients that send more than, for example, 1k spans per second. Is that possible with any current processor?
When using gRPC (port 4317), you can use headless services plus client-side load balancing to achieve decent load balancing if you have enough clients. Note that gRPC uses long-lived HTTP/2 connections, so adding more instances won't immediately help unless you have a good churn of clients. I think we added a setting to automatically close connections periodically to alleviate that, but doing more than that would incur performance inefficiencies for the regular case.
@jpkrohling By that setting, do you mean https://github.com/open-telemetry/opentelemetry-collector/blob/f64389d15f8b4dbddd807a16aabd84a57ce7826b/exporter/otlpexporter/testdata/config.yaml#L21-L24 ? Is there something similar for HTTP connections?
Also, anything regarding the rate limiting?
No, it would be max_connection_age, as seen here: https://github.com/open-telemetry/opentelemetry-collector/tree/main/config/configgrpc#server-configuration
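On the Collector's OTLP receiver, that setting lives under the gRPC server's keepalive parameters. A sketch, with the age and grace values chosen arbitrarily here:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        keepalive:
          server_parameters:
            max_connection_age: 1m        # force clients to reconnect periodically
            max_connection_age_grace: 10s # allow in-flight RPCs to finish first
```

Periodically closing connections makes clients re-resolve DNS and reconnect, which spreads load onto newly scaled-up pods; shorter ages rebalance faster at the cost of more reconnect overhead.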
Is there something similar for HTTP connections as well, or is it only for gRPC?
No, most HTTP connections are not long-lived anyway (or not expected to be), except perhaps for h2 connections under ideal conditions.
@jpkrohling Regarding rate limiting, I would like to be able to drop spans from specific clients that send more than, for example, 1k spans per second. Is that possible with any current processor?
No, there's no such processor as of now.
Closing as inactive. Please reopen if further work is required.