[Bug]: Kafka consumer lag grows unevenly across partitions between the Jaeger collector and the ingester

Open • shivtej-opsverse opened this issue 7 months ago • 1 comment

What happened?

I have a Jaeger collector sending traces to the ingester through a Kafka topic with 3 partitions, each consumed by one replica of the Jaeger ingester. Whenever I generate load on the system, the Kafka consumer lag grows, which is expected. The problem is that one partition's lag grows much faster than the others'.

The lag trend for each partition is shown in the screenshot below.

Steps to reproduce

  • Send traces from the Jaeger collector to the ingester via Kafka.
  • Make sure the Kafka topic consumed by the ingester has multiple partitions.
  • Monitor the consumer lag per partition using Kafka exporter.
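
For reference, the per-partition lag that an exporter reports is typically the partition's log-end offset minus the consumer group's committed offset. A minimal sketch of that arithmetic follows. The offsets are made up, and the `partition_lag` and `skew_ratio` helper names are hypothetical; none of this is taken from the deployment described here.

```python
def partition_lag(end_offsets, committed_offsets):
    """Return {partition: lag}, where lag = log-end offset - committed offset."""
    return {
        p: max(end_offsets[p] - committed_offsets[p], 0)
        for p in end_offsets
    }


def skew_ratio(lags):
    """Ratio of the largest lag to the mean lag; 1.0 means perfectly even."""
    mean = sum(lags.values()) / len(lags)
    return max(lags.values()) / mean if mean else 1.0


# Hypothetical offsets for a 3-partition topic (illustrative only).
end_offsets = {0: 120_000, 1: 118_500, 2: 119_200}
committed_offsets = {0: 119_000, 1: 90_000, 2: 118_100}

lags = partition_lag(end_offsets, committed_offsets)
print(lags)
print(round(skew_ratio(lags), 2))
```

A skew ratio well above 1.0 over a sustained period would be one way to put a number on the uneven growth described above.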

Expected behavior

I expect the lag to grow uniformly across partitions instead of it being unevenly distributed.
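
One situation in which this expectation would not hold is key-based partitioning. If messages are keyed (for example by trace ID, with one message per span, which is an assumption here and not something confirmed in this report), every message with the same key lands on the same partition, so a few very large traces can overload a single partition. A rough sketch of that effect, using MD5 as a stand-in hash (not the hash any particular Kafka client uses) and a hypothetical workload:

```python
import hashlib
from collections import Counter


def pick_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stand-in hash (MD5), for illustration only."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def messages_per_partition(span_counts, num_partitions):
    """Count messages per partition when every span is keyed by its trace ID."""
    totals = Counter({p: 0 for p in range(num_partitions)})
    for trace_id, spans in span_counts.items():
        totals[pick_partition(trace_id, num_partitions)] += spans
    return dict(totals)


# Hypothetical workload: many small traces plus one very large trace.
span_counts = {f"trace-{i}": 10 for i in range(30)}
span_counts["trace-big"] = 5_000

print(messages_per_partition(span_counts, 3))
```

Whichever partition receives the large trace ends up with far more messages than the other two, even though the number of distinct keys per partition is roughly balanced.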

Relevant log output

No response

Screenshot

Screenshot 2024-06-28 at 6 21 27 PM

Additional context

No response

Jaeger backend version

v1.57.0

SDK

No response

Pipeline

OTEL Collector --> Jaeger Collector --> Kafka --> Jaeger Ingester --> Clickhouse

Storage backend

Clickhouse

Operating system

Linux

Deployment model

Kubernetes

Deployment configs

No response

shivtej-opsverse • Jun 28 '24 13:06