
Feature request: `KafkaReporter` in addition to `KafkaSender`

Open jeqo opened this issue 4 years ago • 2 comments

Feature:

Add a KafkaReporter that processes spans individually, uses the Kafka producer's native batching, and is able to partition spans by traceId.

Rationale

The current Sender API is aimed at transferring an encoded batch of spans to different back-ends; at that point it is already too late to partition by key, and in the case of Kafka it overlaps with the producer's native batching mechanisms (batch.size and linger.ms in https://kafka.apache.org/documentation/#producerconfigs).
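For reference, the batching knobs mentioned above are ordinary producer properties. A minimal sketch (the property names come from the Kafka producer documentation; the class and method names here are illustrative only):

```java
import java.util.Properties;

public class ProducerBatchingConfig {
  /** Returns producer properties that enable Kafka's native batching. */
  public static Properties batchingProps() {
    Properties props = new Properties();
    // batch.size: max bytes the producer accumulates per partition
    // before sending a request (Kafka's default is 16384).
    props.put("batch.size", "16384");
    // linger.ms: how long the producer waits to fill a batch before
    // sending (Kafka's default is 0, i.e. send immediately).
    props.put("linger.ms", "5");
    return props;
  }
}
```

These two properties together are what a KafkaReporter could lean on instead of re-batching spans itself.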

Instead of reusing BoundedAsyncReporter with KafkaSender, a potential KafkaReporter would handle reported spans itself, allowing a better message definition (e.g. using traceId as the key) and relying on the producer's own batching mechanisms.

@Override public void report(Span span) {
  metrics.incrementSpans(1);
  int spanSizeInBytes = encoder.sizeInBytes(span);
  metrics.incrementSpanBytes(spanSizeInBytes);
  // Key the record by trace ID so all spans of a trace land on the same partition.
  ProducerRecord<byte[], byte[]> record =
      new ProducerRecord<>(topic, span.traceId().getBytes(), encoder.encode(span));
  // get() lazily initializes the KafkaProducer; the send callback records drops.
  get().send(record, (metadata, exception) -> {
    if (exception != null) {
      metrics.incrementMessagesDropped(exception);
      metrics.incrementSpansDropped(1);
    }
  });
}

Instead of changing KafkaSender to support these improvements, the Reporter API seems like the right abstraction level at which to place the Kafka Producer.

Example Scenario:

While developing a Kafka-based storage and a streaming version of Zipkin Dependencies, there is a need to have spans partitioned by traceId in order to group them and apply aggregations that can be used for sampling, dependency counts, etc.

In the Kafka storage, we had to add a partitioning Span Consumer to compensate for this, which adds overhead compared with already-keyed spans; with keyed spans we could focus only on processing.

jeqo avatar Aug 20 '19 00:08 jeqo

My first thought is to say "in addition to", as we can host both and would need to for the foreseeable future. This is also an alternative to adding partitioning to the normal async reporter, I think.

Previous versions of Kafka had some nasty blocking behavior which could crash callers, so we need to be careful about any danger added at the reporter's call site, including encoding overhead and potential exceptions.

Another thing to watch is whether we want to keep this compatible. E.g. using a list instead of a single span opens options, including sending the whole local root as a unit, while remaining compatible with normal Zipkin receivers.

codefromthecrypt avatar Aug 22 '19 01:08 codefromthecrypt

> Another thing to watch is whether we want to keep this compatible. E.g. using a list instead of a single span opens options, including sending the whole local root as a unit, while remaining compatible with normal Zipkin receivers.

Agree, I have seen this could be an issue in the kafka-storage as well. Messages would be TraceId:SerializedListOfSpans.
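The grouping step behind a TraceId:SerializedListOfSpans message could look something like the following sketch. This is hypothetical, not code from zipkin-reporter: the EncodedSpan record and groupByTraceId method are stand-ins, assuming each span arrives already encoded and tagged with its trace ID.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TraceGrouping {
  /** Minimal stand-in for an encoded span tagged with its trace ID. */
  public record EncodedSpan(String traceId, byte[] bytes) {}

  /**
   * Groups encoded spans by trace ID, preserving arrival order of traces.
   * Each resulting entry maps to one keyed Kafka message.
   */
  public static Map<String, List<byte[]>> groupByTraceId(List<EncodedSpan> spans) {
    return spans.stream().collect(Collectors.groupingBy(
        EncodedSpan::traceId,
        LinkedHashMap::new,
        Collectors.mapping(EncodedSpan::bytes, Collectors.toList())));
  }
}
```

Each map entry would then become one producer record, with the trace ID as the key and the serialized span list as the value.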

> Previous versions of Kafka had some nasty blocking behavior which could crash callers, so we need to be careful about any danger added at the reporter's call site, including encoding overhead and potential exceptions.

I remember something similar being mentioned in https://github.com/openzipkin/zipkin/issues/2297, but I haven't managed to find an issue with this.

> This is also an alternative to adding partitioning to the normal async reporter, I think.

I forgot to mention this once I found that the Reporter API fits here. If we do this in the AsyncReporter, the main issue is that the Sender API expects a serialized batch of spans, forcing KafkaSender to deserialize the batch first in order to build TraceId:SerializedListOfSpans messages.

If we could redesign the current APIs, one potential change to the Sender API would be to receive a Map[TraceId, SerializedListOfSpans] and let each sender build the batch depending on its back-end. This adds work to the AsyncReporter (building this map instead of a batch) and to the senders (building a request based on their back-end): e.g. an HTTP sender would build one payload from all map entries (before, it just received a batch and sent a batch), while a Kafka sender would build one producer record per map entry.
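The split described above can be sketched as a hypothetical interface. None of these names exist in zipkin-reporter; KeyedSender, HttpStyleSender, and KafkaStyleSender are illustrative stand-ins showing how each back-end would interpret the same map differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical redesign: the reporter hands over spans already grouped by trace ID. */
interface KeyedSender {
  void send(Map<String, List<byte[]>> spansByTraceId);
}

/** An HTTP-style sender flattens all entries into a single payload. */
class HttpStyleSender implements KeyedSender {
  final List<byte[]> lastPayload = new ArrayList<>();

  @Override public void send(Map<String, List<byte[]>> spansByTraceId) {
    lastPayload.clear();
    // One batch regardless of trace boundaries, as with today's Sender.
    spansByTraceId.values().forEach(lastPayload::addAll);
  }
}

/** A Kafka-style sender emits one keyed record per trace ID. */
class KafkaStyleSender implements KeyedSender {
  int recordsSent;

  @Override public void send(Map<String, List<byte[]>> spansByTraceId) {
    // In a real sender each entry would become one ProducerRecord,
    // keyed by the trace ID; here we only count them.
    recordsSent = spansByTraceId.size();
  }
}
```

The point of the sketch is that the same grouped input lets the HTTP path stay batch-shaped while the Kafka path gets keyed records without re-deserializing anything.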

jeqo avatar Aug 22 '19 06:08 jeqo

Closing due to lack of interest and a stalled implementation. Someone can try again later if either changes!

codefromthecrypt avatar Apr 16 '24 18:04 codefromthecrypt