Apache Kafka as source

Open vklimontovich opened this issue 5 years ago • 3 comments

Problem

Kafka is a very popular message broker. We want EventNative to consume messages from a Kafka topic.

Solution

sources:
  kafka1:
    type: kafka
    server: "address:port"
    topic: mytopic
    format: json

This is how a Kafka source should be configured. EventNative should listen to the given topic and expect messages in JSON format (format=json is the only format we need to support for now). Events should then be processed by the default server-to-server pipeline.
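To illustrate the expected behavior, here is a minimal sketch in Python of the decoding step for such a source. It is not EventNative code: `SOURCE_CONFIG` and `decode_events` are hypothetical names, the raw messages are assumed to arrive as bytes from whatever Kafka client is used, and silently skipping malformed payloads is an assumption rather than a decided policy.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Hypothetical in-memory mirror of the `kafka1` source entry from the YAML above.
SOURCE_CONFIG = {
    "type": "kafka",
    "server": "address:port",
    "topic": "mytopic",
    "format": "json",
}


def decode_events(raw_messages: Iterable[bytes], fmt: str = "json") -> Iterator[Dict[str, Any]]:
    """Yield one event dict per message that holds a valid JSON object."""
    if fmt != "json":
        # Only format=json is in scope for this issue.
        raise ValueError(f"unsupported format: {fmt!r}")
    for raw in raw_messages:
        try:
            event = json.loads(raw)
        except ValueError:
            # Covers invalid JSON and undecodable bytes.
            # Assumption: skip the message; a real pipeline might log or dead-letter it.
            continue
        if isinstance(event, dict):
            # Non-object payloads (arrays, scalars) are skipped as well.
            yield event
```

Each yielded dict would then be handed to the server-to-server pipeline, for example `for event in decode_events(messages, SOURCE_CONFIG["format"]): ...`, where `messages` is any iterable of message values.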

vklimontovich avatar Nov 30 '20 11:11 vklimontovich

What about Kafka as a destination?

Just to clarify the use case.

We use Kafka topics as an ingestion source and rely on Kafka engine tables to ingest them into ClickHouse (CH). We also use those Kafka topics as an 'event lake' to archive the raw events as history (with an expiration date), in case we later detect a bug or problem downstream and want to replay them.

For real-time ingestion we rely on Divolte, but we appreciate the simplicity of EventNative and want to remove the requirement to use Avro in the pipeline. For batch, we simply insert events into the Kafka topic.

gervarela avatar Jan 14 '21 16:01 gervarela

Hi @gervarela !

It makes sense. Just to clarify the use case, the pipeline you're talking about will look like: Tracker (JSON) → EventNative → Kafka? EventNative will send JSON to Kafka as-is? Also, would it be batch mode or stream mode?

Please feel free to create an issue about Kafka as a destination. We can probably deliver it within 2-4 weeks.

vklimontovich avatar Jan 15 '21 09:01 vklimontovich

Hi @vklimontovich,

Sounds like a great addition.

Yes, the pipeline will be like you said: Tracker (JSON) --> EventNative --> Kafka topic (JSON) --> Any system.
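To make the batch-versus-stream question concrete, here is a rough Python sketch of the two forwarding modes for JSON payloads passed through as-is. Everything in it is hypothetical: `forward_stream`, `forward_batch`, and the `send` / `send_many` callables stand in for whatever Kafka producer would be used, and the default batch size is an arbitrary assumption.

```python
from typing import Callable, Iterable, List


def forward_stream(payloads: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Stream mode: pass each JSON payload to `send` unchanged, as soon as it arrives."""
    count = 0
    for payload in payloads:
        send(payload)
        count += 1
    return count  # number of payloads forwarded


def forward_batch(
    payloads: Iterable[bytes],
    send_many: Callable[[List[bytes]], None],
    batch_size: int = 100,  # arbitrary assumption
) -> int:
    """Batch mode: buffer payloads and flush them in groups of at most `batch_size`."""
    batch: List[bytes] = []
    flushes = 0
    for payload in payloads:
        batch.append(payload)
        if len(batch) >= batch_size:
            send_many(batch)
            flushes += 1
            batch = []  # start a fresh list so the flushed one is not mutated
    if batch:
        # Flush the final, partially filled batch.
        send_many(batch)
        flushes += 1
    return flushes  # number of flushes performed
```

In both modes the payload bytes are forwarded untouched, which matches the "JSON as-is" expectation; the only difference is when the producer is called.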

It adds complexity to the system, but it also lets you leverage the broad support for ingesting Kafka data into almost any system. We use this architecture to feed ClickHouse, but also stream processing systems like Apache Spark or Faust.

gervarela avatar Jan 18 '21 07:01 gervarela