Apache Kafka as source
Problem
Kafka is a very popular message broker. We want to listen to messages from a Kafka topic.
Solution
sources:
  kafka1:
    type: kafka
    server: "address:port"
    topic: mytopic
    format: json
Here's how Kafka should be configured: EventNative should listen to a particular topic and expect messages to be in JSON format (format: json should be the only format we support so far). Events should then be processed further with the default server-to-server pipeline.
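For illustration, the consumer side of this source could be sketched as below. This is a minimal sketch using the third-party kafka-python package, not EventNative's actual implementation; `address:port` and `mytopic` are the placeholders from the config above, and the handler is hypothetical:

```python
import json

def decode_event(raw: bytes) -> dict:
    # `format: json` implies each Kafka message body is one
    # JSON-encoded event.
    return json.loads(raw.decode("utf-8"))

def consume(server: str, topic: str) -> None:
    # Assumption: the kafka-python package is installed
    # (`pip install kafka-python`); imported lazily so the
    # decoding logic above stays usable without it.
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=server,
        value_deserializer=decode_event,
    )
    for message in consumer:
        # message.value is the decoded dict; this is where the
        # event would be handed to the server-to-server pipeline.
        print(message.value)

# consume("address:port", "mytopic")
```

The deserializer is kept as a separate function so the JSON-handling step stays independent of the Kafka client library.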
What about Kafka as a destination?
Just to clarify the use case.
We use Kafka topics as an ingestion source and rely on Kafka engine tables to ingest them into ClickHouse. We also use those Kafka topics as an 'event lake' to archive the raw events as history (with an expiration date), in case we later detect a bug or problem downstream and want to replay them.
For real-time we rely on Divolte, but we appreciate the simplicity of EventNative and want to remove the requirement to use Avro in the pipeline. For batch we simply insert events into the Kafka topic.
Hi @gervarela !
It makes sense. Just to clarify the use case: the pipeline you're talking about will look like Tracker (JSON) → EventNative → Kafka? EventNative will send JSON to Kafka as-is? Also, would it be batch mode or stream mode?
Please feel free to create an issue about Kafka as a destination. We can probably deliver it within a 2-4 week timeline.
Hi @vklimontovich,
Sounds like a great addition.
Yes, the pipeline will be like you said: Tracker (JSON) --> EventNative --> Kafka topic (JSON) --> Any system.
It adds complexity to the system, but you can also leverage the great ecosystem support for ingesting Kafka data into almost any system. We use this architecture to feed ClickHouse, but also stream-processing systems like Apache Spark and Faust.