(kafka sink): Add AVRO support to Kafka sink

Open erikbos opened this issue 5 years ago • 20 comments

It would be nice to have AVRO support in the Kafka sink so that data read as JSON gets output as AVRO. The advantages would be input/output validation against a schema and less overhead compared to raw JSON.

erikbos avatar Jul 02 '19 19:07 erikbos

Additionally, supporting Confluent's Schema Registry (or other registries, for that matter) would add great value. It simply involves reading a magic byte, then an integer schema ID, and caching schema responses from the registry.
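
A minimal sketch of that framing, in Python for brevity. It assumes the Confluent wire format: a zero magic byte, a 4-byte big-endian schema ID, then the Avro payload. `fetch_schema` is a hypothetical stand-in for a registry HTTP call, not a real client API, and none of this is Vector code.

```python
import struct
from typing import Callable, Dict, Tuple

MAGIC_BYTE = 0  # Confluent wire format: the first byte is zero


def split_confluent_frame(message: bytes) -> Tuple[int, bytes]:
    """Return (schema_id, avro_payload) from a Confluent-framed message."""
    if len(message) < 5:
        raise ValueError("message too short for magic byte + schema ID")
    # ">BI" = big-endian, one unsigned byte, one unsigned 32-bit integer.
    magic, schema_id = struct.unpack(">BI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


class CachingSchemaResolver:
    """Caches schema text by ID so the registry is asked once per ID."""

    def __init__(self, fetch_schema: Callable[[int], str]) -> None:
        # fetch_schema is hypothetical: any callable mapping ID -> schema JSON.
        self._fetch_schema = fetch_schema
        self._cache: Dict[int, str] = {}

    def resolve(self, schema_id: int) -> str:
        if schema_id not in self._cache:
            self._cache[schema_id] = self._fetch_schema(schema_id)
        return self._cache[schema_id]
```

The cache is unbounded here for simplicity; a real client would probably want eviction and error handling around the fetch.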

Igosuki avatar Feb 06 '20 19:02 Igosuki

+1 for Avro support. I am new to Rust, but I can try contributing with some help. Is this feature an approved use case for Vector?

raghu999 avatar Apr 07 '20 17:04 raghu999

@raghu999 Hi! We'd like to have this feature, yes! The first step would be speccing out a proposal (e.g., how would we know the schema?).

You can do this informally with us here while you play with some ideas and get started looking at the code. :)

Hoverbear avatar Apr 07 '20 22:04 Hoverbear

You can use the avro-rs crate, but it needs to be patched to support fingerprinting schemas.

There are two types of schema resolution: either you have a schema ID (after the magic byte), which you can use to query a schema registry, or you have a fingerprint, which you can use to resolve a schema internally (i.e. the schema has been manually provided by the user).
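
The two resolution paths can be sketched roughly as follows. This is an illustration only, not how avro-rs or Vector does it. As a simplifying assumption, the fingerprint here is a SHA-256 over the schema text exactly as supplied; the Avro specification computes fingerprints over the Parsing Canonical Form, which this sketch skips. `registry_lookup` is a hypothetical callable standing in for a schema registry client.

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional


def fingerprint(schema_text: str) -> bytes:
    # Assumption: hash the raw text; no canonicalization is performed.
    return hashlib.sha256(schema_text.encode("utf-8")).digest()


class SchemaResolver:
    def __init__(
        self,
        inline_schemas: Iterable[str],
        registry_lookup: Optional[Callable[[int], str]] = None,
    ) -> None:
        # Schemas supplied by the user are indexed by fingerprint up front.
        self._by_fingerprint: Dict[bytes, str] = {
            fingerprint(s): s for s in inline_schemas
        }
        self._registry_lookup = registry_lookup

    def by_fingerprint(self, fp: bytes) -> str:
        """Resolve locally; no registry is needed for this path."""
        try:
            return self._by_fingerprint[fp]
        except KeyError:
            raise LookupError("unknown schema fingerprint") from None

    def by_id(self, schema_id: int) -> str:
        """Resolve through the (hypothetical) registry client."""
        if self._registry_lookup is None:
            raise LookupError("no schema registry configured")
        return self._registry_lookup(schema_id)
```

With only inline schemas configured, `by_fingerprint` works on its own and `by_id` fails loudly, so a registry stays optional.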

You can then map the schema to Vector's internal representation, and that's pretty much it.

I already have a branch started on my computer, just need to find the time

Igosuki avatar May 25 '20 08:05 Igosuki

I'm using Avro instead of Protobuf for logging consistent events in Avro or Parquet files, because the data engineering tooling around it is easier to work with. This feature would make Vector usable for more than just system logging for me.

Igosuki avatar May 25 '20 08:05 Igosuki

Sounds like it might be more and more worthwhile to support https://docs.confluent.io/current/schema-registry/index.html

Hoverbear avatar May 28 '20 20:05 Hoverbear

Both fingerprints and a caching registry client with IDs are valid implementations. Make sure not to force people to deploy a schema registry when they can just inline a schema in the config.

Igosuki avatar Jun 04 '20 18:06 Igosuki

Avro support for source + sink would be interesting to me, @binarylogic would it be good to open a separate issue for source support?

spencergilbert avatar Oct 05 '20 19:10 spencergilbert

Avro support for source + sink would be interesting to me as well. +1

fuchsde avatar Jul 06 '21 11:07 fuchsde

Avro support for source + sink would be interesting to me as well +1

jianchen2580 avatar Jul 14 '21 00:07 jianchen2580

Avro support for source + sink would be interesting to me as well +1

simontilmant avatar Jul 16 '21 15:07 simontilmant

Avro support for source + sink would be interesting to me as well +1

free6k avatar Oct 26 '21 11:10 free6k

Avro support for source + sink would be interesting to me as well +1 along with schema registry option

Akshay2Agarwal avatar Jan 01 '22 03:01 Akshay2Agarwal

@jszwedko @spencergilbert Am I right that, in the current implementation, we are already able to use Avro as a message format, as is already done in the Pulsar sink? As far as I can see, it should work: the Kafka sink will use the already implemented Avro encoder during serialization. The only missing piece is Schema Registry support, which would need to be implemented separately. Still, it's a huge improvement even without Schema Registry support.

Maybe we just need to update the documentation? However, I haven't tested Vector -> Kafka with Avro yet; I'm just guessing based on my understanding of the existing source code.

It would be awesome if someone here could test the Kafka sink with Avro :)

zamazan4ik avatar Nov 08 '22 20:11 zamazan4ik

Regarding Schema Registry support, it would probably be useful to keep this in mind: https://github.com/confluentinc/libserdes/

zamazan4ik avatar Nov 08 '22 20:11 zamazan4ik

Literally the only reason I continue to keep Logstash around :)

ghakobyan-sqsp avatar May 10 '23 21:05 ghakobyan-sqsp

Any plan to support this one?

yjagdale avatar Jun 08 '23 05:06 yjagdale

It's not currently on our roadmap, but we'd be happy to support a community contribution!

spencergilbert avatar Jun 08 '23 12:06 spencergilbert

Isn't this already supported as per https://github.com/vectordotdev/vector/pull/19342 and https://vector.dev/docs/reference/configuration/sinks/kafka/#encoding.avro? Why is the issue still open?

silverwind avatar Feb 13 '24 11:02 silverwind

My conclusion is that standard Avro encoding is supported on the Kafka sink, but the proprietary Confluent Schema Registry wire format is not; see https://github.com/vectordotdev/vector/issues/19872 for that topic. That is likely enough to close this issue and continue in the other one.
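
To illustrate the difference, here is a rough Python sketch of the extra framing a Confluent-compatible producer puts in front of an already Avro-encoded payload: a zero magic byte and a 4-byte big-endian schema ID. The function name and the example schema ID are made up for illustration; this is not Vector code.

```python
import struct


def add_confluent_header(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with magic byte 0 and the schema ID."""
    if not 0 <= schema_id < 2**31:
        raise ValueError("schema ID must fit in a non-negative 32-bit int")
    # ">BI" = big-endian, one unsigned byte, one unsigned 32-bit integer.
    return struct.pack(">BI", 0, schema_id) + avro_payload


# b"\x06foo" is the Avro binary encoding of the string "foo"
# (zigzag-encoded length 3, then the UTF-8 bytes).
plain = b"\x06foo"
framed = add_confluent_header(1, plain)  # schema ID 1 is an arbitrary example
```

A consumer expecting the framed form will misread the plain payload, and the reverse holds too, which is why the two are tracked as separate topics.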

silverwind avatar Feb 13 '24 13:02 silverwind

My conclusion is that standard Avro encoding is supported on the Kafka sink, but the proprietary Confluent Schema Registry wire format is not; see #19872 for that topic. That is likely enough to close this issue and continue in the other one.

Agreed. Closing this since it was implemented by https://github.com/vectordotdev/vector/pull/19342

jszwedko avatar May 06 '24 18:05 jszwedko