vector
(kafka sink): Add AVRO support to Kafka sink
It would be nice to have Avro support in the Kafka sink, so that data read as JSON is output as Avro. The advantages would be input/output validation against a schema and less overhead compared to raw JSON.
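To make the overhead point concrete, here is a minimal Python sketch of Avro's binary encoding for a two-field record, written by hand rather than with an Avro library. The record layout (`message` as a string, `count` as a long) and all function names are made up for illustration; only the zigzag varint and length-prefixed string rules come from the Avro specification, and inputs are assumed to fit in a signed 64-bit long.

```python
import json


def encode_long(n: int) -> bytes:
    """Encode an Avro long: zigzag first, then a variable-length integer."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode an Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_event(message: str, count: int) -> bytes:
    """Encode the hypothetical record {message: string, count: long}.

    An Avro record is just its fields concatenated in schema order.
    """
    return encode_string(message) + encode_long(count)


event = {"message": "foo", "count": 3}
avro_bytes = encode_event(event["message"], event["count"])
json_bytes = json.dumps(event, separators=(",", ":")).encode("utf-8")
print(len(avro_bytes), len(json_bytes))
```

The Avro payload carries no field names, which is where the size saving comes from. The trade-off is that a reader needs the schema to decode it.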
Additionally, supporting Confluent's Schema Registry (or other registries, for that matter) would add great value. It amounts to reading a magic byte followed by an integer schema ID, and caching schema responses from the registry.
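A rough Python sketch of that framing and caching, assuming the Confluent wire format (a zero magic byte, then a 4-byte big-endian schema ID, then the Avro payload). The `fetch` callable stands in for whatever HTTP client would query the registry; it and the class and function names are hypothetical, not part of Vector or any Confluent library.

```python
import struct
from typing import Callable, Dict, Tuple

MAGIC_BYTE = 0
HEADER = struct.Struct(">BI")  # 1-byte magic + 4-byte big-endian schema ID


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an Avro payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def split_frame(message: bytes) -> Tuple[int, bytes]:
    """Return (schema_id, avro_payload) or raise ValueError on a bad header."""
    if len(message) < HEADER.size:
        raise ValueError("message too short for a schema registry header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


class CachingSchemaResolver:
    """Look up schemas by ID, calling the registry at most once per ID."""

    def __init__(self, fetch: Callable[[int], str]) -> None:
        self._fetch = fetch  # hypothetical registry lookup: ID -> schema JSON
        self._cache: Dict[int, str] = {}

    def resolve(self, schema_id: int) -> str:
        if schema_id not in self._cache:
            self._cache[schema_id] = self._fetch(schema_id)
        return self._cache[schema_id]
```

Schemas registered under an ID do not change, so a plain dictionary with no expiry is a reasonable starting point for the cache.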
+1 for Avro support. I am new to Rust, but I can try contributing with some help. Is this feature an approved use case for Vector?
@raghu999 Hi! Yes, we'd like to have this feature! The first step would be speccing out a proposal (e.g., how would we know the schema?).
You can do this informally with us here while you play with some ideas and get started looking at the code. :)
You can use the avro-rs crate, but it needs to be patched to support fingerprinting schemas.
There are two types of schema resolution: either you have the ID of a schema (after the magic byte), which you can use to query a schema registry, or you have a fingerprint, which you can use to resolve a schema internally (i.e. the schema has been manually provided by the user).
You can then map the schema to Vector's internal representation, and that's pretty much it.
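The fingerprint path could look roughly like the Python sketch below: schemas supplied by the user are indexed by fingerprint up front, so no registry is needed at runtime. Note the assumptions: the fingerprint here is SHA-256 over a simple JSON normalization (sorted keys, no whitespace), which is a stand-in and not Avro's Parsing Canonical Form, and `LocalSchemaStore` is a hypothetical name.

```python
import hashlib
import json
from typing import Dict, Iterable


def fingerprint(schema_json: str) -> bytes:
    """SHA-256 of a normalized schema.

    Assumption: sorted keys and stripped whitespace stand in for Avro's
    Parsing Canonical Form, which a real implementation would use instead.
    """
    normalized = json.dumps(
        json.loads(schema_json), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(normalized.encode("utf-8")).digest()


class LocalSchemaStore:
    """Resolve fingerprints against schemas the user provided in advance."""

    def __init__(self, schemas: Iterable[str]) -> None:
        self._by_fingerprint: Dict[bytes, str] = {
            fingerprint(s): s for s in schemas
        }

    def resolve(self, fp: bytes) -> str:
        try:
            return self._by_fingerprint[fp]
        except KeyError:
            raise LookupError(f"no schema with fingerprint {fp.hex()}") from None
```

Because the lookup key is derived from the schema itself, two deployments configured with the same schema agree on the fingerprint without coordinating through a registry.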
I already have a branch started on my computer, just need to find the time
I'm using Avro instead of protobuf for logging consistent events in avro files or parquet files because the data engineering tooling around it is easier to work with, so that would make vector usable in more than just system logging for me.
Sounds like it might be more and more worthwhile to support https://docs.confluent.io/current/schema-registry/index.html
Both fingerprints and a caching registry client with IDs are valid implementations. Make sure not to force people to deploy a schema registry when they can just inline a schema in the config.
Avro support for source + sink would be interesting to me, @binarylogic would it be good to open a separate issue for source support?
Avro support for source + sink would be interesting to me as well. +1
Avro support for source + sink would be interesting to me as well +1 along with schema registry option
@jszwedko @spencergilbert Am I right that in the current implementation we are already able to use Avro as a message format, as is already done in the pulsar sink? As far as I can see, it should work: the kafka sink will use the already implemented Avro encoder during serialization. The only missing piece is Schema Registry support, which would have to be implemented separately. However, it's still a huge improvement, even without Schema Registry support.
Maybe we just need to update the documentation? However, I haven't tested Vector -> Kafka with Avro yet; I'm just guessing based on my understanding of the existing source code.
It would be awesome if someone in this thread could test the Kafka sink with Avro :)
Regarding Schema Registry support - probably it would be useful to keep in mind: https://github.com/confluentinc/libserdes/
Literally the only reason I continue to keep Logstash around :)
Any plan to support this one?
It's not currently on our roadmap, but we'd be happy to support a community contribution!
Isn't this already supported as per https://github.com/vectordotdev/vector/pull/19342 and https://vector.dev/docs/reference/configuration/sinks/kafka/#encoding.avro? Why is the issue still open?
My conclusion is that standard Avro encoding is supported on the Kafka sink, but the proprietary Confluent Schema Registry wire format is not; see https://github.com/vectordotdev/vector/issues/19872 for that topic. It is likely enough to close this issue and continue in the other one.
Agreed. Closing this since it was implemented by https://github.com/vectordotdev/vector/pull/19342