avro-rs
avro-rs copied to clipboard
Map JSON Objects to Record instead of Map
I noticed that the implementation for ToAvro
on the serde_json Value
type maps values of type Object
to the Avro value Map
instead of the complex type Record
. Is there a specific reason for that? As far as I can tell, the Record
type for Avro schemas is much more common in the definition than maps. Could we change the implementation to map Record
instead? I'd be happy to contribute that if there's nothing that blocks us from doing so.
It's been a few years since the original implementation, but I think that the rationale was that Object
in JSON is very generic and not schematized, so we thought that Map
(with the key being a string, like in JSON) was a closer fit than Record
. I think that making it a Record
might be pretty tricky, with fields that could be missing from one instance to the other.
Perhaps @flavray remembers this detail more clearly, but I am not quite sure.
For this problem, can we maybe make use of the Avro schemas? If we let the user provide one when going from JSON value to Avro value, we could get from the schema if it's defined as a record?
I think you could, but I agree with @poros that it's not really the right thing to do. Using an Avro map means that you've actively decided that you need to support a possibly different set of keys in each instance. If you have a specific structure, you'd use a record instead.
It's the same on the serde_json side. Using serde_json::Value
indicates that you want to support a possibly different set of keys in each instance. If you have a specific structure, you'd create a custom Rust struct
for it. JSON doesn't have a map/record distinction like Avro does, so when serializing to JSON serde has to create a JSON object for both use cases. But since Avro does have that distinction, we can use the Avro type that lines up with the intended use of the Rust type.
Hmm, yeah, I understand what you are saying. What would be the option to go from user input (which is completely dynamic) to an Avro Value? Like, user specifies a schema they want to follow and then input a document of some kind and I want to convert it to Avro and send it out?
Currently, I am taking the user input as JSON, which is how I came up with this problem, but maybe there is a better solution to handle this with this library?
Ahh yes, now I see what you're saying! For that, Avro defines a JSON encoding to go along with the binary encoding. And some of the other language bindings have a helper tool that reads in a bunch of JSON values and converts them to a binary Avro file according to some schema. The exact same JSON input can give you completely different Avro output files depending on which schema you choose.
I don't think that's implemented in this library just yet, but I think you'd do it by adding something similar to avro_rs::Reader
that "reads" from a serde_json::Value
instead of from a std::io::Reader
. So it would be kind of like your original suggestion, but you'd be able to translate the JSON Value
into either an Avro map or record, depending on your use case, by choosing what kind of Avro-y Rust type you read
into.
Interesting approach and thoughts, thank you for your input. I'm gonna take a look into that after work :) maybe I can come up with something that's worth a PR. I'll keep this issue updated.