schema_registry_converter
schema_registry_converter copied to clipboard
[Proposal] Enhancements to improve DevEX and align crate with Confluent Schema Registry java library
Feature Description
I'd like to propose a few changes aimed at improving the developer experience and ensuring the crate is in line with the Java library standards.
These changes are intended to make development and testing more straightforward and enhance the overall usability of the crate.
Here are the proposed modifications:
Introduce SchemaRegistryClient
trait:
Using a trait we can significantly facilitate testing in applications that uses this crate. It will be easier to mock and test components that depend on it.
Refactor SrSettingsBuilder
:
Rename to SchemaRegistryClientConfig
and make methods return Self
. This will improve the configuration setup in applications that use this crate.
Refactor SrSettings
:
- Rename it to
CachedSchemaRegistryClient
, and implementSchemaRegistryClient
for it. - Move caches from
AvroEncoder
andAvroDecoder
to the new Client implementation. This will make us able to share the cached schemas between the encoding and decoding process.
These changes will make this crate consistent with the Confluent Schema Registry library, and improve the code structure.
Use dynamic dispatch in AvroEncoder
and AvroDecoder
.
This will enable users to provide any implementation of SchemaRegistryClient
to encoders and decoders
I understand that these changes are not trivial. @gklijs, if you are Okay with it, I can work on it myself and open a PR.
It's indeed not trivial. I'm also wondering what would be expected to be in the SchemaRegistryClient
trait? Furthermore I wonder if big changes are being made, if we should not have a separate schema_registry_client
module/crate for the HTTP calls to schema registry, implementing the whole API (generated), and separate modules/crates for the different types. I'm also wondering when we are going to make changes, if we should still support blocking, or only keep async.
I wonder if big changes are being made, if we should not have a separate schema_registry_client module/crate for the HTTP calls to schema registry, implementing the whole API (generated), and separate modules/crates for the different types.
I feel that spliting the code in multiple crates is the right way to go. Imho, the schema-registry-client
should just be responsible for making HTTP calls to schema registry and caching retreived schemas.
Additional functionalities that uses schema-registry-client
, should be provided by a separate crate in the workspace.
I'm also wondering when we are going to make changes, if we should still support blocking, or only keep async.
Tough question... I dont know. Supporting both async and sync code in Rust is not easy. Myself personaly have never used the blocking
version of this crate, but I can't speak for everyone.
I'm also wondering what would be expected to be in the SchemaRegistryClient trait?
I think we should start by providing a method for each call that we support with the current enum SrCall
, and build from there.
Something like:
#[async_trait]
pub trait SchemaRegistryClient {
async fn get_by_id(&self, id: u32) -> Result<TBD, SchemaRegistryClientError>;
async fn get_latest_by_subject(&self, subject: &str) -> Result<TBD, SchemaRegistryClientError>;
async fn get_by_subject_and_version(&self, subject: &str, version: u32) -> Result<TBD, SchemaRegistryClientError>;
async fn post_schema(&self, subject: &str, schema: &Schema) -> Result<TBD, SchemaRegistryClientError>;
}
It looks good and aligns with going async only. Now I better read the first post, I'm not sure if performance-wise it's a good thing to have a cache only in the client, as there is some additional computation also cached currently. But we could add a second cache if needed. For example for protobuf, it would be nice to not have to run the proto resolver each time on the same cached JSON.
Yeah, I realised we could benefit from a second cache per decoder, because we actually need to parse the sechmas just once.
Also, I did split it into multiple crates. Ended with something like:
schema_registry_converter
|-- Cargo.toml
|-- (...)
|
├───schema-registry-client (http client and configurations)
| |-- lib.rs
| |-- (...)
|
├───schema-registry-serde (core types for serialization and deserialization)
| |-- lib.rs
| |-- (...)
|
├───schema-registry-serde-avro (specifics about avro)
| |-- lib.rs
| |-- (...)
|
├───schema-registry-serde-protobuf (specifics about protobuf)
| |-- lib.rs
| |-- (...)
|
└───schema-registry-serde-json (specifics about json)
|-- lib.rs
|-- (..)
@FaveroFerreira FYI I just updated some dependencies, and some required some changes to either the main code or the tests. I still think this is a good idea, it's just also a lot of work.