schema_registry_converter [Proposal] Enhancements to improve DevEX and align crate with Confluent Schema Registry java library

Feature Description

I'd like to propose a few changes aimed at improving the developer experience and ensuring the crate is in line with the Java library standards.

These changes are intended to make development and testing more straightforward and enhance the overall usability of the crate.

Here are the proposed modifications:

Introduce `SchemaRegistryClient` trait:

Using a trait we can significantly facilitate testing in applications that uses this crate. It will be easier to mock and test components that depend on it.

Refactor `SrSettingsBuilder`:

Rename to SchemaRegistryClientConfig and make methods return Self. This will improve the configuration setup in applications that use this crate.

Refactor `SrSettings`:

Rename it to CachedSchemaRegistryClient, and implement SchemaRegistryClient for it.
Move caches from AvroEncoder and AvroDecoder to the new Client implementation. This will make us able to share the cached schemas between the encoding and decoding process.

These changes will make this crate consistent with the Confluent Schema Registry library, and improve the code structure.

Use dynamic dispatch in `AvroEncoder` and `AvroDecoder`.

This will enable users to provide any implementation of SchemaRegistryClient to encoders and decoders

Feb 28 '24 17:02 FaveroFerreira

I understand that these changes are not trivial. @gklijs, if you are Okay with it, I can work on it myself and open a PR.

Feb 28 '24 17:02 FaveroFerreira

It's indeed not trivial. I'm also wondering what would be expected to be in the SchemaRegistryClient trait? Furthermore I wonder if big changes are being made, if we should not have a separate schema_registry_client module/crate for the HTTP calls to schema registry, implementing the whole API (generated), and separate modules/crates for the different types. I'm also wondering when we are going to make changes, if we should still support blocking, or only keep async.

Feb 29 '24 08:02 gklijs

I wonder if big changes are being made, if we should not have a separate schema_registry_client module/crate for the HTTP calls to schema registry, implementing the whole API (generated), and separate modules/crates for the different types.

I feel that spliting the code in multiple crates is the right way to go. Imho, the schema-registry-client should just be responsible for making HTTP calls to schema registry and caching retreived schemas.

Additional functionalities that uses schema-registry-client, should be provided by a separate crate in the workspace.

I'm also wondering when we are going to make changes, if we should still support blocking, or only keep async.

Tough question... I dont know. Supporting both async and sync code in Rust is not easy. Myself personaly have never used the blocking version of this crate, but I can't speak for everyone.

Feb 29 '24 12:02 FaveroFerreira

I'm also wondering what would be expected to be in the SchemaRegistryClient trait?

I think we should start by providing a method for each call that we support with the current enum SrCall, and build from there.

Something like:

#[async_trait]
pub trait SchemaRegistryClient {
    async fn get_by_id(&self, id: u32) -> Result<TBD, SchemaRegistryClientError>;
    async fn get_latest_by_subject(&self, subject: &str) -> Result<TBD, SchemaRegistryClientError>;
    async fn get_by_subject_and_version(&self, subject: &str, version: u32) -> Result<TBD, SchemaRegistryClientError>;
    async fn post_schema(&self, subject: &str, schema: &Schema) -> Result<TBD, SchemaRegistryClientError>;
}

Feb 29 '24 12:02 FaveroFerreira

It looks good and aligns with going async only. Now I better read the first post, I'm not sure if performance-wise it's a good thing to have a cache only in the client, as there is some additional computation also cached currently. But we could add a second cache if needed. For example for protobuf, it would be nice to not have to run the proto resolver each time on the same cached JSON.

Mar 01 '24 14:03 gklijs

Yeah, I realised we could benefit from a second cache per decoder, because we actually need to parse the sechmas just once.

Also, I did split it into multiple crates. Ended with something like:

schema_registry_converter
|-- Cargo.toml
|-- (...)
|
├───schema-registry-client (http client and configurations)
|   |-- lib.rs
|   |-- (...)
|
├───schema-registry-serde (core types for serialization and deserialization)
|   |-- lib.rs
|   |-- (...)
|
├───schema-registry-serde-avro (specifics about avro)
|   |-- lib.rs
|   |-- (...)
|
├───schema-registry-serde-protobuf (specifics about protobuf)
|   |-- lib.rs
|   |-- (...)
|
└───schema-registry-serde-json (specifics about json)
    |-- lib.rs
    |-- (..)

Mar 01 '24 14:03 FaveroFerreira

@FaveroFerreira FYI I just updated some dependencies, and some required some changes to either the main code or the tests. I still think this is a good idea, it's just also a lot of work.

Jun 16 '24 08:06 gklijs

schema_registry_converter schema_registry_converter copied to clipboard

[Proposal] Enhancements to improve DevEX and align crate with Confluent Schema Registry java library

Feature Description

Introduce SchemaRegistryClient trait:

Refactor SrSettingsBuilder:

Refactor SrSettings:

Use dynamic dispatch in AvroEncoder and AvroDecoder.

schema_registry_converter
schema_registry_converter copied to clipboard

Introduce `SchemaRegistryClient` trait:

Refactor `SrSettingsBuilder`:

Refactor `SrSettings`:

Use dynamic dispatch in `AvroEncoder` and `AvroDecoder`.