In place of `fieldDefinitions`, support Avro schema, which is a more comprehensive way to describe data

Open knguyen1 opened this issue 10 months ago • 0 comments

Is your feature request related to a problem? Please describe.

Avro IDL, including its annotation syntax, is documented here: https://avro.apache.org/docs/1.10.2/idl.html#minutiae_annotations You can use Java-style annotations to attach extra details, such as a match type, to your fields.

This is an example of an Avro schema, written in Avro IDL:

@namespace("com.zingg.common.schema")
protocol Sample {
    record MySampleRecord {
        int @matchType("DONT_USE") id;
        string @aliases(["FirstName"]) @matchType("FUZZY") firstname;
        string @aliases(["LastName"]) @matchType("FUZZY") lastname;
    }
}

It compiles to JSON:

{
  "type" : "record",
  "name" : "MySampleRecord",
  "namespace" : "com.zingg.common.schema",
  "fields" : [ {
    "name" : "id",
    "type" : "int",
    "matchType" : "DONT_USE"
  }, {
    "name" : "firstname",
    "type" : "string",
    "aliases" : [ "FirstName" ],
    "matchType" : "FUZZY"
  }, {
    "name" : "lastname",
    "type" : "string",
    "aliases" : [ "LastName" ],
    "matchType" : "FUZZY"
  } ]
}
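
To illustrate how the custom `matchType` property in the compiled schema could drive field definitions, here is a minimal Python sketch. The function name `to_field_definitions`, the output keys (`fieldName`, `dataType`, `matchType`), and the `DONT_USE` fallback are assumptions made for illustration; they are not confirmed Zingg behavior.

```python
import json

# The compiled schema from the example above.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "MySampleRecord",
  "namespace": "com.zingg.common.schema",
  "fields": [
    {"name": "id", "type": "int", "matchType": "DONT_USE"},
    {"name": "firstname", "type": "string",
     "aliases": ["FirstName"], "matchType": "FUZZY"},
    {"name": "lastname", "type": "string",
     "aliases": ["LastName"], "matchType": "FUZZY"}
  ]
}
"""


def to_field_definitions(schema):
    """Map each Avro field to a field-definition dict.

    Hypothetical helper: the output keys are assumed for illustration.
    """
    definitions = []
    for field in schema["fields"]:
        definitions.append({
            "fieldName": field["name"],
            # Passed through unchanged; unions or nested types would need
            # extra handling that this sketch does not attempt.
            "dataType": field["type"],
            # The @matchType annotation appears as an extra property on the
            # field. Falling back to DONT_USE is an assumption.
            "matchType": field.get("matchType", "DONT_USE"),
        })
    return definitions


field_definitions = to_field_definitions(json.loads(SCHEMA_JSON))
for definition in field_definitions:
    print(definition)
```

Each field yields one dict, so nothing beyond the annotated schema is needed to describe how a column should be matched.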

It gets deployed to Confluent Schema Registry, and you can retrieve it with a simple curl call:

$ curl http://schema-registry/subjects/{SCHEMA_NAME}/versions/latest
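
The same retrieval can be sketched in Python. The registry wraps the schema in an envelope and returns the schema itself as a JSON-encoded string under the `schema` key, so it has to be decoded twice. The helper names `parse_registry_response` and `fetch_schema` are hypothetical, and the sample response values below are hand-written for an offline demonstration rather than captured from a real registry.

```python
import json
import urllib.request


def parse_registry_response(body):
    """Extract the Avro schema from a registry version response.

    The envelope holds the schema as a JSON string, so decode it again.
    """
    envelope = json.loads(body)
    return json.loads(envelope["schema"])


def fetch_schema(base_url, subject, version="latest"):
    """Fetch and decode a schema. Requires a reachable registry."""
    url = f"{base_url}/subjects/{subject}/versions/{version}"
    with urllib.request.urlopen(url) as response:
        return parse_registry_response(response.read().decode("utf-8"))


# Offline demonstration with an illustrative, hand-written response body.
sample_body = json.dumps({
    "subject": "MySampleRecord",
    "version": 1,
    "id": 1,
    "schema": json.dumps({
        "type": "record",
        "name": "MySampleRecord",
        "namespace": "com.zingg.common.schema",
        "fields": [
            {"name": "id", "type": "int", "matchType": "DONT_USE"},
        ],
    }),
})

schema = parse_registry_response(sample_body)
print(schema["name"], schema["fields"][0]["matchType"])
```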

Describe the solution you'd like

In modern software design, data contracts and other descriptions of data are often stored in a schema registry. Instead of spelling out fieldDefinitions in the Zingg conf, let us reference a schema registry URL and a schema name. This centralizes data descriptions in one place, so they do not have to be redefined elsewhere.

{
  "fieldDefinitions": {
    "schemaRegistry": "http://schema-registry.my.domain",
    "schemaName": "MySampleRecord",
    "version": "latest"
  }
}
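
As a rough sketch of how such a reference might be resolved, the following Python turns the proposed block into a registry URL. It assumes that `schemaName` is used as the registry subject and that `version` defaults to `latest` when omitted; both are assumptions, and `schema_url` is a hypothetical helper rather than anything Zingg provides.

```python
import json
from urllib.parse import quote

# The proposed configuration block from above.
PROPOSED_CONF = """
{
  "fieldDefinitions": {
    "schemaRegistry": "http://schema-registry.my.domain",
    "schemaName": "MySampleRecord",
    "version": "latest"
  }
}
"""


def schema_url(conf):
    """Build the registry URL for the schema referenced in the conf."""
    reference = conf["fieldDefinitions"]
    base = reference["schemaRegistry"].rstrip("/")
    # Assumption: the schema name doubles as the registry subject.
    subject = quote(reference["schemaName"], safe="")
    # Assumption: a missing version means the latest one.
    version = reference.get("version", "latest")
    return f"{base}/subjects/{subject}/versions/{version}"


url = schema_url(json.loads(PROPOSED_CONF))
print(url)
# prints http://schema-registry.my.domain/subjects/MySampleRecord/versions/latest
```

Pinning `version` to a number instead of `latest` would keep a run reproducible when the schema later evolves.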

Describe alternatives you've considered

Alternatives include POJO classes defined in Java... Glue Schema Registry, etc. But an Avro schema is the cleanest solution.

Additional context

Since the schema is stored in the registry, there is no need to repeat this information in the conf. Here's the Docker image to host your own schema registry: https://hub.docker.com/r/confluentinc/cp-schema-registry

knguyen1, Apr 17 '24 17:04