spring-cloud-schema-registry

Referencing other schemas

Open yeralin opened this issue 5 years ago • 6 comments

Avro supports schemas that reference other named schemas, e.g.:

{
    "type": "record",
    "namespace": "test",
    "name": "sub",
    "fields": [
        {
            "name": "state",
            "type": "string"
        }
    ]
}
{
    "type": "record",
    "namespace": "test",
    "name": "main",
    "fields": [
        {
            "name": "sub",
            "type": "test.sub"
        }
    ]
}

I deployed spring-cloud-schema-registry and successfully POSTed the test.sub schema, but when I try to submit test.main I receive: Invalid Schema: Undefined name: "test.sub"

This is expected, since there is probably no internal schema resolution (could be a cool feature btw). For now, is it possible to set some header/flag/etc. to avoid this exception?

UPD: I looked at the code, and it doesn't seem like I can bypass the validation.

yeralin avatar Feb 26 '20 16:02 yeralin

The Javadoc for org.apache.avro.Schema.Parser says:

A parser for JSON-format schemas. Each named schema parsed with a parser is added to the names known to the parser so that subsequently parsed schemas may refer to it by name.

There is also a method: https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/Schema.Parser.html#addTypes(java.util.Map)

I see two ways of solving this:

1. Create a map of Parsers keyed by namespace. For example, in my case, submitting test.sub would have created:

PARSERS_MAP.put("test", new Parser());
PARSERS_MAP.get("test").parse(test_sub_definition);

then for the test.main, simply PARSERS_MAP.get("test").parse(test_main_definition);.

Drawback: the parsers map would have to be populated at server boot, so with a lot of schemas the boot time would be much longer.


2. Extract any non-Avro types from a definition OR catch SchemaParseException with Undefined name, and try to locate the missing type in the repository.

If it is found, use the above-mentioned addTypes; otherwise throw InvalidSchemaException.
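As a rough illustration of the "catch and extract" step, here is a minimal sketch that pulls the missing type name out of an error message. It assumes the message has the same shape as the one quoted at the top of this issue (Undefined name: "test.sub"); UndefinedNameExtractor is a hypothetical helper, not part of Avro or the registry, and matching on exception text is fragile since the wording may differ between Avro versions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Avro or spring-cloud-schema-registry.
final class UndefinedNameExtractor {

    // Assumed message shape, copied from the error quoted in this issue:
    //   Undefined name: "test.sub"
    private static final Pattern UNDEFINED_NAME =
            Pattern.compile("Undefined name: \"([^\"]+)\"");

    /** Returns the fully qualified name the parser could not resolve, if any. */
    static Optional<String> extract(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = UNDEFINED_NAME.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Prints the Optional holding the extracted name.
        System.out.println(extract("Invalid Schema: Undefined name: \"test.sub\""));
    }
}
```

The extracted name would then be the key for the repository lookup described above.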

UPD: locating a schema might not be possible, since we can only search by subject, and the schema name might differ from the subject it was persisted under.

UPD2: That led me to a third solution:


3. If a user is POSTing a schema that references another schema that was already persisted, they should pass a specific header with the subject name(s) of the referenced schema(s).

For example, I persisted my test.sub schema under subject A. Then, when I try to persist my test.main schema, I should pass a header such as Schema-Reference: A; (multiple comma-separated subjects are possible), or, with a version, Schema-Reference: A+v2;
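To make the proposed header format concrete, here is a minimal sketch of how such a value could be split into subject/version pairs. The syntax (comma-separated subjects, an optional +vN suffix, an optional trailing semicolon) is taken from the examples above; SchemaReferenceHeader and Ref are hypothetical names, and none of this exists in the registry today.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the proposed Schema-Reference header value.
final class SchemaReferenceHeader {

    /** One referenced subject; version is null when the header does not pin one. */
    record Ref(String subject, Integer version) {}

    static List<Ref> parse(String headerValue) {
        List<Ref> refs = new ArrayList<>();
        if (headerValue == null) {
            return refs;
        }
        for (String token : headerValue.split(",")) {
            String t = token.trim();
            // Drop the optional trailing ';' shown in the examples.
            if (t.endsWith(";")) {
                t = t.substring(0, t.length() - 1).trim();
            }
            if (t.isEmpty()) {
                continue;
            }
            // Treat "+v<digits>" at the end as a version suffix.
            int plus = t.lastIndexOf("+v");
            if (plus > 0 && t.substring(plus + 2).matches("\\d+")) {
                refs.add(new Ref(t.substring(0, plus), Integer.valueOf(t.substring(plus + 2))));
            } else {
                refs.add(new Ref(t, null));
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        // Two references: subject A pinned to version 2, subject B unpinned.
        System.out.println(parse("A+v2, B;"));
    }
}
```

An unpinned reference (null version) would presumably resolve to the latest version of that subject, but that choice is an assumption and not something the proposal specifies.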

On the backend side, we would have something like:

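Parser avroParser = new Parser();
for (String subj : schemaReferenceHeader) {
    // Pseudocode: look up the schema persisted under the referenced subject.
    schema = schemaRepo.find(subj);
    // Parsing the referenced definition adds its named types to this parser
    // (assumes the stored model exposes its JSON via getDefinition()).
    avroParser.parse(schema.getDefinition());
}
// The parser now knows the referenced names, so test.main can resolve test.sub.
avroParser.parse(definition);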

This seems not to have any drawbacks.

yeralin avatar Feb 26 '20 16:02 yeralin

@sobychacko what do you think about this? And my PR

yeralin avatar Feb 27 '20 16:02 yeralin

Hi @yeralin , thanks for exploring the schema cross referencing use cases!

The suggested third approach, using an HTTP header to express the schema dependencies, feels like a plausible workaround. But a consistent solution would require extending the Schema model with an additional dependencies field. This field would explicitly cross-reference the dependent schemas, using the subject name, version, and format as a reference.

In your PR you already extend the Schema model with a List<Schema> references field. But I guess this field should be of type List<SchemaReference> or similar? The goal is to register related schemas separately and reference them, not contain them?

Also given that the Schema model is extended with a schema references field, what is the role of the reference header?
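To illustrate the difference between containing and referencing, here is a minimal sketch of what such a reference type could look like, keyed by subject, version, and format as described above. Both SchemaReference and RegisteredSchema are hypothetical stand-ins written for this comment; they are not the registry's actual model classes or the types used in the PR.

```java
import java.util.List;

/** Hypothetical pointer to a separately registered schema. */
record SchemaReference(String subject, int version, String format) {}

/** Hypothetical, trimmed-down stand-in for the registry's Schema model. */
record RegisteredSchema(String subject, int version, String format,
                        String definition, List<SchemaReference> references) {
    RegisteredSchema {
        // Keep an immutable copy so the reference list cannot change after registration.
        references = List.copyOf(references);
    }
}

final class SchemaReferenceDemo {
    public static void main(String[] args) {
        // test.main, stored under subject "B", points at test.sub stored under subject "A";
        // the definition of test.sub itself is not embedded.
        RegisteredSchema main = new RegisteredSchema(
                "B", 1, "avro",
                "{\"type\":\"record\",\"namespace\":\"test\",\"name\":\"main\","
                        + "\"fields\":[{\"name\":\"sub\",\"type\":\"test.sub\"}]}",
                List.of(new SchemaReference("A", 1, "avro")));
        System.out.println(main.references());
    }
}
```

With a model like this, the server could resolve each reference from the repository before parsing, which would make a separate header redundant for anything but ad hoc overrides.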

tzolov avatar Mar 16 '20 11:03 tzolov

@yeralin: Do you have any thoughts to share here? Let us know what you think.

sabbyanandan avatar Mar 24 '20 15:03 sabbyanandan

Hey guys, I am really sorry. Got stuck in Peru due to the gvmnt closing its borders lol. Trying to get back to the US.

Will try to work on this ASAP.

yeralin avatar Mar 24 '20 15:03 yeralin

@yeralin: Thank you for your contributions so far. Please take your time, and be safe.

sabbyanandan avatar Mar 24 '20 15:03 sabbyanandan