
unnamed_type error with primary type schema

Open adjivas opened this issue 2 years ago • 11 comments

Hello,

Is it possible to register a primary type's schema with Avrora?

A primary type schema is, for example:

{
    "type": "string",
    "name": "myvalue"
}

When I try to save it with the register_schema_by_name function:

Avrora.Utils.Registrar.register_schema_by_name("myschema-value")

This error happens:

13:02:04.414 [debug] reading schema `myschema-value` from the file /app/priv/schemas/myproject/myschema-value.avsc
{:error, :unnamed_type}

adjivas, Jun 07 '22 13:06

Hi @adjivas 👋🏼

Initially, Avrora was designed to work with Record schemas, which allow nested schemas of complex types. So far there has been no reason to register a primary type (I also can't find that term in the Avro specs).

That type can probably be registered as part of another Record schema.

Strech, Jun 11 '22 20:06

Hi @Strech, sorry for the late reply. The Avro specs contain an official list and a documented example of Primitive Types:

{"type": "string"}

Primitive Types exist alongside Complex Types.

adjivas, Aug 12 '22 13:08

Sorry for the long pause. I will take a look, @adjivas, and drop a message about what we can do.

Strech, Sep 06 '22 12:09

I've re-read the documentation (and also checked the erlavro source code). The specification says (https://avro.apache.org/docs/1.11.1/specification/#primitive-types):

Primitive types have no specified attributes. Primitive type names are also defined type names. Thus, for example, the schema “string” is equivalent to: {"type": "string"}

Then I checked erlavro and its parsing mechanism:

iex(1)> :avro_json_decoder.decode_schema(~s({"type":"string","name":"MyString"}), allow_bad_references: true)   
{:avro_primitive_type, "string", []}

The result is exactly as stated in the specification: primitive types can't have any attributes, so none appear in the output.

Here is another answer on that topic: https://stackoverflow.com/questions/66210730/aliases-for-primitive-types-in-avro

TL;DR You can't reference a primitive type by an alias (or a new name, per se).

If you have further questions, feel free to drop them here.

Strech, Sep 13 '22 21:09

It is true that a primitive type cannot have a name, but unnamed types can be used in registries. The clearest example is a union (see https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html#multiple-event-types-in-the-same-topic).

If a registry has a schema that, instead of being a record, is a union (that is, in JSON, an array of other types, typically records), and Avrora is asked to decode a message marked with that id, then it will fail to transform the downloaded JSON into a valid schema to decode, instead returning {:error, :unnamed_type}.

@Strech do you agree that this should be reopened?

I might give it a try in that case.

rewritten, Oct 03 '23 16:10

@rewritten could you please provide an example of the schema you mentioned? Maybe it is indeed an issue. It could also be a case so rare/broken that it makes no sense to put effort into it.

Strech, Oct 04 '23 18:10

Sorry, I did not notice the response (this is not my main work account).

Another very common case is decoding message keys. Usually they are strings or numbers, but they are still Avro-encoded in the message payload. In that case the schema will be just "string".

In case of unions, the schema is an array:

[
  {"type": "record", "name": "foo", "fields": [{"name": "foo_field", "type": "string"}]},
  {"type": "record", "name": "bar", "fields": [{"name": "bar_field", "type": "string"}]}
]

Currently, only schemas that are JSON objects with a "name" field at their root are supported (i.e., records, enums, and, surprisingly, fixed types); unions and basic types fail.
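The distinction described above can be sketched in plain Elixir. This is only an illustration operating on already-decoded JSON terms (maps, lists, strings); the `SchemaKind` module and its return values are hypothetical and are not part of Avrora or erlavro.

```elixir
defmodule SchemaKind do
  @moduledoc "Hypothetical helper, not part of Avrora or erlavro."

  # Primitive and named complex type names from the Avro specification.
  @primitives ~w(null boolean int long float double bytes string)
  @named ~w(record enum fixed)

  # A bare JSON string is either a primitive or a reference to a named type.
  def classify(name) when is_binary(name) do
    if name in @primitives, do: :primitive, else: {:reference, name}
  end

  # A JSON array at the root is a union, which has no name of its own.
  def classify(types) when is_list(types), do: :union

  # Only records, enums and fixed types carry a meaningful "name".
  def classify(%{"type" => type, "name" => name}) when type in @named do
    {:named, name}
  end

  # Extra attributes on a primitive (such as "name") are ignored.
  def classify(%{"type" => type}) when type in @primitives, do: :primitive

  def classify(%{"type" => type}) when type in ["array", "map"], do: :unnamed_complex

  def classify(_other), do: :invalid
end

SchemaKind.classify(%{"type" => "string", "name" => "myvalue"})
# => :primitive
SchemaKind.classify([%{"type" => "record", "name" => "foo", "fields" => []}])
# => :union
```

Only the `{:named, name}` case matches what currently works; every other result is a valid Avro schema without a root-level name.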

rewritten, Dec 04 '23 12:12

Thanks for the additional details. I still need to wrap my head around it, but it feels like an issue now. It would be cool to have a basic failing example along the lines of: this is the schema, here it is in the file, here is how I encode/decode, here is the error.

Then I can iterate on it until we have a working solution. I will get to this issue right after the issue about decoding logical types.

Strech, Dec 04 '23 18:12

I have put together a stand-alone script that shows the issue. It starts by showing three types of schema and how they all work with simple decoders from :erlavro. Then it starts Avrora and populates the memory registry with the schema for a record, showing that it has the same behavior. Finally, it tries to do the same with a union and with a basic type, without succeeding.

https://gist.github.com/rewritten/2533573332d1de1e4b568def9c757c42

Let me know if I can help more.

rewritten, Dec 05 '23 17:12

I cannot provide an example that includes an actual Confluent registry (for obvious reasons), so you will have to trust me on that.

rewritten, Dec 05 '23 17:12

Docs:

  1. https://www.confluent.io/blog/multiple-event-types-in-the-same-kafka-topic/
  2. https://docs.confluent.io/platform/current/schema-registry/develop/api.html#post--subjects-(string-%20subject)-versions

Examples:

["int", "string"]
{
  "schema": "[\"int\",\"string\"]",
  "schemaType": "AVRO",
  "references": []
}

and with a custom type reference:

["int", "io.Payment"]
{
  "schema": "[\"int\",\"io.Payment\"]",
  "schemaType": "AVRO",
  "references": [
    {
      "name": "io.Payment",
      "subject": "io.Payment",
      "version": 1
    }
  ]
}

The reference name (anchor) above is 1:1 with the schema name (also the subject), but it could be different.
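A minimal sketch of how the `references` list in the request above could be derived from a decoded union. It assumes the simple case shown here, where the reference name equals the subject; the `UnionReferences` module and the `versions` lookup map are hypothetical, not an Avrora or Schema Registry API.

```elixir
defmodule UnionReferences do
  @moduledoc "Hypothetical helper, not part of Avrora."

  @primitives ~w(null boolean int long float double bytes string)

  # `union` is the decoded JSON array; `versions` maps each subject to the
  # version to pin (hypothetical input, e.g. fetched from the registry).
  def build(union, versions) when is_list(union) and is_map(versions) do
    # Only non-primitive type names need an entry in "references".
    for name <- union, is_binary(name), name not in @primitives do
      # Assumption: the reference name and the subject are identical.
      %{"name" => name, "subject" => name, "version" => Map.fetch!(versions, name)}
    end
  end
end

UnionReferences.build(["int", "io.Payment"], %{"io.Payment" => 1})
# => [%{"name" => "io.Payment", "subject" => "io.Payment", "version" => 1}]
```

When the reference name and the subject differ, the lookup would need a separate name-to-subject mapping instead of reusing `name` for both keys.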

These examples require Avrora to:

  1. Register an untyped (unnamed) schema either under the name it would have in the schema registry or under a generated random name (check untyped in erlavro)
  2. Be able to resolve references differently while parsing/registering schemas (see the new references field in Schema Registry)
  3. Part of the reading resolution already exists, but requires better testing

As a bonus, Avrora could work around this with controlled registration of the referenced schemas before the schema that uses them:

{
 "type": "record",
 "namespace": "io.confluent.examples.avro",
 "name": "AllTypes",
 "fields": [
   {
     "name": "oneof_type",
     "type": [
       "io.confluent.examples.avro.Customer",
       "io.confluent.examples.avro.Product",
       "io.confluent.examples.avro.Order"
     ]
   }
 ]
}

This extra level of indirection allows automatic registration of the top-level Avro schema to work properly. However, unlike Protobuf, with Avro, the referenced schemas still need to be registered manually beforehand, as the Avro object does not have the necessary information to allow referenced schemas to be automatically registered.

Strech, Apr 18 '24 12:04