Feature request: allow us to reference schemas within other schemas

Open dmsolow opened this issue 6 years ago • 6 comments

This would be enormously helpful. Let's say I have many types of transactions that involve users, and a user looks like this:

{
 "namespace": "example.avro",
 "type": "record",
 "name": "user",
 "fields": [
  {
   "name": "name",
   "type": "string"
  },
  {
   "name": "id",
   "type": "int"
  }
 ]
}

And I have many schemas that involve users as a category. For example:

{
 "namespace": "example.avro",
 "type": "record",
 "name": "UserFriends",
 "fields": [
  {
   "name": "user",
   "type": {
    "namespace": "example.avro",
    "type": "record",
    "name": "user",
    "fields": [
     {
      "name": "name",
      "type": "string"
     },
     {
      "name": "id",
      "type": "int"
     }
    ]
   }
  },
  {
   "name": "friends",
   "type": {"type": "array", "items": {"type": "example.avro.user"}}
  }
 ]
}

And:

{
 "namespace": "example.avro",
 "type": "record",
 "name": "Purchase",
 "fields": [
  {
   "name": "customer",
   "type": {
    "namespace": "example.avro",
    "type": "record",
    "name": "user",
    "fields": [
     {
      "name": "name",
      "type": "string"
     },
     {
      "name": "id",
      "type": "int"
     }
    ]
   }
  },
  {
   "name": "productId",
   "type": "int"
  }
 ]
}

It would be wonderful if I could create a distinct schema registry entry for the user schema, and just reference that schema (and the version) in other schemas. It would be even cooler to reference the latest version of a schema.

But even just referencing a schema ID and a version number would be a big help. As it is, it's quite complicated to keep track of this kind of situation. When I update user by adding a field, I have to update all the schemas that have users.
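One workaround in the meantime is to keep a single user definition in code and generate the dependent schemas from it before registering them. A minimal Python sketch, assuming the schemas are assembled client-side; the `record` helper is hypothetical and not part of any registry API:

```python
import json

# Single source of truth for the user record.
USER = {
    "namespace": "example.avro",
    "type": "record",
    "name": "user",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "id", "type": "int"},
    ],
}


def record(name, fields, namespace="example.avro"):
    """Hypothetical helper: build an Avro record schema as a dict."""
    return {"namespace": namespace, "type": "record", "name": name, "fields": fields}


# The first use of a named type carries the full definition; later uses in
# the same schema refer to it by its full name.
user_friends = record("UserFriends", [
    {"name": "user", "type": USER},
    {"name": "friends", "type": {"type": "array", "items": "example.avro.user"}},
])

purchase = record("Purchase", [
    {"name": "customer", "type": USER},
    {"name": "productId", "type": "int"},
])

# Schema texts ready to be registered as new versions.
user_friends_text = json.dumps(user_friends, indent=1)
purchase_text = json.dumps(purchase, indent=1)
```

Adding a field to `USER` then changes every generated schema at once, although each one still has to be re-registered as a new version.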

dmsolow avatar Oct 05 '18 01:10 dmsolow

@dmsolow Including existing schemas is supported through the includeSchemas attribute in schemas. I hope it solves the scenario you described in your earlier comment.

satishd avatar Oct 16 '18 09:10 satishd

@satishd I thought it might solve my scenario, but I'm confused as to when the referenced schemas are resolved.

I made two test schemas, with schema A referencing schema B using the includeSchemas key. However, when I retrieve schema A from the registry, the returned schema does not contain the type definition that was referenced from schema B. Instead, it still contains the includeSchemas key.

I guess I figured that the schema registry would resolve the references by itself? What am I missing here?

dmsolow avatar Oct 16 '18 16:10 dmsolow

@dmsolow The existing SchemaRegistryClient APIs do not return resultant schemas. The Avro deserializer internally resolves schemas with the mentioned attribute here. You need to do something similar to build resultant schemas using AvroSchemaResolver#resolveSchema.

satishd avatar Oct 16 '18 17:10 satishd

@satishd Okay, that's unfortunate. From my perspective, an endpoint that returned the fully resolved schema would be very useful. I'll consider my options from here. I can probably still use the schema-registry, but it will require a bit of legwork since I'm not really planning to use Java for serializers/deserializers.
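For a non-Java client, the resolution step could be approximated along these lines. This is only a sketch and not the logic of AvroSchemaResolver: it assumes each `includeSchemas` entry is an object with `name` and `version` keys, that references use full names, and that named types carry their own namespace. `fetch_schema_text` and the in-memory `registry` dict are hypothetical stand-ins for whatever call retrieves a schema version's text from the registry. There is no cycle detection.

```python
import json

NAMED = {"record", "enum", "fixed"}


def full_name(schema):
    """Full name of a named type that carries its own namespace."""
    name, ns = schema["name"], schema.get("namespace")
    return name if "." in name or not ns else f"{ns}.{name}"


def gather(schema_text, fetch_schema_text, included):
    """Parse a schema, strip includeSchemas, and collect every included type."""
    schema = json.loads(schema_text)
    if isinstance(schema, dict):
        for ref in schema.pop("includeSchemas", []):
            dep_text = fetch_schema_text(ref["name"], ref.get("version"))
            dep = gather(dep_text, fetch_schema_text, included)
            # An included schema may be one named type or a union of them.
            for t in dep if isinstance(dep, list) else [dep]:
                if isinstance(t, dict) and "name" in t:
                    included.setdefault(full_name(t), t)
    return schema


def inline(node, included, seen):
    """Replace the first reference to each included type with its definition."""
    if isinstance(node, str):
        if node in included and node not in seen:
            seen.add(node)
            return inline(included[node], included, seen)
        return node  # primitive, or a name that is already defined
    if isinstance(node, list):
        return [inline(branch, included, seen) for branch in node]
    out = dict(node)
    if isinstance(out.get("type"), str) and out["type"] in NAMED:
        seen.add(full_name(out))
    # Only walk positions that hold types, never names or docs.
    for key in ("type", "items", "values"):
        if key in out:
            out[key] = inline(out[key], included, seen)
    if "fields" in out:
        out["fields"] = [dict(f, type=inline(f["type"], included, seen))
                         for f in out["fields"]]
    return out


def resolve(schema_text, fetch_schema_text):
    included = {}
    schema = gather(schema_text, fetch_schema_text, included)
    return inline(schema, included, set())


# Usage with an in-memory stand-in for the registry.
registry = {("user", 1): json.dumps({
    "namespace": "example.avro", "type": "record", "name": "user",
    "fields": [{"name": "name", "type": "string"}, {"name": "id", "type": "int"}],
})}


def fetch_schema_text(name, version):
    return registry[(name, version)]


purchase_text = json.dumps({
    "namespace": "example.avro", "type": "record", "name": "Purchase",
    "includeSchemas": [{"name": "user", "version": 1}],
    "fields": [{"name": "customer", "type": "example.avro.user"},
               {"name": "productId", "type": "int"}],
})
resolved = resolve(purchase_text, fetch_schema_text)
```

The result is a plain Avro schema with the `user` record defined at its first use, which any Avro library should be able to parse without knowing about `includeSchemas`.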

dmsolow avatar Oct 16 '18 17:10 dmsolow

@satishd I'm finding this feature to be kind of lacking in 0.5.4.

It would be nice, for example, to have one "utils" schema that's a union of record types, and then include the "utils" schema in another schema, and reference one or more of the record types. However this does not work currently.
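To make the scenario concrete, here is a sketch of the shape described above, written as Python dicts. The type names (`address`, `money`), the `order` schema, and the `name`/`version` form of the `includeSchemas` entry are illustrative assumptions; `index_union` is a hypothetical helper showing how a client could look up individual record types inside the union.

```python
# "utils": a top-level Avro union (a JSON array) of record types.
utils = [
    {"namespace": "example.avro", "type": "record", "name": "address",
     "fields": [{"name": "city", "type": "string"}]},
    {"namespace": "example.avro", "type": "record", "name": "money",
     "fields": [{"name": "cents", "type": "long"}]},
]

# A schema that includes "utils" and references only one of its record types.
order = {
    "namespace": "example.avro", "type": "record", "name": "order",
    "includeSchemas": [{"name": "utils", "version": 1}],
    "fields": [{"name": "total", "type": "example.avro.money"}],
}


def index_union(union):
    """Hypothetical helper: map each record's full name to its definition."""
    return {f'{t["namespace"]}.{t["name"]}': t for t in union}


types = index_union(utils)

# Names in "order" that would have to be resolved against the included union.
referenced = [f["type"] for f in order["fields"]
              if isinstance(f["type"], str) and f["type"] in types]
```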

dmsolow avatar Oct 17 '18 16:10 dmsolow

I'm also having trouble adding a new version of a schema that takes advantage of this feature; it seems fairly broken all around. For example, if I create a schema that references one type from another schema, and then try to add a new version that references another type from the same schema, it fails.

dmsolow avatar Oct 17 '18 17:10 dmsolow