registry
SchemaGroup and enforcing uniqueness within the group or across the SchemaRegistry
Let's say I want to register the schemaName "person" with the following schema for NiFi:
{ "name": "person", "namespace": "nifi", "type": "record", "fields": [ { "name": "id", "type": "string" }, { "name": "firstName", "type": "string", "aliases": [ "first_name" ] }, { "name": "lastName", "type": "string", "aliases": [ "last_name" ] }, { "name": "email", "type": "string" }, { "name": "gender", "type": "string" }, { "name": "ipAddress", "type": "string", "aliases": [ "ip_address" ] } ] }
Should we allow users to use the same schemaName under a different group?
For example, suppose I want to use schemaName "person" under schemaGroup "kafka":
{ "name": "person", "namespace": "nifi", "type": "record", "fields": [ { "name": "id", "type": "string" }, { "name": "firstName", "type": "string", "aliases": [ "first_name" ] }, { "name": "lastName", "type": "string", "aliases": [ "last_name" ] }, { "name": "email", "type": "string" }, { "name": "gender", "type": "string" } ] }
The above request comes back as a success, but I don't see a new schema getting registered.
cc @satishd
Currently, the schema name (in the schema metadata) must be unique across the registry, irrespective of the group. When schema metadata with the given schema name is already registered, the registry returns the earlier registered schema metadata. When you try to add a schema under existing schema metadata, it is added as a new version of the existing schema and the registry returns the schema version id.
I am adding an API as part of #77 that throws an error when you try to register new schema metadata with an existing name.
Making the combination of schema name and group unique would separate schemas at the group level. We started with that separation, but I guess it was decided that users can follow a naming convention like "group.name" for the schema metadata name, so that the name is unique in a registry cluster.
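To make the described behavior concrete, here is a minimal in-memory sketch. `ToyRegistry`, `register_metadata`, and `add_schema` are hypothetical names invented for illustration, not the real SchemaRegistry client API, and the sketch returns a simple 1-based version number where the real registry returns a schema version id.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaMetadata:
    name: str
    group: str
    versions: list = field(default_factory=list)  # schema texts, oldest first


class ToyRegistry:
    """Hypothetical stand-in for the registry; illustration only."""

    def __init__(self):
        # Keyed by name only, so the group plays no part in uniqueness.
        self._by_name = {}

    def register_metadata(self, name, group):
        # If the name already exists, the earlier metadata is returned as-is.
        return self._by_name.setdefault(name, SchemaMetadata(name, group))

    def add_schema(self, name, schema_text):
        # Adding a schema under an existing name appends a new version.
        meta = self._by_name[name]
        meta.versions.append(schema_text)
        return len(meta.versions)  # assumed 1-based version number


registry = ToyRegistry()
first = registry.register_metadata("person", group="nifi")
second = registry.register_metadata("person", group="kafka")  # same name, other group
v1 = registry.add_schema("person", '{"type": "record", "name": "person", "fields": []}')
v2 = registry.add_schema("person", '{"type": "string"}')
```

Under these assumptions, `second` is the very same metadata object as `first` (still in group "nifi"), which matches the report above: the request "succeeds" but no new schema metadata appears under "kafka".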
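As a rough illustration of that naming convention, the sketch below builds registry-wide unique names from a group and a local name. The helper `qualified_name` is hypothetical and exists only for this example; the registry itself does not provide it.

```python
def qualified_name(group: str, name: str) -> str:
    """Build a schema metadata name following the "group.name" convention."""
    if not group or not name:
        raise ValueError("group and name must be non-empty")
    return f"{group}.{name}"


# The same local name "person" no longer collides across groups.
registered = set()
for group in ("nifi", "kafka"):
    registered.add(qualified_name(group, "person"))
```

With this convention the two "person" schemas from the original question would be registered as "nifi.person" and "kafka.person".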
+1 with the current abstraction ~~on having separation~~. I would like to hear opinions on whether there are any use cases for which this abstraction is not appropriate.
@michaelandrepearce any thoughts on this matter?
+1. One main point is that the numerical ids generated and used in serialised data must remain globally unique, as they globally identify the schema once registered, irrespective of group or name.
I think schema name uniqueness is sensible, as it essentially allows different sub-units within an organisation to become self-managing without the need either to set up separate repos or to have sub-units collide with each other.
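A minimal sketch of that requirement, assuming a single counter shared by every group: `IdAllocator` and its methods are hypothetical names, and how the real registry allocates ids is not specified in this thread.

```python
import itertools


class IdAllocator:
    """Hypothetical allocator: one id sequence shared by all groups and names."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._id_by_key = {}  # (group, name, version) -> id
        self._key_by_id = {}  # id -> (group, name, version)

    def id_for(self, group, name, version):
        key = (group, name, version)
        if key not in self._id_by_key:
            new_id = next(self._next_id)
            self._id_by_key[key] = new_id
            self._key_by_id[new_id] = key
        return self._id_by_key[key]

    def lookup(self, schema_id):
        # A reader of serialised data only has the id, so the id alone must resolve.
        return self._key_by_id[schema_id]


allocator = IdAllocator()
nifi_id = allocator.id_for("nifi", "person", 1)
kafka_id = allocator.id_for("kafka", "person", 1)
```

Because the counter is not scoped per group, two schemas that share a name in different groups still get distinct ids, and a consumer can resolve either one from the id alone.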