
Add a Confluent Schema Registry -> Registry Schema Import

Open michaelandrepearce opened this issue 7 years ago • 15 comments

Add an import tool that imports schemas with identical global IDs, to support migration.

michaelandrepearce avatar Jun 26 '17 09:06 michaelandrepearce

@satishd to make this work a bit nicer, so it doesn't have to be done only once and could be re-imported a few times, I guess it would be good to have an easy way to avoid any collisions for a period. What I was thinking is maybe some way of setting the auto-created schema version IDs (global ID) to some configurable starting number, e.g. a user may want to start at 5000 because they know they have 1000 schemas in Confluent and need a small grace period.

WDYT, is this easily doable? I probably need some direction here if it is.

michaelandrepearce avatar Jun 28 '17 05:06 michaelandrepearce

@michaelandrepearce Right, it does not need to be done only once; it may be a continuous exercise till the migration is complete. @harshach was also having similar thoughts of assigning ranges etc, which looks to be doable. We need to come up with an end-to-end schema migration plan (covering most of the scenarios) from one or more registries into this registry. This covers creating:

  • schema metadata (subjects in confluent jargon)
  • versions of schemas for registered schema metadata without any collisions while the migration is being done.

satishd avatar Jun 28 '17 07:06 satishd

For basic migration, which is the immediate need (not async syncing of two disparate clusters), here is my take on the bits you flag.

Schema Metadata

  • Group = configurable via config
  • Schema Name = comes from remote schema repo
  • id = auto gen'd; as there is no group concept in Confluent, this does not matter.
  • Compatibility = taken from remote schema repo.

Versions

  • Version = this isn't actually used by serdes apart from order of schema, as such i think we can let this auto gen.
  • Global ID = this is very important and thus the range idea, we just make the receiving cluster be ranged, and ensure its range is not in the 0-X range, as 0-x would have been pre taken by the existing repo.
  • Compatibility is not checked on import (whilst it is set on the broker), this is to avoid issues, we have to assume its been pre-checked, as historically during phases it could have had different compatibility settings.
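The mapping above can be sketched roughly as follows. This is a minimal illustration, not the registry's actual code: `ToyRegistry`, `import_schema`, `register` and `first_local_id` are hypothetical names, and the store is a plain in-memory dict. It assumes imported schemas keep their remote global ID, versions are auto-generated in import order, the compatibility check is skipped on import, and locally registered schemas draw IDs from a range that starts above the remote one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Hypothetical in-memory stand-in for the receiving registry."""

    first_local_id: int  # ids below this are reserved for imported schemas
    schemas: dict = field(default_factory=dict)  # global id -> (name, version, text)
    latest: dict = field(default_factory=dict)   # schema name -> latest version number

    def __post_init__(self):
        self._next_local_id = self.first_local_id

    def _store(self, global_id, name, text):
        version = self.latest.get(name, 0) + 1  # version is auto-generated
        self.latest[name] = version
        self.schemas[global_id] = (name, version, text)
        return version

    def import_schema(self, name, global_id, text):
        """Keep the remote global id; no compatibility check is run here."""
        if global_id >= self.first_local_id:
            raise ValueError(f"id {global_id} is inside the local range")
        existing = self.schemas.get(global_id)
        if existing is not None:
            if (existing[0], existing[2]) != (name, text):
                raise ValueError(f"id {global_id} already holds a different schema")
            return existing[1]  # repeated import is a no-op
        return self._store(global_id, name, text)

    def register(self, name, text):
        """Normal registration: the global id comes from the local range."""
        global_id = self._next_local_id
        self._next_local_id += 1
        self._store(global_id, name, text)
        return global_id


registry = ToyRegistry(first_local_id=5000)
remote = [("orders", 1, "v1"), ("orders", 7, "v2"), ("users", 3, "v1")]
for _ in range(2):  # importing twice must not create duplicates
    for name, gid, text in remote:
        registry.import_schema(name, gid, text)
new_id = registry.register("orders", "v3")
```

Here the repeated import leaves three imported entries in place, and the newly registered schema takes the first ID of the local range.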

michaelandrepearce avatar Jun 28 '17 11:06 michaelandrepearce

@michaelandrepearce I have put up a wiki and you can update that with whatever we discuss and finalize here. This can be used as reference later instead of going through this issue comments.

satishd avatar Jun 29 '17 04:06 satishd

Compatibility is not checked on import (whilst it is set on the broker), this is to avoid issues, we have to assume its been pre-checked, as historically during phases it could have had different compatibility settings.

Do you mean that the compatibility setting can be changed between versions, breaking compatibility with earlier versions? I thought it validates all the versions against the updated compatibility level.

satishd avatar Jun 29 '17 04:06 satishd

It is possible that for a given subject (in Confluent parlance) the compatibility setting today may be "backward", though at some point in history it may have been set to "none". E.g. versions 1, 2, 3, 4 could have been added while it was set to "none"; then after 4 it was set to "backward", at which point 5, 6, 7 would be backward compatible with 4 but maybe not with 1, 2, 3.

Likewise, we can only get today's state of this setting, so for any subject we have no guarantee of a historic compatibility check, only of what it is set to today and what we honour going forwards.

michaelandrepearce avatar Jun 29 '17 06:06 michaelandrepearce

Fyi we actually have this case

michaelandrepearce avatar Jun 29 '17 06:06 michaelandrepearce

OK. We do not support updating compatibility for now. Currently a new version of a schema validates compatibility against all the earlier versions. We may need to add this if it is a frequent usecase. IMHO, compatibility should always be set as backward or full/both depending on usecases.

satishd avatar Jun 29 '17 06:06 satishd

So in Confluent they have these two settings: ValidateAll, ValidateLatest. The original behaviour, and still the default today in Confluent, is ValidateLatest.

ValidateAll was only added last year, in Confluent version 3.1.0: https://github.com/confluentinc/schema-registry/pull/415

As such i would suggest we need to support ValidateLatest as it is the most prevalent setting due to history.

Likewise, because of this and the fact that the compatibility setting is changeable in Confluent, the compatibility check should be ignored in the schema repo migration import flow.
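To illustrate the difference between the two modes, here is a toy sketch. The compatibility rule is a deliberate simplification, not Avro's real resolution rules: a schema is modelled as a dict of field name to "has a default", and a reader can read a writer's data if every reader field without a default exists in the writer. `can_read` and `is_accepted` are hypothetical helper names.

```python
def can_read(reader, writer):
    """Toy backward check: each reader field lacking a default must exist in the writer."""
    return all(has_default or name in writer for name, has_default in reader.items())


def is_accepted(new_schema, history, mode):
    """Check new_schema against the latest version only, or against every version."""
    if not history:
        return True
    targets = history if mode == "validate_all" else history[-1:]
    return all(can_read(new_schema, old) for old in targets)


# v2 was added while the setting was "none": it requires "email" with no default,
# so it is not backward compatible with v1.
v1 = {"id": False}
v2 = {"id": False, "email": False}
history = [v1, v2]

# v3 only adds an optional field on top of v2.
v3 = {"id": False, "email": False, "phone": True}

accepted_latest = is_accepted(v3, history, "validate_latest")
accepted_all = is_accepted(v3, history, "validate_all")
```

In this model v3 passes when checked against the latest version only, but is rejected when checked against all versions, because the history already contains a break. The same break is why re-running the check during import could reject schemas that the source registry accepted.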

RE:

compatibility should always be set as backward or full/both depending on usecases

In a world of many orgs, all having different needs, opinions and reasons, I think it is best to simply support the options and leave it to them to configure and decide.

Within my org, what our global policy should be was a hot topic to agree on (we're now set to backward), and yet we still have a few departments wishing to have a slightly different setup.

michaelandrepearce avatar Jun 29 '17 06:06 michaelandrepearce

Filed #171 for the validateAll and validateLatest to support Confluent SR migration.

satishd avatar Jul 03 '17 05:07 satishd

re

@harshach was also having similar thoughts of assigning ranges etc, which looks to be doable.

@harshach are you doing this work? Shall we track that feature separately?

michaelandrepearce avatar Jul 03 '17 16:07 michaelandrepearce

@satishd see the Google group; I just want to discuss a release cadence. Rather than having big feature releases, I want to propose a monthly release. For example, I can see us holding up the 3.1 release due to this feature, when to deliver it we would already have added lots of sub-features that are ready to be used, even if the larger piece is not there yet.

michaelandrepearce avatar Jul 04 '17 05:07 michaelandrepearce

Re ranges, for the immediate interim of just getting confluent repo ingested without collision.

Currently nextId relies on the underlying DB's auto-increment feature, and a null id is set on createSchema to achieve this.

As such for an interim the following (or similar) could just be done in the setup scripts for the db to achieve having the Hortonworks Registry start at some higher number.

MYSQL - ALTER TABLE schema_version_info AUTO_INCREMENT=10000;
POSTGRES - ALTER SEQUENCE schema_version_info_id_seq RESTART WITH 10000;
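The effect of raising the starting ID can be tried locally with a small sketch. Note that this uses SQLite from Python's standard library as a stand-in, not MySQL or Postgres: seeding the `sqlite_sequence` table plays the role of the `ALTER` statements above. The single `name` column is an assumption for illustration; the real `schema_version_info` table has more columns.

```python
import sqlite3

START_ID = 10000  # first id the new registry should hand out (value from the comment above)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schema_version_info ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL)"  # hypothetical column, for illustration only
)

# SQLite stand-in for ALTER TABLE ... AUTO_INCREMENT / ALTER SEQUENCE ... RESTART:
# seed the counter with START_ID - 1 so the next generated id is START_ID.
conn.execute(
    "INSERT INTO sqlite_sequence (name, seq) VALUES (?, ?)",
    ("schema_version_info", START_ID - 1),
)

# An imported row keeps its original (low) id.
conn.execute(
    "INSERT INTO schema_version_info (id, name) VALUES (?, ?)", (42, "imported")
)

# A newly registered row passes no id and gets one from the counter.
cur = conn.execute("INSERT INTO schema_version_info (name) VALUES (?)", ("new",))
new_id = cur.lastrowid

ids = [row[0] for row in conn.execute("SELECT id FROM schema_version_info ORDER BY id")]
```

The imported row stays at its low ID while the auto-generated one starts at the configured value, which is the non-colliding behaviour the interim approach relies on.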

michaelandrepearce avatar Jul 05 '17 08:07 michaelandrepearce

@satishd @harshach forgot to ping you for the above comment, any thoughts?

michaelandrepearce avatar Jul 10 '17 07:07 michaelandrepearce

@michaelandrepearce @harshach I am fine with this as a temporary solution till we finalize a better solution, if possible.

satishd avatar Jul 10 '17 09:07 satishd