Allow automatically migrating connector config schemas & existing connectors

Open sherifnada opened this issue 3 years ago • 5 comments

Tell us about the problem you're trying to solve

Sometimes a schema needs to change in a backwards-incompatible way that does not require user input. For example, if I need to rename a field in the connector spec from passwrd to password, the user doesn't need to know anything about this; we just need to update their connector configuration in the DB to rename passwrd to password.
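To make the rename example concrete, here is a minimal sketch of what such a config migration could look like. The helper name `rename_field` and the sample config are hypothetical and are not part of Airbyte.

```python
from copy import deepcopy
from typing import Any, Dict


def rename_field(config: Dict[str, Any], old: str, new: str) -> Dict[str, Any]:
    """Return a copy of `config` with key `old` renamed to `new`.

    Hypothetical helper: it is a no-op when `old` is absent or `new`
    already exists, so running it twice is safe.
    """
    migrated = deepcopy(config)
    if old in migrated and new not in migrated:
        migrated[new] = migrated.pop(old)
    return migrated


# Made-up stored config that still uses the misspelled field.
old_config = {"host": "db.example.com", "passwrd": "s3cret"}
new_config = rename_field(old_config, "passwrd", "password")
```

The input is left untouched, so the old config can still be kept around if the migration needs to be reverted.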

There is no easy way to do this right now in Airbyte Cloud or OSS.

Airbyte allows writing migrations which run when the user upgrades their Airbyte deployment to a new version (see examples here). However, because connector versions can be upgraded separately from Airbyte (platform) versions, this migration mechanism does not reliably solve the problem out of the box. For example, if the Facebook connector v2 needs a custom-written migration to move away from v1, then there is nothing in the existing migration mechanisms to enforce that the user can only upgrade to v2 in tandem with upgrading their Airbyte platform version.

Ideally, the outcome of this project is two-fold:

  1. there exists a way of solving connector migrations within the Airbyte platform
  2. our CI verifies that if a connector's spec contains a backwards-breaking change, there is an accompanying migration to update all of this connector's configs in the Airbyte Platform DB to match the new schema
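As a rough illustration of the CI check in point 2, the sketch below compares two versions of a spec's JSON schema and flags a few kinds of breaking change. The function names and the rules chosen (removed property, newly required property, changed type) are assumptions for illustration; a real check would need to cover nested objects, oneOf, and so on.

```python
from typing import Any, Dict, List


def breaking_changes(old_spec: Dict[str, Any], new_spec: Dict[str, Any]) -> List[str]:
    """List top-level changes that would invalidate existing configs."""
    old_props = old_spec.get("properties", {})
    new_props = new_spec.get("properties", {})
    problems = []
    # A property that disappeared leaves stale keys in stored configs.
    for name in sorted(set(old_props) - set(new_props)):
        problems.append(f"property removed: {name}")
    # A newly required property is missing from every stored config.
    added_required = set(new_spec.get("required", [])) - set(old_spec.get("required", []))
    for name in sorted(added_required):
        problems.append(f"new required property: {name}")
    # A changed type makes stored values fail validation.
    for name in sorted(set(old_props) & set(new_props)):
        if old_props[name].get("type") != new_props[name].get("type"):
            problems.append(f"type changed: {name}")
    return problems


def ci_check(old_spec: Dict[str, Any], new_spec: Dict[str, Any], has_migration: bool) -> bool:
    """Pass when the change is compatible or a migration accompanies it."""
    return not breaking_changes(old_spec, new_spec) or has_migration


v1 = {"properties": {"host": {"type": "string"}, "passwrd": {"type": "string"}},
      "required": ["host", "passwrd"]}
v2 = {"properties": {"host": {"type": "string"}, "password": {"type": "string"}},
      "required": ["host", "password"]}
problems = breaking_changes(v1, v2)
```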

Some things to consider are upgrading/downgrading a connector's spec, especially given that a user could set an arbitrary connector version (not only can they go from say 1.0.0 to 2.0.0 but they can also go from 2.0.0 to 1.0.0 or to my-random-image-version).

I'm going to riff some options for solving this problem. I do not endorse any of them right this second, I'm sure there are more ideas we can think of.

Option 1: Use major semver bumps to indicate backwards breaking changes, and only allow major bumps on platform upgrades

Pros:

  1. Allows us to centralize the logic for migrations in one place, the platform's existing migration mechanism

Cons:

  1. Seems like you could "bypass" this restriction by going image:v0.1.1 --> image:custom-version --> image:v1.0.0
  2. Migrations have to be written in a specific language (Java)
  3. Couples connector upgrades with platform upgrades which is not great in case you need to revert one or the other.
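The "bypass" in the first con can be sketched as follows, assuming the platform decides whether a migration is needed by comparing the old and new image tags at each upgrade step. The tag format and function names are assumptions, not existing platform code.

```python
import re
from typing import Optional, Tuple

# Assumed tag format: optional "v" prefix followed by MAJOR.MINOR.PATCH.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Optional[Tuple[int, ...]]:
    """Parse a semver-style image tag, or return None for custom tags."""
    match = SEMVER.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def is_major_bump(old_tag: str, new_tag: str) -> Optional[bool]:
    """True/False for semver tags; None when either tag is not semver."""
    old, new = parse_tag(old_tag), parse_tag(new_tag)
    if old is None or new is None:
        return None  # a custom tag gives the platform nothing to compare
    return new[0] > old[0]


direct = is_major_bump("v0.1.1", "v1.0.0")
via_custom = [
    is_major_bump("v0.1.1", "custom-version"),
    is_major_bump("custom-version", "v1.0.0"),
]
```

The direct upgrade is detected as a major bump, but neither hop through the custom tag is, so a check that only looks at the previous tag would let the breaking upgrade through.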

Option 2: don't explicitly have any migration logic, but whenever a user changes a connector version in the UI, if the new version spec is different than the old version, disable all their connections using this connector and ask them to upgrade the version

just seems like this is too duct-tapey of a solution. There might be hundreds of such connectors. Also doesn't work for API based workflows. I'm gonna go ahead and say this is a non-starter.

Option 3: add something to the Airbyte protocol which handles migrations

This is maybe the most realistic (?) option. We could add a new command that every connector can respond to, migrate-config, which takes in a JSON config matching some spec that was known to this connector in the past and outputs a version of this config which matches the current connector version's spec. This would involve some non-trivial additions to the CDK. Then, whenever you upgrade a connector through the UI or API, the platform finds all configs belonging to this connector in the config DB, then runs them against this migration process.
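A minimal sketch of the connector side of such a migrate-config command might look like the following. Nothing here is an existing CDK or protocol API: the `MIGRATIONS` list, the step function, and the assumption that each step is idempotent (so the connector does not need to be told which past spec the config matches) are all illustrative.

```python
import json
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


def rename_passwrd(config: Config) -> Config:
    """Hypothetical step: rename the misspelled field if still present."""
    out = dict(config)
    if "passwrd" in out and "password" not in out:
        out["password"] = out.pop("passwrd")
    return out


# Ordered steps shipped with the connector; each one must be a no-op on
# configs that already match the current spec.
MIGRATIONS: List[Callable[[Config], Config]] = [rename_passwrd]


def migrate_config(raw_config: str) -> str:
    """Take a JSON config for some past spec, return JSON for the current one."""
    config = json.loads(raw_config)
    for step in MIGRATIONS:
        config = step(config)
    return json.dumps(config, sort_keys=True)


migrated = migrate_config('{"host": "h", "passwrd": "x"}')
```

On an upgrade, the platform would then feed every stored config for this connector through the command and persist the output.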

Pros:

  1. Migrations are completely contained within connectors
  2. Large parts of this could be made easier in the CDK
  3. Could be tested via SAT/DAT

Cons:

  1. Upon every version upgrade, it doesn't make it easier for the platform to know which connectors have had their specs changed and are in need of a migration. The platform would need to manually check then manually run all those integrations which could take a long time.

sherifnada, Aug 13 '21 14:08

I would be curious if setting airbyte protocol version for a connector version would be part of this. That and compatible platform versions.

cgardens, Aug 14 '21 23:08

@sherifnada how often do we have backwards incompatible changes for connectors?

cgardens, Mar 22 '22 16:03

#1 seems potentially tolerable as an intermediate workaround imo. if it just comes down to priorities, does it get us through Q2?

cgardens, Mar 22 '22 16:03

Update from today's sync:

scoping:

  • we don't need to support custom connectors or connectors with custom version right away
  • similarly, it's probably fine not to support downgrading right away

solution we are leaning towards (we haven't set anything in stone yet):

  • Use semver major version bumps to indicate a migration is needed
  • we'll need to persist the connector version with the config in the db so we know how to upgrade it - we can't just rely on the docker image tag anymore
  • on OSS, migrations can be triggered manually on a major bump.
    • this is relatively simple as we can just spin up a docker image to run the migration on all the connections that need to be updated
  • on cloud: we'd want a separate service that would upgrade connector versions and their configs.
    • We can't use a single docker container to upgrade all connectors at once because it would break the isolation between the customers
    • the migration service would be owned by the connectors team to allow them to upgrade connectors without needing to upgrade the platform version.
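The idea of persisting the connector version next to each config, and using a major bump to decide what to migrate, could be sketched like this. The `StoredConfig` record and its field names are hypothetical and do not reflect the actual config DB schema; in line with the scoping above, custom (non-semver) versions are not handled.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StoredConfig:
    """Hypothetical DB row: a config plus the version it was written for."""
    connection_id: str
    connector_version: str  # plain semver such as "1.4.2"
    config: Dict[str, Any]


def major(version: str) -> int:
    """Major component of a semver string; raises ValueError on custom tags."""
    return int(version.lstrip("v").split(".")[0])


def needs_migration(records: List[StoredConfig], target_version: str) -> List[StoredConfig]:
    """Select configs written for an older major version than the target."""
    target = major(target_version)
    return [r for r in records if major(r.connector_version) < target]


records = [
    StoredConfig("conn-a", "1.4.2", {"passwrd": "x"}),
    StoredConfig("conn-b", "2.0.0", {"password": "y"}),
]
to_migrate = needs_migration(records, "2.1.0")
```

Because the version lives with the config rather than being inferred from the docker image tag, the selection still works after the image tag has already been changed.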

There's still an open question about where the migrations will be defined. Are they going to be part of the spec?

Link to brainstorm doc for posterity: https://docs.google.com/document/d/1KvcBzpZayqwiCGvSdLYpW3Bgi1D5KiwEaBdjaaP1KeI/edit

girarda, Mar 24 '22 22:03

OSS Platform Grooming Notes:

  • Understand the priority from the connectors team?
  • Not ready for work. Need to brainstorm what a solution would actually look like. Probably a brainstorm with Lake & Alex?
  • Action Item: @lmossman && @cgardens to follow up with @sherifnada on priority.

@sherifnada what is the priority for this? If Q2, it seems like the next step would be starting to brainstorm an approach? Or is there already a proposal?

cgardens, May 10 '22 17:05

Renaming this issue to be the "Config Migrations" epic - Option 3 in the description above

evantahler, Sep 28 '22 18:09