[RFC]: Better migrations
Better Migrations
This proposal aims to improve the way our migrations work.
What's wrong with the current migrations?
tl;dr:
- There's a big problem with migration application ordering: it can accidentally change over time.
- There's no way for SDK users to do 'store'-level migrations.
- There's not enough leverage to choose which order migrations are applied in.
- They are conceptually kind of a mess and it makes the code hard/scary.
Migration application ordering danger
At the moment our migrations are split into three categories:
- Store-level migrations, which can operate on the entire store at once.
- Record-level migrations, which can operate on a single record at a time and cannot create or delete records.
- Record-subtype-level migrations, which can operate on a single record of a particular subtype (e.g. the props of a specific shape type) at a time and cannot create or delete records.
The way the migrations are applied when encountering a store snapshot with an older schema is as follows:
- Any pending store-level migrations are applied first.
- Then, for each record in the store, any pending record-level migrations are applied, followed by any pending record-subtype-level migrations. Here's the relevant code: https://github.com/tldraw/tldraw/blob/f6e7793f49d2905982fdf6879e1ab1cc4393ef2f/packages/store/src/lib/StoreSchema.ts#L192 (a simplified sketch of this flow follows below).
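To make that concrete, here's a simplified sketch of today's ordering. The helper names and types here are illustrative, not the real internals:

// Simplified sketch of the current application order; helpers and types are illustrative.
type UnknownRecord = { id: string; typeName: string }
type StoreSnapshot = Record<string, UnknownRecord>
type LegacyMigration = { up(value: any): any }

declare const pendingStoreMigrations: LegacyMigration[]
declare function pendingRecordMigrations(record: UnknownRecord): LegacyMigration[]
declare function pendingSubtypeMigrations(record: UnknownRecord): LegacyMigration[]

function migrateSnapshot(store: StoreSnapshot): StoreSnapshot {
  // 1. store-level migrations run first, over the whole snapshot
  for (const migration of pendingStoreMigrations) {
    store = migration.up(store)
  }
  // 2. then each record is migrated in isolation:
  //    record-level migrations first, then record-subtype-level migrations
  for (const [id, record] of Object.entries(store)) {
    let next = record
    for (const migration of pendingRecordMigrations(next)) next = migration.up(next)
    for (const migration of pendingSubtypeMigrations(next)) next = migration.up(next)
    store[id] = next
  }
  return store
}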
This is all well and good if there's only one migration applied every time, or if there are several but they all act independently of each other.
However, if there are several migrations that depend on each other, then the order in which they are applied matters.
To illustrate when this might occur, let's use the actual situation that motivated this proposal: migrating arrow bindings to use a generic bindings API.
- We require a store-level migration to pluck out the binding information from the existing arrows and create new standalone binding records.
- There were already migrations that operated on the arrow binding data, e.g. to add an isPrecise field.
- In the current system, if someone was updating from a version before the isPrecise field existed, then the store-level migration would be applied first, but it would fail because the isPrecise field hadn't been added yet. Or, if we coded it defensively, the record-level migration that adds the isPrecise field would fail because it would try to add an isPrecise field to an object which no longer accepts it.
- We can't simply switch around the order in which things are applied, because then we'd be susceptible to the same issue in reverse. (A sketch of the two conflicting migrations follows below.)
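To make the conflict concrete, here's a rough sketch of the two migrations involved. The data shapes and migration bodies are simplified for illustration:

// Illustrative only: simplified versions of the two migrations that end up depending on each other.
// Record-level migration: adds isPrecise to an arrow's inline binding props.
const addIsPrecise = {
  up(arrow: any) {
    if (arrow.props.start?.type === 'binding') arrow.props.start.isPrecise = false
    if (arrow.props.end?.type === 'binding') arrow.props.end.isPrecise = false
    return arrow
  },
}

// Store-level migration: reads the inline binding props (including isPrecise),
// creates standalone binding records, and removes the inline props from the arrow.
const extractArrowBindings = {
  up(store: any) {
    // ...
    return store
  },
}

// Today, extractArrowBindings (store-level) would run before addIsPrecise (record-level),
// so addIsPrecise would try to write into inline binding props that no longer exist.
// Flipping the global order just reproduces the problem in the other direction.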
We might well already have introduced bugs due to this.
No way for SDK users to do store-level migrations
SDK users are only able to add migrations for particular shape types at the moment. But we also allow them to stick any old data in meta objects and we're soon gonna allow them to create custom binding types, and we should also probably be allowing them to create custom asset types... there's currently no way for them to write migrations on non-shape types or migrations that need to operate on the entire store at once.
Not enough leverage to choose which order migrations are applied in
As already illustrated, we need a way to specify more precisely which order migrations should be applied in, while upholding the constraint that SDK users can add their own parallel set(s) of migrations. They might even be using 3rd party add-ons for tldraw which also expose migrations that need to be mixed in, and whose ordering will be affected by when packages are upgraded and how that interacts with things the end SDK user has added.
Conceptual mess
The whole type/subtype business is kinda 🤮
Proposal
type Migration = StoreMigration | RecordMigration
type MigrationId = {
sequenceId: string
versionId: string
}
// store migrations operate on the entire store at once
// they do not support 'down' migrations because that would be prohibitively expensive
// (and probably not even very useful) for the main use case i.e. allowing sync
// server backwards compatibility
type StoreMigration = {
scope: 'store'
// if this migration needs to run after another migration from a different sequence, specify it here
dependsOn?: MigrationId[]
// versionId is human-readable e.g. 'add_isPrecise_to_arrow_bindings'
// it needs to be unique to a particular 'sequence' of migrations (see below)
versionId: string
up(store: StoreSnapshot): StoreSnapshot
// no down migrations for store-level migrations
}
// record migrations operate on a single record at a time, they cannot create or delete records
// they are not scoped to a particular type or subtype, but are rather run for every record in the store
// and it's up to them to check the record type/subtype and decide whether to do anything
type RecordMigration = {
scope: 'record'
dependsOn?: MigrationId[]
versionId: string
up(record: UnknownRecord): UnknownRecord
// allow down migrations
down(record: UnknownRecord): UnknownRecord
}
// tldraw extensions that need to do migrations would provide a 'MigrationSequence' object
type MigrationSequence = {
// the sequence ID uniquely identifies a sequence of migrations. it should
// be human readable and ideally namespaced e.g. `com.tldraw/TLArrowShape`
id: string
migrations: Migration[]
}
// The serialized schema might thereafter look like
const schema = {
'com.tldraw/TLShape': {
lastVersion: 'add_pageId_field'
},
'com.tldraw/TLArrowShape': {
lastVersion: 'add_isPrecise_field'
},
// It's fine to omit records that don't have any migrations.
// In fact these sequences don't need to correspond 1:1 with record types. They are arbitrary.
...
}
// when instantiating a schema we'd pass in a list of MigrationSequences
const schema = new StoreSchema({
// these don't have any migration/version info any more,
recordTypes,
migrations: [
{
// In reality we'd probably want to consolidate all of our tlschema migrations into one sequence
// for the sake of simplicity, since then we wouldn't need to specify any dependencies ourselves.
id: 'com.tldraw/core',
migrations: [
{
scope: 'store',
versionId: 'extract_arrow_bindings',
up(store) {
// ...
}
},
{
scope: 'record',
versionId: 'add_isPrecise_field',
up(record) {
// ...
},
down(record) {
// ...
}
},
...
]
}
]
})
// and when people add their own extensions they'd probably create just one migration sequence now (rather than one per record type)
const schema = createTLSchema({migrations: [...defaultMigrations, {
id: 'com.clickup/core',
migrations: [
{
scope: 'record',
versionId: 'add_clickup_taskId_field',
// When users add their own migrations, they can specify a hard-coded dependency on some other migration.
// This would be useful if you have upgraded tldraw, or a 3rd party tldraw extension, and you need to make sure that the migration you're adding runs after an upstream migration.
// This should rarely be an issue. Only if folks are messing with or reading from the core record types during their own migrations.
dependsOn: [{ sequenceId: 'com.tldraw/core', versionId: 'extract_arrow_bindings' }],
up(record) {
// ...
},
down(record) {
// ...
}
}
]
}]})
When applying migrations, we'd first do a topological sort to make sure any dependency relationships are satisfied, then we apply the migrations in the obvious way, one after the other.
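For illustration, here's a rough sketch of that process using the Migration and MigrationSequence types above. The helper and its error handling are illustrative, not a settled implementation:

// Illustrative sketch: build a dependency graph from each sequence's declared order plus
// any cross-sequence dependsOn edges, topologically sort it, and return the flat list to apply.
function orderMigrations(sequences: MigrationSequence[]): Migration[] {
  const byId = new Map<string, Migration>()
  const runAfter = new Map<string, string[]>() // id -> ids that must run before it

  for (const sequence of sequences) {
    let previous: string | undefined
    for (const migration of sequence.migrations) {
      const id = `${sequence.id}/${migration.versionId}`
      byId.set(id, migration)
      const before: string[] = []
      // migrations within a sequence keep their declared order
      if (previous) before.push(previous)
      // cross-sequence constraints come from dependsOn
      for (const dep of migration.dependsOn ?? []) {
        before.push(`${dep.sequenceId}/${dep.versionId}`)
      }
      runAfter.set(id, before)
      previous = id
    }
  }

  // plain depth-first topological sort, throwing on cycles or unknown dependencies
  const sorted: Migration[] = []
  const state = new Map<string, 'visiting' | 'done'>()
  const visit = (id: string) => {
    if (state.get(id) === 'done') return
    if (state.get(id) === 'visiting') throw new Error(`dependency cycle involving ${id}`)
    const migration = byId.get(id)
    if (!migration) throw new Error(`unknown migration ${id} referenced in dependsOn`)
    state.set(id, 'visiting')
    for (const dep of runAfter.get(id) ?? []) visit(dep)
    state.set(id, 'done')
    sorted.push(migration)
  }
  for (const id of byId.keys()) visit(id)
  return sorted
}

Applying the result is then a single pass over the sorted list, dispatching on scope: 'store' migrations get the whole snapshot, 'record' migrations run once per record.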
A bunch of assorted comments, based on things we discussed offline.
Explicit intended sequence numbering by convention
It might be a good idea to encode intended/default sequencing of our migrations in filename prefixes, similar to Django's conventions. Here is how it could look:
migrations/
001_extract_arrow_bindings.ts
002_add_is_precise.ts
A custom sequence that includes an extra migration would then look extremely obvious:
import m001_addLabelColor from './migrations/001_add_label_color'
import m002_addIsPrecise from './migrations/002_add_is_precise'
const fooCorpSequence = {
id: 'fooCorp.arrows',
migrations: [m001_addLabelColor, adjustCurve /* the custom extra migration */, m002_addIsPrecise],
}
It's also handy if/when migrations are split into separate files.
Splitting migrations into files
Maybe split them out from the start?
- schema files will be shorter & neater
- we can avoid loading unneeded migrations at all (depending on the way we package etc of course)
createTLSchema can explicitly require a single MigrationSequence argument
This will guarantee that we have a well defined sequence of migrations to apply (vs implicitly ordering with dependsOn)
Explicitly specify that dependsOn is purely for validation?
If I get the RFC right, we rely on explicit MigrationSequences. dependsOn sounds like we also do topological sorting. Do we? If not, maybe spelling that out explicitly, or even renaming it to checkAfter or something similar, would be a good idea.
From this perspective, it might also be good to add these to our own migrations too (and make them required), so that custom MigrationSequences are always valid.
Custom migrations and upstream updates
What happens when we release a version that relies on a migration being applied, and people overriding migration sequences forget updating their MigrationSequences? How do we check that the latest migration that we rely on being present is added to their custom MigrationSequences?
Changing MigrationSequences
Imagine the following scenario:
- I extend tldraw, adding a field to arrows, it requires a migration
- I run this on a custom setup with custom sync
- clients report the "migration version" they are on — I suppose the head of the MigrationSequence?
- I realise I made an error, change a step in the middle of the sequence, and redeploy
- now old clients and my new server are out of sync but don't know about it
Do we want to handle this somehow?
Explicit intended sequence numbering by convention
This is a good suggestion
Splitting migrations into files
Yeah this makes sense too
createTLSchema can explicitly require a single MigrationSequence argument
Explicitly specify that dependsOn is purely for validation?
I've been thinking along the same lines.
What happens when we release a version that relies on a migration being applied, and people overriding migration sequences forget updating their MigrationSequences? How do we check that the latest migration that we rely on being present is added to their custom MigrationSequences?
I think we need to separate defining the migrations from ordering them.
tldraw, and 3rd party plugins, will export a migrations sequence that contains an implicit ordering as well as the migration functions etc, just like the MigrationSequence type above.
Then users calling createTLSchema will pass in all those migration sequences, along with an explicit ordering for how they should be applied.
import { defaultMigrations, createTLSchema } from '@tldraw/tldraw'
import { myMigrations } from './migrations'
const schema = createTLSchema({
migrations: {
sequences: [defaultMigrations, myMigrations],
order: [
'com.tldraw/001_foo',
'com.tldraw/002_bar',
'com.tldraw/003_baz',
'com.clickup/001_foo',
'com.clickup/002_bar',
],
},
})
We can verify
- that the ordering satisfies any cross-sequence dependencies
- that there are no migrations missing from the order argument (a rough sketch of both checks follows below).
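Roughly (assuming, for this sketch only, that order entries are sequenceId + '/' + versionId strings):

// Illustrative validation of an explicit order against the declared sequences.
function validateOrder(sequences: MigrationSequence[], order: string[]) {
  const position = new Map(order.map((id, index) => [id, index] as const))

  for (const sequence of sequences) {
    for (const migration of sequence.migrations) {
      const id = `${sequence.id}/${migration.versionId}`
      // every declared migration must appear somewhere in the explicit order
      if (!position.has(id)) {
        throw new Error(`migration ${id} is missing from the order argument`)
      }
      // every dependsOn edge must point to something earlier in the order
      for (const dep of migration.dependsOn ?? []) {
        const depId = `${dep.sequenceId}/${dep.versionId}`
        if (!position.has(depId) || position.get(depId)! > position.get(id)!) {
          throw new Error(`${id} must run after ${depId}`)
        }
      }
    }
  }
}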
One extra thing we might need is a way to specify the 'installation' version of a particular sequence, to elide 'old' migrations from the order argument. I.e. if you adopt a new tldraw plugin that already has a few migrations under its belt, you shouldn't need to apply those migrations to your existing data.
Changing MigrationSequences
I've thought about that as well. I think we can store the schema order as the 'version'. That should allow us to detect situations where migrations have been set up with inconsistent ordering.
I think we need to separate defining the migrations from ordering them.
Sure! I suspect that we might be able to simplify the design somewhat:
- we define a bunch of migrations and a "default"/"vendor" migration sequence
- if people want to add their plugins on top, they can either copypaste, or concatenate the sequences themselves, as long as all constraints are satisfied and they pass the resulting sequence in
Or, in other words, this
const schema = createTLSchema({
migrations: {
sequences: [defaultMigrations, myMigrations],
order: [
'com.tldraw/001_foo',
'com.tldraw/002_bar',
'com.tldraw/003_baz',
'com.clickup/001_foo',
'com.clickup/002_bar',
],
},
})
but with two rounds of simplification/inlining (passing in just the sequence itself and inlining the structures instead of "late binding" via strings):
const schema = createTLSchema({
migrationSequence: defaultMigrations.concat(myMigrations)
})
I think we can also define the "head" migration we expect to see in createTLSchema to detect cases when people forget to include new migrations into their copypasted sequence.
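One possible shape for that check, purely as a sketch — the expectedHead option here is hypothetical:

// Hypothetical: createTLSchema could take the newest migration it requires and
// throw at startup if the supplied sequence doesn't include it.
const schema = createTLSchema({
  migrationSequence: defaultMigrations.concat(myMigrations),
  // fails fast if a copy-pasted sequence predates this migration
  expectedHead: { sequenceId: 'com.tldraw/core', versionId: 'extract_arrow_bindings' },
})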
One extra thing might need is a way to specify the 'installation' version of a particular sequence to elide 'old' migrations from the order argument
It makes total sense, but can we swap strings for values again?
import { someTLMigration } from '@tldraw/tldraw/migrations/...'
import { pluginMigration } from './migrations/...'
const minVersions = [someTLMigration, pluginMigration];
const schema = createTLSchema({
migrations: ...,
minVersions,
})
This way they are guaranteed to exist, and there is no way to pass in an ID that doesn't correspond to anything.
I think we can store the schema order as the 'version'
You mean the entire id sequence as the "version"? Seems quite large, but I suppose if we only send it on connection establishment it can be alright?
Alternatively, if the only purpose is to detect that the sequences diverge, we can calculate some sort of checksum/hash and pass that along, resetting the client state (and logging an error) if they don't match.
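Sketch of what that fingerprint could be (hashing the ordered migration ids; the exact hash and where it travels are open questions):

// Illustrative: a stable fingerprint of the applied migration order, compared between
// client and server (e.g. on connection) to detect diverging sequences.
async function migrationOrderHash(orderedIds: string[]): Promise<string> {
  const bytes = new TextEncoder().encode(orderedIds.join('\n'))
  const digest = await crypto.subtle.digest('SHA-256', bytes)
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('')
}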
Hey, picking up this discussion now based on our meeting yesterday.
My main impression was that the system feels too heavy for an API that must have a public element, and that we may be much closer to a solution given the constraints that emerged toward the end of the conversation.
As I understand it:
- Our current system has record migrations that operate on a single record type.
- Our current system has store migrations that operate on multiple record types.
- We also need a way to ensure that record migrations run after certain store migrations.
- We need a way to ensure that certain store migrations run after certain record migrations.
Currently, all store migrations run first, followed by all record migrations.
If we wanted to add a second store migration, then both store migrations would still run before all of the record migrations.
However, the second store migration may depend on the other records in the store being already migrated. We need to run the store migration after the record migrations have run.
Because a store migration may affect a record, we also need a way to run record migrations after a specific store migration has run.
Significantly, because record migrations only operate on a single record, the order of types which are processed does not matter.
Based on the above, I think we can stick with our current system but mark dependencies between store migrations and record migrations.
The code would be something like this:
// iterate through store migrations
for (let i = 0; i < storeMigrations.length; i++) {
// run the store migration
const storeMigration = storeMigrations[i]
storeMigration()
// now iterate through record types
for (const recordType of recordTypes) {
// ...and get the migrations for that record type
const recordTypeMigrations = migrationsForRecordType[recordType]
// iterate through the migrations for that record type
for (const recordTypeMigration of recordTypeMigrations) {
// and if the record type is for this store migration, run it
if (recordTypeMigration.storeMigrationVersion === i) {
recordTypeMigration()
}
}
}
}
All of our current store migrations would need to be coerced into version 0, and all future migrations would need to be marked as version 1 (though most would still work with the zero version). This would let us keep our current system and give us the appropriate control over ordering where it matters: between record migrations and store migrations.
@steveruizok we can do that but we'd be rejecting the following point
SDK users are only able to add migrations for particular shape types at the moment. But we also allow them to stick any old data in meta objects and we're soon gonna allow them to create custom binding types, and we should also probably be allowing them to create custom asset types... there's currently no way for them to write migrations on non-shape types or migrations that need to operate on the entire store at once.
I'm happy to do that for now if you're OK with it, but I suspect we'd end up revisiting migrations again in the future if we do.
We'd need to do something similar for the shape subtypes as well, there's a potential for the base shape migrations to conflict with the specific shape migrations. e.g. if we make an arrow migration that reads the shape.opacity property and then later on we decide to de-hoist opacity to shape.props.opacity again by creating a base shape migration, we'd need to make sure the base shape one happens after the arrow one, which is not how it's currently set up.
So every time you write a migration you'd need to look up what a) the current store version is, and b) the current shape version is, and then include those in the migration definition. To my mind this is a similar level of imposed wtf-ery as occasionally being asked to append things to an array of strings.
@SomeHats raised a great point yesterday that with my proposal there'd be potential for conflicts with our shape type names. e.g. 'geo', 'frame', etc. If people have their own shape types defined with the same name as one of ours we'd be accidentally messing with their stuff.
Not sure how to resolve that. Renaming our shape types to have namespaces e.g. 'core.geo', 'core.frame', would probably be a huge breaking change.
if we make an arrow migration that reads the shape.opacity property and then later on we decide to de-hoist opacity to shape.props.opacity again by creating a base shape migration
wouldn't the 2nd migration be able to add a dependsOn property referencing the first migration to ensure order?
Not sure how to resolve that. Renaming our shape types to have namespaces e.g. 'core.geo', 'core.frame', would probably be a huge breaking change.
can namespace be a new property on a shape? and if namespace isn't present it defaults to core or com.tldraw?
wouldn't the 2nd migration be able to add a dependsOn property referencing the first migration to ensure order?
sure, as mentioned, but there's a couple of problems
- if we make it optional, there's every likelihood that people won't know they need to add it
- even if we enforce it or they are just aware by magic, they have to look up the latest version somehow when authoring the migration. That's no simple thing, even for us, let alone our users.
can namespace be a new property on a shape? and if namespace isn't present it defaults to core or com.tldraw?
that's an interesting option! It would prevent conflicts if we make the default value different from what we set our own shapes' namespaces to, e.g. a default of null vs tldraw for our shapes.
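A rough sketch of that idea — the field name and defaulting behaviour here are hypothetical:

// Hypothetical: built-in shapes carry an explicit namespace, user shapes default to none.
interface NamespacedShape {
  type: string // e.g. 'geo', 'arrow', or a user-defined name
  namespace?: string | null // 'tldraw' for our shapes; null/undefined for custom ones
}

// Core migrations would then match on both fields, so a user shape that also happens
// to be called 'geo' is never touched by the core 'geo' migrations.
const isCoreShape = (shape: NamespacedShape, type: string) =>
  shape.namespace === 'tldraw' && shape.type === type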
if we make it optional, there's every likelihood that people won't know they need to add it
we could make it not optional and require an array, even if it's an empty array. That way it at least forces consumers to think about it.
So every time you write a migration you'd need to look up what a) the current store version is, and b) the current shape version is, and then include those in the migration definition. To my mind this is a similar level of imposed wtf-ery as occasionally being asked to append things to an array of strings.
I actually mind this a lot less because it puts the responsibility on the migration author rather than a potential library consumer who just wants tldraw+a custom shape they installed from npm. We could provide some tooling around this to make it a bit easier?
Namespacing is tricky, I'd prefer to avoid adding anything to the shape itself if we can. We currently throw on load if we detect two shape types with the same type name, isn't that enough? Could we improve that error to advise against having multiple shape types with the same type name? I'd be more concerned with US breaking projects by introducing a shape with a type name like "code" that conflicted with user-created shapes, so maybe we use type names like "tl@code" going forward?
fwiw, type name collisions have never come up as an actual problem in the community. This is definitely advanced-mode territory, so any customer advanced enough to be doing this kind of thing would follow the error's instructions.
SDK users are only able to add migrations for particular shape types at the moment. But we also allow them to stick any old data in meta objects and we're soon gonna allow them to create custom binding types, and we should also probably be allowing them to create custom asset types... there's currently no way for them to write migrations on non-shape types or migrations that need to operate on the entire store at once.
Could we add a way for users to add store-wide migrations?
One way to avoid the namespacing issue would be to sort of continue doing what we're doing now: record type like shapes (or... plugins??) bring along their own migrations that are relevant to them.
If you don't have our arrow shape in your instance, you wouldn't have arrow migrations either, so there's not an issue if you create your own shape called arrow. If the arrow shape brings its own sequence of migrations (including store-level ones!) and we have a nice way of merging the sequences together, then we have a nice DX and don't have to worry so much about namespacing.
I don't think that would work for store migrations. Would shapes be able to bring along store migrations?
Either way, this all feels tangential: name spacing and other collisions between shapes are problems that we have never encountered, and are ultimately out of our control anyway. (Developers could also copy our namespace).
Can we work from the basis that any user-created ("custom") record-based migrations or store migrations would affect only custom shapes or meta properties, and so would never be affected by our own store migrations? If so, then these should be completely independent of our own migrations; they could run before or after, it shouldn't matter, right?
One question around custom shapes - right now we don't support any kind of customisation for our own shapes, but people do want this. Mostly, they want the functionality of our shapes, but they want to control the visuals themselves, or to add some extra functionality on top. For example, they might want to take our note shape, change the colours, font, and shape; and add an attribution/reaction feature.
If we could find a way of making these strictly additive changes - that would be awesome.
A previous prototype we looked at here
Before we implemented the unvalidated grab-bag meta property, we prototyped an idea for mini, name-spaced records that could be embedded within other records.
You'd define something like this:
const MyDataBagThingy = {
id: 'my-company/note-shape/attribution',
type: T.object({ createdBy: T.string }),
target: 'shape.note',
migrations: ...,
}
And then that would get added to the note shape under meta["my-company/note-shape/attribution"] with migrations and validations all handled correctly.
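For illustration, the resulting record might have looked roughly like this (the exact data shape is an assumption based on the description above):

// Illustrative only: a note shape with the embedded, namespaced data bag.
const noteShape = {
  id: 'shape:abc123',
  type: 'note',
  // ...the usual shape fields...
  meta: {
    'my-company/note-shape/attribution': {
      createdBy: 'alice@example.com',
    },
  },
}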