Idea: Online DDL syntax to postpone execution of migrations, and per-shard execution
We hit a use case where we wanted to run a migration on a multi-sharded keyspace, but first run it on a single shard only, to validate the results and reduce the blast radius. Right now there is no support for this in Vitess other than manually injecting entries into _vt.schema_migrations. At the time, we had this idea: https://github.com/vitessio/vitess/pull/7825, but I think it's the wrong approach, because it can lead to discrepancies between shards.
I'd like to suggest this new idea: consider the following command:
set @@ddl_strategy='online --postpone-execution';
alter table t add column i int;
We can introduce a --postpone-execution flag. With this flag, the migration is not auto-executed: it is submitted and queued, but does not start until explicitly told to. Contrast this with the existing --postpone-completion flag, where the migration starts but keeps running indefinitely until told to complete.
Then, we add the following syntax and variations:
ALTER VITESS_MIGRATION '<uuid>' EXECUTE;
ALTER VITESS_MIGRATION '<uuid>' EXECUTE SHARD '<shard-name>';
ALTER VITESS_MIGRATION EXECUTE ALL;
This is symmetrical to ALTER VITESS_MIGRATION COMPLETE|CANCEL, but with the addition of a per-shard instruction (which we can later also add to COMPLETE|CANCEL).
With this new syntax, we will be able to submit a migration, which is queued on all shards, making the queue consistent, but we can then choose to only start it on a single, specific shard. When satisfied, we can then proceed to execute it on all shards.
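To make the workflow above concrete, here is a sketch of how a session might look. It uses only the syntax proposed in this issue, which is not implemented at the time of writing, plus the existing SHOW VITESS_MIGRATIONS command. The shard name '-80' is a placeholder I made up for illustration, and '<uuid>' stands for the migration UUID returned on submission.

```sql
-- Submit the migration. With the proposed flag it is queued on all shards,
-- but not started on any of them.
set @@ddl_strategy='online --postpone-execution';
alter table t add column i int;

-- Proposed syntax: start the migration on a single shard only.
-- '-80' is a placeholder shard name.
ALTER VITESS_MIGRATION '<uuid>' EXECUTE SHARD '-80';

-- Existing command: inspect the migration's status per shard.
SHOW VITESS_MIGRATIONS LIKE '<uuid>';

-- Proposed syntax: once satisfied with the result, start it on all shards.
ALTER VITESS_MIGRATION '<uuid>' EXECUTE;
```

Between the two EXECUTE statements, the queue is identical on every shard; only the started/not-started state differs.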
We take upon ourselves the risk of having an inconsistent schema between the different shards; this is really most useful for adding or modifying indexes.
This will be a pretty straightforward development. It will work well with all existing scheduling options; for example, it will be possible to mix --postpone-execution with --postpone-completion (though that mixture does sound a little over the top).
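As a rough illustration of that mixture, assuming both flags can simply be listed together in the strategy string (the proposed flag and EXECUTE statement are not implemented at the time of writing):

```sql
-- Queue the migration without starting it, and also hold back its final cut-over.
set @@ddl_strategy='online --postpone-execution --postpone-completion';
alter table t add column i int;

-- Proposed syntax: start the migration; it then runs but does not complete.
ALTER VITESS_MIGRATION '<uuid>' EXECUTE;

-- Existing syntax: allow the running migration to complete.
ALTER VITESS_MIGRATION '<uuid>' COMPLETE;
```

This gives two explicit gates, one before the migration starts and one before it cuts over.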
Thoughts welcome.
This sounds great, and much safer than #7825. We'll still have to put in BOLD ALL-CAPS that people are assuming the risk of inconsistent schema. @vitessio/query-serving what sort of spurious bug reports might we get from this? And are there things we can do in code to head those off?
> And are there things we can do in code to head those off?
I'm just thinking that ALTER VITESS_MIGRATION '<uuid>' EXECUTE SHARD '<shard-name>' could intentionally generate a Warning to let the user know about safety or lack thereof.
Please see https://github.com/vitessio/vitess/pull/10915 for an implementation (new term is "postpone launch" because "execute" is such an overloaded term).