vitess
vitess copied to clipboard
Multi-tenant import support in Vitess
Feature Description
This issue describes proposed support in Vitess for imports from a multi-tenant database cluster into a single Vitess cluster.
Motivation
The user has a multi-tenant database architecture. They have several separate MySQL databases, with common schemas, one per tenant. Each database will be imported using one MoveTables workflow, using an external keyspace pointing to the
source, with a common target keyspace. All tables, on source and target, will have a tenant_id
column,uniquely
identifying the tenant.
We can run multiple such workflows in parallel, until all tenants are imported. Just like normal imports, we would also like reverse workflows to be running: to allow for rollback.
Routing rules should, ideally, be set up, so that queries targeting a specific tenant id would be routed correctly to the target, if the tenant has already been imported, or to the source if the import workflow is in progress but has not yet been switched to the target.
Many setups with multi-tenancy will use a multi-schema approach, where each tenant has their own MySQL database on a single server. In this case the queries will use a database qualifier, which matches their database name. So when they are running the migrations into Vitess, they will continue to use the same database qualifier.
Assumptions
- The target keyspace has its schema already initialized.
- Each table in the target keyspace has a tenant id column with the same name and type on both source and target, and it should be part of the primary key of each table.
- If the target keyspace is sharded, the vindex should be a multi-column vindex which includes the tenant id column.
Design
New Keyspace VSchema Attribute
A new keyspace level attribute in the vschema for the keyspace will indicate that this keyspace supports multiple tenants and also the column which identifies the tenant and the column's data type.
Per-TenantId MoveTables
Two new options will be added to MoveTables
-
--tenant-id
, example:--tenant-id 452892
-
--source-keyspace-alias
, example:--source-keyspace-alias external_tenant_452892
The tenant id corresponds to the tenant of the source keyspace from which data is being migrated. The column name/type are already specified in the keyspace vschema.
The source keyspace alias can be used interchangeably with the keyspace name of the unmanaged source keyspace that points to the cluster from which we are importing.
Tenant-based filtering
An additional filter will be added to the binlogsource
filter
query for all the tables both for the forward and reverse VReplication workflows.
Tenant-based routing rules
We add a new concept: Keyspace Routing Rules. This will be a new topo object and part of the SrvVSchema. It will be a
list of FromKeyspace
to a ToKeyspace
.
These will have a higher scope than the current (table-based) Routing Rules. If Keyspace Routing Rules are present, the table routing rules will not be checked.
Two new vtctldclient commands ApplyKeyspaceRoutingRules
and GetKeyspaceRoutingRules
.
Initial assumption is that all traffic be switched at the same time. So replica reads and primary reads/writes will need to be switched together.
Keyspace Routing Rules Example
This is an example of how the keyspace routing rules will look for a tenant migration for tenant id 12345
. User
sends qualified queries using the qualifier user_12345
. For each tenant we create an unmanaged keyspace called
source_12345
. The target keyspace is target
. Note that we never reroute the target, so the target is never the source
in any of these routing rules. This is because the target has multiple tenants and we cannot reroute it to any single source.
The MoveTables
command will look like:
MoveTables --target-keyspace target Create
--all-tables
--tenant-id 12345
--workflow import_12345
--source-keyspace source_12345
--source-keyspace-alias user_12345
The associated keyspace routing rules before cutover:
{
"user_12345": "source_12345",
"source_12345": "source_12345"
}
The associated keyspace routing rules after cutover:
{
"user_12345": "target",
"source_12345": "target"
}
I have a few questions
- Why do we need
source-keyspace
andsource-keyspace-alias
? - Are we still holding the assumption that traffic switch will happen at same time for read and write?
Fixed via issues referenced above