vitess icon indicating copy to clipboard operation
vitess copied to clipboard

Feature Request: Adding a second keyspace can break query routing

Open mattlord opened this issue 1 year ago • 4 comments

Feature Description

When starting with a single keyspace in Vitess, you can query all of the tables without ever using a DB or qualifying the table names.

As soon as you add a second keyspace, however, you will not be able to query the tables in the original keyspace w/o now adding them to the original Keyspace's vschema OR using the DB first or qualifying the table names.

Use Case(s)

Preventing unexpected user impacting query errors.

Test case

git checkout main && make build

Apply this diff:

diff --git a/examples/local/101_initial_cluster.sh b/examples/local/101_initial_cluster.sh
index 95b51f168c..39c232866b 100755
--- a/examples/local/101_initial_cluster.sh
+++ b/examples/local/101_initial_cluster.sh
@@ -79,7 +79,7 @@ wait_for_healthy_shard commerce 0 || exit 1
 vtctldclient ApplySchema --sql-file create_commerce_schema.sql commerce || fail "Failed to apply schema for the commerce keyspace"
 
 # create the vschema
-vtctldclient ApplyVSchema --vschema-file vschema_commerce_initial.json commerce || fail "Failed to apply vschema for the commerce keyspace"
+#vtctldclient ApplyVSchema --vschema-file vschema_commerce_initial.json commerce || fail "Failed to apply vschema for the commerce keyspace"
 
 # start vtgate
 CELL=zone1 ../common/scripts/vtgate-up.sh

cd examples/local

./101_initial_cluster.sh

mysql < ../common/insert_commerce_data.sql

mysql -e "select * from customer"

vtctldclient CreateKeyspace customer

mysql -e "select * from customer"

mysql commerce -e "select * from customer"

Results:

❯ mysql -e "select * from customer"
+-------------+--------------------+
| customer_id | email              |
+-------------+--------------------+
|           1 | [email protected]   |
|           2 | [email protected]     |
|           3 | [email protected] |
|           4 | [email protected]     |
|           5 | [email protected]     |
+-------------+--------------------+

❯ vtctldclient CreateKeyspace customer
Successfully created keyspace customer. Result:
{
  "name": "customer",
  "keyspace": {
    "keyspace_type": "NORMAL",
    "base_keyspace": "",
    "snapshot_time": null,
    "durability_policy": "none",
    "throttler_config": null,
    "sidecar_db_name": "_vt"
  }
}

❯ mysql -e "select * from customer"
ERROR 1105 (HY000) at line 1: table customer not found

❯ mysql commerce -e "select * from customer"
+-------------+--------------------+
| customer_id | email              |
+-------------+--------------------+
|           1 | [email protected]   |
|           2 | [email protected]     |
|           3 | [email protected] |
|           4 | [email protected]     |
|           5 | [email protected]     |
+-------------+--------------------+

mattlord avatar Oct 01 '24 01:10 mattlord

This happens to me when I want to move tables. If you use global routing, as soon as you load the additional vschema, it breaks with ambigious table errors, unless you set require_explicit_routing: true. That requires careful management to be able to continue to use global routing during and after a whole MoveTables operation.

I don't how far one would want to take this, but does it make sense to have additional info in the vschema, per table, to instruct the global router? If you mark it a table as primary_in_global_routing or something, while still respecting routing rules, every step of a MoveTables will work, even after complete.

The way I do it now, is right before complete, to remove the moved tables from the original vschema and set require_global_routing: false in the new one, but it's tricky.

wiebeytec avatar Oct 02 '24 11:10 wiebeytec

In the case of MoveTables, I wonder if this is due to a bug that has been fixed in recent versions? On main/v21, this test demonstrates that the global routing both for tables being moved and tables not being moved works before, during, and after the workflow (because both keyspaces have the tables explicitly defined in the respective keyspace's vschema and the table names are unique before and after the workflow with the interim state covered by the routing rules that the workflow puts in place):

git checkout main && make build

cd examples/local

alias vtctldclient='command vtctldclient --server=localhost:15999'

./101_initial_cluster.sh && mysql < ../common/insert_commerce_data.sql

vtctldclient GetVSchema commerce --compact

mysql -e "select * from product"
mysql -e "select * from customer"

./201_customer_tablets.sh && ./202_move_tables.sh

sleep 30

vtctldclient GetVSchema commerce --compact
vtctldclient GetVSchema customer --compact

mysql -e "select * from product"
mysql -e "select * from customer"

./203_switch_reads.sh && ./204_switch_writes.sh

vtctldclient GetVSchema commerce --compact
vtctldclient GetVSchema customer --compact

mysql -e "select * from product"
mysql -e "select * from customer"

vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer complete

vtctldclient GetVSchema commerce --compact
vtctldclient GetVSchema customer --compact

mysql -e "select * from product"
mysql -e "select * from customer"

mattlord avatar Oct 02 '24 12:10 mattlord

If I first apply the diff noted in this issue description, then the routing does fail for the table not being moved (and in the end not explicitly defined in either keyspace's vschema), product, both during the workflow and after it completes:

ERROR 1105 (HY000) at line 1: table product not found

For this specific case we could try and address the missing tables definition in the unsharded keyspace, when creating the workflow, by ensuring that the tables in the unsharded source keyspace are explicitly listed in that keyspace's vschema.

mattlord avatar Oct 02 '24 13:10 mattlord

My problem was that before a MoveTables into a sharded keyspace, I have to load the VSchema. Right then, global routing fails with errors about ambiguous table names, unless I set require_explicit_routing: true on the sharded keyspace. But as I said, it makes it very error prone, because I don't want to run with that setting on, eventually.

I also don't know what the best solution is. In practice, there's a difference between tables that are in the VSchema, and tables that are in the VSchema and actually exist as SQL tables. But the tables that only exist in the VSchema already participate in the global routing.

wiebeytec avatar Oct 02 '24 13:10 wiebeytec