cli
cli copied to clipboard
pscale database dump accidentally generates data files for views
I came across this issue while investigating another one over the past weekend and just wanted to simplify it and report it so it can be addressed down the road.
You can replicate the issue currently by generating a new database and then running the following queries on it to create a few sample tables and an associated view with some data:
CREATE TABLE IF NOT EXISTS `t1` (
`id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `t1` (`id`) VALUES (1);
CREATE TABLE IF NOT EXISTS `t2` (
`id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `t2` (`id`) VALUES (1);
CREATE VIEW `t1_t2` AS SELECT `t1`.`id` AS `t1_id`, `t2`.`id` AS `t2_id` FROM (`t1` join `t2`);
After running the above queries, if you run a pscale database dump
command against the dataset you'll end up with the following file listing:
<DATABASE_NAME>.t1.00001.sql
<DATABASE_NAME>.t1_t2.00001.sql
<DATABASE_NAME>.t1_t2-schema.sql
<DATABASE_NAME>.t1-schema.sql
<DATABASE_NAME>.t2.00001.sql
<DATABASE_NAME>.t2-schema.sql
metadata
In this list of files, the <DATABASE_NAME>.t1_t2-schema.sql
file is ok since that allows the VIEW
to be defined in the new database when you go to use restore-dump
.
The problematic file(s) would begin with the <DATABASE_NAME>.t1_t2.00001.sql
file.
Currently, the pscale database dump
command appears to run a query against the VIEW
and then begins to generate data files to include in the folder.
This can be particularly problematic the larger the underlying tables are that are associated with the view, as it would then need to generate many extra data files and make the whole process take longer, in addition to the final result being incorrect due to those extra files being present in the folder.
With those data files associated with the VIEW
left in the folder, it would lead to an error like the following when the restore-dump
process attempts to restore that "data":
target: <DATABASE_NAME>.-.primary: vttablet: rpc error: code = Unknown desc = Can not modify more than one base table through a join view '<DATABASE_NAME>.t1_t2' (errno 1393)
While putting together this issue and attempting a pscale database restore-dump
test, I also encountered a different order of operations type issue, where the VIEW
was trying to be created before the dependent tables were present in the database.
A fix for that issue was to add an extra prefix to the filename which seemed to do the trick for this simple situation, but may not work for all situations (e.g. I feel like it could fail for views that depend on other views without additional ordering/introspection being applied):
<DATABASE_NAME>.view.t1_t2-schema.sql
Without that minor adjustment to that filename I would get an error due to the process attempting to create the VIEW
before the t2
table had been created due to how pscale database restore-dump
seems to iterate through the files by default:
vttablet: rpc error: code = NotFound desc = Table '<DATABASE_NAME>.t2' doesn't exist (errno 1146)