cli icon indicating copy to clipboard operation
cli copied to clipboard

pscale database dump accidentally generates data files for views

Open orware opened this issue 1 year ago • 0 comments

I came across this issue while investigating another one over the past weekend and just wanted to simplify it and report it so it can be addressed down the road.

You can replicate the issue currently by generating a new database and then running the following queries on it to create a few sample tables and an associated view with some data:

CREATE TABLE IF NOT EXISTS `t1` (
  `id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `t1` (`id`) VALUES (1);

CREATE TABLE IF NOT EXISTS `t2` (
  `id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `t2` (`id`) VALUES (1);

CREATE VIEW `t1_t2`  AS SELECT `t1`.`id` AS `t1_id`, `t2`.`id` AS `t2_id` FROM (`t1` join `t2`);

After running the above queries, if you run a pscale database dump command against the dataset you'll end up with the following file listing:

<DATABASE_NAME>.t1.00001.sql
<DATABASE_NAME>.t1_t2.00001.sql
<DATABASE_NAME>.t1_t2-schema.sql
<DATABASE_NAME>.t1-schema.sql
<DATABASE_NAME>.t2.00001.sql
<DATABASE_NAME>.t2-schema.sql
metadata

In this list of files, the <DATABASE_NAME>.t1_t2-schema.sql file is ok since that allows the VIEW to be defined in the new database when you go to use restore-dump.

The problematic file(s) would begin with the <DATABASE_NAME>.t1_t2.00001.sql file.

Currently, the pscale database dump command appears to run a query against the VIEW and then begins to generate data files to include in the folder.

This can be particularly problematic the larger the underlying tables are that are associated with the view, as it would then need to generate many extra data files and make the whole process take longer, in addition to the final result being incorrect due to those extra files being present in the folder.

With those data files associated with the VIEW left in the folder, it would lead to an error like the following when the restore-dump process attempts to restore that "data":

target: <DATABASE_NAME>.-.primary: vttablet: rpc error: code = Unknown desc = Can not modify more than one base table through a join view '<DATABASE_NAME>.t1_t2' (errno 1393)

While putting together this issue and attempting a pscale database restore-dump test, I also encountered a different order of operations type issue, where the VIEW was trying to be created before the dependent tables were present in the database.

A fix for that issue was to add an extra prefix to the filename which seemed to do the trick for this simple situation, but may not work for all situations (e.g. I feel like it could fail for views that depend on other views without additional ordering/introspection being applied):

<DATABASE_NAME>.view.t1_t2-schema.sql

Without that minor adjustment to that filename I would get an error due to the process attempting to create the VIEW before the t2 table had been created due to how pscale database restore-dump seems to iterate through the files by default:

vttablet: rpc error: code = NotFound desc = Table '<DATABASE_NAME>.t2' doesn't exist (errno 1146)

orware avatar Jun 22 '23 23:06 orware