MDEV-32362 Remove generated columns from INSERT statements in dump tool
Description
Currently, generated column names and values are present in INSERT statements created by the dump tool. This is inconsistent with the handling of generated columns in the MySQL dump tool, where values for generated columns are calculated based on other column values and should not be explicitly inserted.
Remove names and values for generated columns from INSERT statements created by mariadb-dump. For tables with one or more generated columns, include a list of non-generated column names in the INSERT statement. Commit includes bug fix and tests for generated columns during dump.
Release Notes
N/A
How can this PR be tested?
Execute the main suite in mysql-test-run. This commit adds a test in that suite.
For manual testing, follow the steps below:
-
Start the MariaDB server
./build/sql/mariadbd --datadir=./data --user=mariadb & -
Connect to the MariaDB command-line client
./build/client/mysql -u root -S /tmp/mysql.sock -
Create a database t1 and switch to it
CREATE DATABASE t1; USE t1; -
Create a table inside this database with both real and generated columns
CREATE TABLE example( firstname VARCHAR(255), lastname VARCHAR(255), fullname VARCHAR(512) AS (CONCAT(firstname, ' ', lastname)) VIRTUAL ); -
Insert the following values into the table
INSERT INTO example(firstname, lastname) values('Donald', 'Duck'); -
In a new terminal, execute
mariadb-dump. This will copy the dump contents intot1_output.sql./build/client/mariadb-dump t1 example > t1_output.sql -
Copy the contents of
t1_output.sqlto see if generated columns are included in theINSERTstatementcat t1_output.sql
Before this change
The following section from the actual output of cat t1_output.sql included 'Donald Duck' in the INSERT statement generated by mariadb-dump as seen below:
LOCK TABLES `example` WRITE;
/*!40000 ALTER TABLE `example` DISABLE KEYS */;
set autocommit=0;
INSERT INTO `example` VALUES
('Donald','Duck','Donald Duck');
/*!40000 ALTER TABLE `example` ENABLE KEYS */;
UNLOCK TABLES;
commit;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
After this change
The following section from the actual output of cat t1_output.sql correctly removes 'Donald Duck' from the generated INSERT statement since it was not actually inserted into the virtual fullname column of our example table. Furthermore, a list of non-generated columns is included in the INSERT statement as seen below:
LOCK TABLES `example` WRITE;
/*!40000 ALTER TABLE `example` DISABLE KEYS */;
set autocommit=0;
INSERT INTO `example` (`firstname`, `lastname`) VALUES ('Donald','Duck');
/*!40000 ALTER TABLE `example` ENABLE KEYS */;
UNLOCK TABLES;
commit;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
Basing the PR against the correct MariaDB version
- [x] This is a new feature and the PR is based against the latest MariaDB development branch.
Copyright
All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
I've just — few days ago — added field_flags array to mariadb-dump. In my commit it only stores whether the first is VECTOR or not, but the idea was that it can be used for other flags too.
I suggest you reuse that. That is, instead of field_flags->str[i] being 'V' for vectors and ' ' for anything else, it should be a set of flags, like
#define FIELD_IS_VECTOR 1
#define FIELD_IS_VIRTUAL 2
...
if (field_flags->str[i] & FIELD_IS_VECTOR)
{
...
and so on
I've just — few days ago — added
field_flagsarray tomariadb-dump. In my commit it only stores whether the first is VECTOR or not, but the idea was that it can be used for other flags too.I suggest you reuse that. That is, instead of
field_flags->str[i]being'V'for vectors and' 'for anything else, it should be a set of flags, like#define FIELD_IS_VECTOR 1 #define FIELD_IS_VIRTUAL 2 ... if (field_flags->str[i] & FIELD_IS_VECTOR) { ...and so on
Thank you for the feedback, I've refactored my code to reuse field_flags to detect generated columns. Please let me know if this is the change you had in mind.
Not quite, I meant this:
--- a/client/mysqldump.cc
+++ b/client/mysqldump.cc
@@ -104,7 +104,7 @@
#define DUMP_TABLE_SEQUENCE 1
/* until MDEV-35831 is implemented, we'll have to detect VECTOR by name */
-#define MYSQL_TYPE_VECTOR "V"
+#define MYSQL_TYPE_VECTOR 1
static my_bool ignore_table_data(const uchar *hash_key, size_t len);
static void add_load_option(DYNAMIC_STRING *str, const char *option,
@@ -3498,11 +3498,10 @@ static uint get_table_structure(const char *table, const char *db, char *table_t
if (opt_header)
dynstr_append_checked(&select_field_names_for_header,
quote_for_equal(row[0], name_buff));
+ dynstr_append_mem_checked(&field_flags, "", 1);
/* VECTOR doesn't have a type code yet, must be detected by name */
if (row[3] && strcmp(row[3], "vector") == 0)
- dynstr_append_checked(&field_flags, MYSQL_TYPE_VECTOR);
- else
- dynstr_append_checked(&field_flags, " ");
+ field_flags.str[field_flags.length-1]|= MYSQL_TYPE_VECTOR;
}
if (vers_hidden)
@@ -3623,11 +3622,10 @@ static uint get_table_structure(const char *table, const char *db, char *table_t
while ((row= mysql_fetch_row(result)))
{
ulong *lengths= mysql_fetch_lengths(result);
+ dynstr_append_mem_checked(&field_flags, "", 1);
/* VECTOR doesn't have a type code yet, must be detected by name */
if (strncmp(row[SHOW_TYPE], STRING_WITH_LEN("vector(")) == 0)
- dynstr_append_checked(&field_flags, MYSQL_TYPE_VECTOR);
- else
- dynstr_append_checked(&field_flags, " ");
+ field_flags.str[field_flags.length-1]|= MYSQL_TYPE_VECTOR;
if (init)
{
if (!opt_xml && !opt_no_create_info)
@@ -4519,7 +4517,7 @@ static void dump_table(const char *table, const char *db, const uchar *hash_key,
*/
is_blob= field->type == MYSQL_TYPE_GEOMETRY ||
field->type == MYSQL_TYPE_BIT ||
- field_flags.str[i] == MYSQL_TYPE_VECTOR[0] ||
+ field_flags.str[i] & MYSQL_TYPE_VECTOR ||
(opt_hex_blob && field->charsetnr == 63 &&
(field->type == MYSQL_TYPE_STRING ||
field->type == MYSQL_TYPE_VAR_STRING ||
This is the complete patch that passes tests. field_flags is a bitmask. You'll use it as
if (field_flags.str[i] & FIELD_IS_GENERATED)
...
The current main branch has these failures:
This PR currently has these failures:
Of these 5 failures 4 have the label "Required". This does not make sense as the test jobs that have "Required" to pass are nonetheless failing on branch 'main', so they are actually not required to pass. We are now diving into these test failures and trying to debug them without knowing if they actually have anything to do with the changes in this PR.
Some test failures are nondeterministic, possibly depending on the load of the CI worker machines, and a test would typically pass if you hit "rebuild" on a failed build on the Buildbot grid view. Sometimes, depending on your luck, this might require multiple attempts. Generally, the mandatory builds must pass (at least once) before something can be pushed to a protected branch. In my experience, most of the sporadically failing tests are currently in ./mtr --suite=rpl,binlog,binlog_encryption. @knielsen and @bnestere have started an effort to fix replication test failures quite some time ago, so I feel that the situation is improving slowly but steadily.
In https://buildbot.mariadb.org/#/grid?branch=refs%2Fpull%2F3838%2Fmerge I can see failures of the tests versioning.data and versioning.old_timestamp across the board. Those failures do not seem to be present in https://buildbot.mariadb.org/#/grid?branch=main but some sporadic failures are there.
@dr-m Thanks for the reply. I am not saying it is impossible for people to contribute while tests are failing. Sure they can do the detective work to arrive on the conclusion that all failures are sporadic and/or on branches that are not 'required'. The problem is that this is causing a lot of confusion for new contributors. People with experience of CI systems elsewhere assume and expect that a CI system is green and is used to prevent regressions from entering the code base. The fact that is is constantly red leads to many contributors wasting time in investigating unrelated things, which may be off-putting and discourage them from doing more contributions.
The exciting digression about buildbot notwithstanding, @FarihaIS, please, say when you'll want a review
@FarihaIS, is this PR ready for a review or are you still working on it?
@vuvova I'm still working on it! The fix I added earlier broke tests in the versioning suite, which I'm trying to rectify
And now?
Pull requests that are not ready for review can be marked as a "draft". The buildbot grid view looks rather OK for this, except for two tests that are failing on multiple platforms:
versioning.data
versioning.old_timestamp
Hello @FarihaIS, it's been a while since this PR was updated. Just wanted to check if it is ready for the review or do you intend to complete it?
Hi @svoj, yes I'm still working on the pull request, but progress has been a bit slower with the tests that broke in the versioning suite. I'll prioritize this and take a look again soon, thank you!
@FarihaIS that's fine, I'm just checking stale pull requests if they're still relevant.