server icon indicating copy to clipboard operation
server copied to clipboard

MDEV-32362 Remove generated columns from INSERT statements in dump tool

Open FarihaIS opened this issue 10 months ago • 15 comments

Description

Currently, generated column names and values are present in INSERT statements created by the dump tool. This is inconsistent with the handling of generated columns in the MySQL dump tool, where values for generated columns are calculated based on other column values and should not be explicitly inserted.

Remove names and values for generated columns from INSERT statements created by mariadb-dump. For tables with one or more generated columns, include a list of non-generated column names in the INSERT statement. Commit includes bug fix and tests for generated columns during dump.

Release Notes

N/A

How can this PR be tested?

Execute the main suite in mysql-test-run. This commit adds a test in that suite.

For manual testing, follow the steps below:

  1. Start the MariaDB server

    ./build/sql/mariadbd --datadir=./data --user=mariadb &
    
  2. Connect to the MariaDB command-line client

    ./build/client/mysql -u root -S /tmp/mysql.sock
    
  3. Create a database t1 and switch to it

    CREATE DATABASE t1;
    USE t1;
    
  4. Create a table inside this database with both real and generated columns

    CREATE TABLE example(
            firstname VARCHAR(255),
            lastname VARCHAR(255),
            fullname VARCHAR(512) AS (CONCAT(firstname, ' ', lastname)) VIRTUAL
    );
    
  5. Insert the following values into the table

    INSERT INTO example(firstname, lastname) values('Donald', 'Duck');
    
  6. In a new terminal, execute mariadb-dump . This will copy the dump contents into t1_output.sql

    ./build/client/mariadb-dump t1 example > t1_output.sql
    
  7. Copy the contents of t1_output.sql to see if generated columns are included in the INSERT statement

    cat t1_output.sql
    

Before this change

The following section from the actual output of cat t1_output.sql included 'Donald Duck' in the INSERT statement generated by mariadb-dump as seen below:

LOCK TABLES `example` WRITE;
/*!40000 ALTER TABLE `example` DISABLE KEYS */;
set autocommit=0;
INSERT INTO `example` VALUES
('Donald','Duck','Donald Duck');
/*!40000 ALTER TABLE `example` ENABLE KEYS */;
UNLOCK TABLES;
commit;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

After this change

The following section from the actual output of cat t1_output.sql correctly removes 'Donald Duck' from the generated INSERT statement since it was not actually inserted into the virtual fullname column of our example table. Furthermore, a list of non-generated columns is included in the INSERT statement as seen below:

LOCK TABLES `example` WRITE;
/*!40000 ALTER TABLE `example` DISABLE KEYS */;
set autocommit=0;
INSERT INTO `example` (`firstname`, `lastname`) VALUES ('Donald','Duck');
/*!40000 ALTER TABLE `example` ENABLE KEYS */;
UNLOCK TABLES;
commit;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

Basing the PR against the correct MariaDB version

  • [x] This is a new feature and the PR is based against the latest MariaDB development branch.

Copyright

All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

FarihaIS avatar Feb 13 '25 22:02 FarihaIS

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Feb 13 '25 22:02 CLAassistant

I've just — few days ago — added field_flags array to mariadb-dump. In my commit it only stores whether the first is VECTOR or not, but the idea was that it can be used for other flags too.

I suggest you reuse that. That is, instead of field_flags->str[i] being 'V' for vectors and ' ' for anything else, it should be a set of flags, like

#define FIELD_IS_VECTOR  1
#define FIELD_IS_VIRTUAL 2
...
  if (field_flags->str[i] & FIELD_IS_VECTOR)
  {
    ...

and so on

vuvova avatar Feb 14 '25 13:02 vuvova

I've just — few days ago — added field_flags array to mariadb-dump. In my commit it only stores whether the first is VECTOR or not, but the idea was that it can be used for other flags too.

I suggest you reuse that. That is, instead of field_flags->str[i] being 'V' for vectors and ' ' for anything else, it should be a set of flags, like

#define FIELD_IS_VECTOR  1
#define FIELD_IS_VIRTUAL 2
...
  if (field_flags->str[i] & FIELD_IS_VECTOR)
  {
    ...

and so on

Thank you for the feedback, I've refactored my code to reuse field_flags to detect generated columns. Please let me know if this is the change you had in mind.

FarihaIS avatar Feb 14 '25 21:02 FarihaIS

Not quite, I meant this:

--- a/client/mysqldump.cc
+++ b/client/mysqldump.cc
@@ -104,7 +104,7 @@
 #define DUMP_TABLE_SEQUENCE 1
 
 /* until MDEV-35831 is implemented, we'll have to detect VECTOR by name */
-#define MYSQL_TYPE_VECTOR "V"
+#define MYSQL_TYPE_VECTOR 1
 
 static my_bool ignore_table_data(const uchar *hash_key, size_t len);
 static void add_load_option(DYNAMIC_STRING *str, const char *option,
@@ -3498,11 +3498,10 @@ static uint get_table_structure(const char *table, const char *db, char *table_t
       if (opt_header)
         dynstr_append_checked(&select_field_names_for_header,
                               quote_for_equal(row[0], name_buff));
+      dynstr_append_mem_checked(&field_flags, "", 1);
       /* VECTOR doesn't have a type code yet, must be detected by name */
       if (row[3] && strcmp(row[3], "vector") == 0)
-        dynstr_append_checked(&field_flags, MYSQL_TYPE_VECTOR);
-      else
-        dynstr_append_checked(&field_flags, " ");
+        field_flags.str[field_flags.length-1]|= MYSQL_TYPE_VECTOR;
     }
 
     if (vers_hidden)
@@ -3623,11 +3622,10 @@ static uint get_table_structure(const char *table, const char *db, char *table_t
     while ((row= mysql_fetch_row(result)))
     {
       ulong *lengths= mysql_fetch_lengths(result);
+      dynstr_append_mem_checked(&field_flags, "", 1);
       /* VECTOR doesn't have a type code yet, must be detected by name */
       if (strncmp(row[SHOW_TYPE], STRING_WITH_LEN("vector(")) == 0)
-        dynstr_append_checked(&field_flags, MYSQL_TYPE_VECTOR);
-      else
-        dynstr_append_checked(&field_flags, " ");
+        field_flags.str[field_flags.length-1]|= MYSQL_TYPE_VECTOR;
       if (init)
       {
         if (!opt_xml && !opt_no_create_info)
@@ -4519,7 +4517,7 @@ static void dump_table(const char *table, const char *db, const uchar *hash_key,
         */
         is_blob= field->type == MYSQL_TYPE_GEOMETRY ||
                  field->type == MYSQL_TYPE_BIT ||
-                 field_flags.str[i] == MYSQL_TYPE_VECTOR[0] ||
+                 field_flags.str[i] & MYSQL_TYPE_VECTOR ||
                  (opt_hex_blob && field->charsetnr == 63 &&
                    (field->type == MYSQL_TYPE_STRING ||
                     field->type == MYSQL_TYPE_VAR_STRING ||

This is the complete patch that passes tests. field_flags is a bitmask. You'll use it as

  if (field_flags.str[i] & FIELD_IS_GENERATED)
    ...

vuvova avatar Feb 17 '25 13:02 vuvova

The current main branch has these failures: image

This PR currently has these failures: image

Of these 5 failures 4 have the label "Required". This does not make sense as the test jobs that have "Required" to pass are nonetheless failing on branch 'main', so they are actually not required to pass. We are now diving into these test failures and trying to debug them without knowing if they actually have anything to do with the changes in this PR.

ottok avatar Feb 18 '25 22:02 ottok

Some test failures are nondeterministic, possibly depending on the load of the CI worker machines, and a test would typically pass if you hit "rebuild" on a failed build on the Buildbot grid view. Sometimes, depending on your luck, this might require multiple attempts. Generally, the mandatory builds must pass (at least once) before something can be pushed to a protected branch. In my experience, most of the sporadically failing tests are currently in ./mtr --suite=rpl,binlog,binlog_encryption. @knielsen and @bnestere have started an effort to fix replication test failures quite some time ago, so I feel that the situation is improving slowly but steadily.

In https://buildbot.mariadb.org/#/grid?branch=refs%2Fpull%2F3838%2Fmerge I can see failures of the tests versioning.data and versioning.old_timestamp across the board. Those failures do not seem to be present in https://buildbot.mariadb.org/#/grid?branch=main but some sporadic failures are there.

dr-m avatar Feb 19 '25 15:02 dr-m

@dr-m Thanks for the reply. I am not saying it is impossible for people to contribute while tests are failing. Sure they can do the detective work to arrive on the conclusion that all failures are sporadic and/or on branches that are not 'required'. The problem is that this is causing a lot of confusion for new contributors. People with experience of CI systems elsewhere assume and expect that a CI system is green and is used to prevent regressions from entering the code base. The fact that is is constantly red leads to many contributors wasting time in investigating unrelated things, which may be off-putting and discourage them from doing more contributions.

ottok avatar Mar 03 '25 19:03 ottok

The exciting digression about buildbot notwithstanding, @FarihaIS, please, say when you'll want a review

vuvova avatar Mar 07 '25 13:03 vuvova

@FarihaIS, is this PR ready for a review or are you still working on it?

vuvova avatar Apr 22 '25 11:04 vuvova

@vuvova I'm still working on it! The fix I added earlier broke tests in the versioning suite, which I'm trying to rectify

FarihaIS avatar Apr 22 '25 16:04 FarihaIS

And now?

vuvova avatar May 26 '25 08:05 vuvova

Pull requests that are not ready for review can be marked as a "draft". The buildbot grid view looks rather OK for this, except for two tests that are failing on multiple platforms:

versioning.data
versioning.old_timestamp

dr-m avatar Jul 03 '25 07:07 dr-m

Hello @FarihaIS, it's been a while since this PR was updated. Just wanted to check if it is ready for the review or do you intend to complete it?

svoj avatar Oct 24 '25 07:10 svoj

Hi @svoj, yes I'm still working on the pull request, but progress has been a bit slower with the tests that broke in the versioning suite. I'll prioritize this and take a look again soon, thank you!

FarihaIS avatar Oct 25 '25 07:10 FarihaIS

@FarihaIS that's fine, I'm just checking stale pull requests if they're still relevant.

svoj avatar Oct 25 '25 07:10 svoj