dolt icon indicating copy to clipboard operation
dolt copied to clipboard

dolt_patch() attempts to re-create an index

Open rvkennedy opened this issue 4 months ago • 6 comments

I have two versions of a table schema, old and new, in branches of those names. In old, the description for a table is:


+------------------------+--------------------------------------------------------------+------+-----+---------+----------------+
| Field                  | Type                                                         | Null | Key | Default | Extra          |
+------------------------+--------------------------------------------------------------+------+-----+---------+----------------+
| field_name             | int unsigned                                                 | NO   | UNI | NULL    |                |
+------------------------+--------------------------------------------------------------+------+-----+---------+----------------+
INDEXES
+------------+------------+--------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table      | Non_unique | Key_name                 | Seq_in_index | Column_name            | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+--------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| table_name | 0          | field_name_UNIQUE        | 1            | field_name             | NULL      | 0           | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
+------------+------------+--------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

And in the new version,

+------------------------+--------------------------------------------------------------+------+-----+---------+----------------+
| Field                  | Type                                                         | Null | Key | Default | Extra          |
+------------------------+--------------------------------------------------------------+------+-----+---------+----------------+
| field_name             | int unsigned                                                 | NO   | UNI | NULL    |                |
+------------------------+--------------------------------------------------------------+------+-----+---------+----------------+
INDEXES
+------------+------------+--------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table      | Non_unique | Key_name                 | Seq_in_index | Column_name            | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+--------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| table_name | 0          | field_name_UNIQUE        | 1            | field_name             | NULL      | 0           | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL       |
+------------+------------+--------------------------+--------------+------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

So why, when I call: dolt sql -q "SELECT statement FROM dolt_patch('old', 'new', 'table_name') WHERE diff_type = 'schema';" --result-format=csv

Do I get this:

ALTER TABLE `table_name` DROP INDEX `field_name_UNIQUE`; ALTER TABLE `table_name` ADD INDEX `field_name_UNIQUE`(`field_name`);

Why is there any need to destroy and recreate the index?

What's more, if I do NOT specify the table name, and just call: dolt sql -q "SELECT statement FROM dolt_patch('old', 'new') WHERE diff_type = 'schema';" --result-format=csv The DROP disappears, and we are left only with the ADD INDEX, which tries to create an index which is already there.

rvkennedy avatar Aug 13 '25 16:08 rvkennedy

Great find. @elianddb will get on this today. I'm not certain why the schema diff is picking up an index change here.

timsehn avatar Aug 13 '25 16:08 timsehn

@rvkennedy I've tried repro this issue but so far have not had the same output, if you could give any extra info on specific steps you took to repro it would help a bit more. Here's what I've done so far:

#!/bin/bash

set -e

echo "=== Creating fresh repository ==="
cd /workspace
rm -rf customer-9677
mkdir customer-9677
cd customer-9677

echo "=== Initializing dolt repository ==="
dolt init

echo "=== Creating table with exact structure from customer issue ==="
# Create table that shows field_name as "UNI" by having explicit primary key
dolt sql -q "CREATE TABLE table_name (id INT PRIMARY KEY, field_name INT UNSIGNED NOT NULL, UNIQUE KEY field_name_UNIQUE (field_name));"
dolt add .
dolt commit -m "Initial table with unique index"

echo "=== Creating 'old' and 'new' branches as described by customer ==="
dolt checkout -b old
dolt checkout -b new

echo "=== Verifying both branches have identical schemas ==="
echo "OLD branch schema:"
dolt checkout old
dolt sql -q "DESCRIBE table_name;"
echo "OLD branch indexes:"
dolt sql -q "SHOW INDEX FROM table_name;"

echo "NEW branch schema:"
dolt checkout new  
dolt sql -q "DESCRIBE table_name;"
echo "NEW branch indexes:"
dolt sql -q "SHOW INDEX FROM table_name;"

echo "=== Running customer's exact commands ==="
echo "Command 1 (with table specified):"
echo 'dolt sql -q "SELECT statement FROM dolt_patch('\''old'\'', '\''new'\'', '\''table_name'\'')  WHERE diff_type = '\''schema'\'';"  --result-format=csv'
dolt sql -q "SELECT statement FROM dolt_patch('old', 'new', 'table_name')  WHERE diff_type = 'schema';"  --result-format=csv

echo ""
echo "Command 2 (without table specified):"  
echo 'dolt sql -q "SELECT statement FROM dolt_patch('\''old'\'', '\''new'\'')  WHERE diff_type = '\''schema'\'';"  --result-format=csv'
dolt sql -q "SELECT statement FROM dolt_patch('old', 'new')  WHERE diff_type = 'schema';"  --result-format=csv

echo ""
echo "=== Customer expects to see ==="
echo "Command 1 should return:"
echo "ALTER TABLE \`table_name\` DROP INDEX \`field_name_UNIQUE\`;"
echo "ALTER TABLE \`table_name\` ADD INDEX \`field_name_UNIQUE\`(\`field_name\`);"
echo ""
echo "Command 2 should return:"  
echo "ALTER TABLE \`table_name\` ADD INDEX \`field_name_UNIQUE\`(\`field_name\`);"
echo ""

echo "=== Reproduction complete ==="

elianddb avatar Aug 13 '25 19:08 elianddb

Does it help to specify this is with dolt version 1.58.3? I know there have been some patch-related fixes but this is the latest binary release. The test data is proprietary but I can send it in private if required.

rvkennedy avatar Aug 14 '25 10:08 rvkennedy

Here is a slightly modified version of your script which reproduces the issue. It seems to be to do with the foreign key constraints, but the drop of the unique key does NOT occur if the sign of field_change_sign doesn't change - presumably because it's determined that the schema is unchanged in that case.

set -e

echo "=== Creating fresh repository ==="
rm -rf customer-9677
mkdir customer-9677
cd customer-9677

echo "=== Initializing dolt repository ==="
dolt init

echo "=== Creating table with exact structure from customer issue ==="
# Create table that shows field_name as "UNI" by having explicit primary key

echo "=== Creating 'old' branches as described by customer ==="
dolt checkout -b old
dolt sql -q "CREATE TABLE other_table (  id int unsigned NOT NULL AUTO_INCREMENT,  PRIMARY KEY (id))"
dolt sql -q "CREATE TABLE table_name (id INT unsigned NOT NULL AUTO_INCREMENT, field_name INT UNSIGNED NOT NULL, field_change_sign bigint DEFAULT NULL, PRIMARY KEY (id),UNIQUE KEY field_name_UNIQUE (field_name), KEY fk_field_name_idx (field_name), CONSTRAINT fk_field_name FOREIGN KEY (field_name) REFERENCES other_table (id));"
dolt add .
dolt commit -m "Initial table with unique index"

echo "=== Creating 'new' branch as described by customer ==="
dolt checkout main
dolt checkout -b new
dolt sql -q "CREATE TABLE other_table (  id int unsigned NOT NULL AUTO_INCREMENT,  PRIMARY KEY (id))"
dolt sql -q "CREATE TABLE table_name (id INT unsigned NOT NULL AUTO_INCREMENT, field_name INT UNSIGNED NOT NULL, field_change_sign bigint unsigned DEFAULT NULL, PRIMARY KEY (id),UNIQUE KEY field_name_UNIQUE (field_name), KEY fk_field_name_idx (field_name), CONSTRAINT fk_field_name FOREIGN KEY (field_name) REFERENCES other_table (id));"
dolt add .
dolt commit -m "Initial table with unique index"

echo "=== Verifying both branches have identical schemas ==="
echo "OLD branch schema:"
dolt checkout old
dolt sql -q "DESCRIBE table_name;"
echo "OLD branch indexes:"
dolt sql -q "SHOW INDEX FROM table_name;"

echo "NEW branch schema:"
dolt checkout new  
dolt sql -q "DESCRIBE table_name;"
echo "NEW branch indexes:"
dolt sql -q "SHOW INDEX FROM table_name;"

echo "=== Running customer's exact commands ==="
echo "Command 1 (with table specified):"
echo 'dolt sql -q "SELECT statement FROM dolt_patch('\''old'\'', '\''new'\'', '\''table_name'\'')  WHERE diff_type = '\''schema'\'';"  --result-format=csv'
dolt sql -q "SELECT statement FROM dolt_patch('old', 'new', 'table_name')  WHERE diff_type = 'schema';"  --result-format=csv

echo ""
echo "Command 2 (without table specified):"  
echo 'dolt sql -q "SELECT statement FROM dolt_patch('\''old'\'', '\''new'\'')  WHERE diff_type = '\''schema'\'';"  --result-format=csv'
dolt sql -q "SELECT statement FROM dolt_patch('old', 'new')  WHERE diff_type = 'schema';"  --result-format=csv

echo ""
echo "=== Customer expects to see ==="
echo "Command 1 should return:"
echo "ALTER TABLE \`table_name\` DROP INDEX \`field_name_UNIQUE\`;"
echo "ALTER TABLE \`table_name\` ADD INDEX \`field_name_UNIQUE\`(\`field_name\`);"
echo ""
echo "Command 2 should return:"  
echo "ALTER TABLE \`table_name\` ADD INDEX \`field_name_UNIQUE\`(\`field_name\`);"
echo ""

echo "=== Reproduction complete ==="

The output I see from this is:

dolt sql -q "SELECT statement FROM dolt_patch('old', 'new', 'table_name')  WHERE diff_type = 'schema';"  --result-format=csv
statement
ALTER TABLE `table_name` DROP `field_change_sign`;
ALTER TABLE `table_name` ADD `field_change_sign` bigint unsigned DEFAULT NULL;
ALTER TABLE `table_name` DROP INDEX `fk_field_name_idx`;
ALTER TABLE `table_name` ADD INDEX `field_name_UNIQUE`(`field_name`);
ALTER TABLE `table_name` ADD INDEX `fk_field_name_idx`(`field_name`);

Command 2 (without table specified):
dolt sql -q "SELECT statement FROM dolt_patch('old', 'new')  WHERE diff_type = 'schema';"  --result-format=csv
statement
ALTER TABLE `table_name` DROP `field_change_sign`;
ALTER TABLE `table_name` ADD `field_change_sign` bigint unsigned DEFAULT NULL;
ALTER TABLE `table_name` DROP INDEX `fk_field_name_idx`;
ALTER TABLE `table_name` ADD INDEX `field_name_UNIQUE`(`field_name`);
ALTER TABLE `table_name` ADD INDEX `fk_field_name_idx`(`field_name`);

What I would hope to see, rather, is simply:

ALTER TABLE table_name ALTER COLUMN field_change_sign bigint unsigned DEFAULT NULL;

rvkennedy avatar Aug 14 '25 12:08 rvkennedy

Thanks for the additional repro steps. This should give something for @elianddb to chew on today.

timsehn avatar Aug 14 '25 15:08 timsehn

We have a fix in place atm, but still trying to decide to handle a couple of edge cases ...

elianddb avatar Aug 16 '25 00:08 elianddb