dolt icon indicating copy to clipboard operation
dolt copied to clipboard

two columns with the same name have different tags

Open druvv opened this issue 1 year ago • 2 comments

Today, Dolt has an internal tag system that is used to assign identities to columns. The tag system helps Dolt diff and merge columns even when they have been renamed or modified.

For example, let's say I have a users table with a name as the primary key and a phone number stored as an integer. If I rename the phone number column, the internal tag of that column remains the same.

$ dolt sql -q "CREATE table users (name VARCHAR(100) PRIMARY KEY, phone BIGINT);"
$ dolt sql -q "INSERT INTO users VALUES ('druvv', 4224224242);"
$ dolt sql -q "SELECT * from users;"
+-------+------------+
| name  | phone      |
+-------+------------+
| druvv | 4224224242 |
+-------+------------+
$ dolt schema tags
+-------+--------+-------+
| table | column | tag   |
+-------+--------+-------+
| users | name   | 9815  |
| users | phone  | 15960 |
+-------+--------+-------+
$ dolt sql -q "ALTER TABLE users RENAME column phone to phone_number"
$ dolt schema tags
+-------+--------------+-------+
| table | column       | tag   |
+-------+--------------+-------+
| users | name         | 9815  |
| users | phone_number | 15960 |
+-------+--------------+-------+

When I merge the users table with the renamed column against another users table with the original columns names, the columns containing the phone numbers are matched by their tags. Their row values are merged.

$ dolt commit -am "initial data"
commit rqnmsovvbk8nokh2nphbaf4d2pslfasg (HEAD -> main)
Author: druvv <[email protected]>
Date:  Thu Jul 28 16:41:22 -0700 2022

        initial data
$ dolt checkout -b other
Switched to branch 'other'
$ dolt sql -q "INSERT INTO users VALUES ('vader', 9999999999)";
$ dolt commit -am "add vader"
commit dmfftn1quviend96n8ml1qpi7u77b8ul (HEAD -> other)
Author: druvv <[email protected]>
Date:  Thu Jul 28 16:42:23 -0700 2022

        add vader
$ dolt checkout main
$ dolt sql -q "ALTER TABLE users RENAME column phone to phone_number"
commit rinom96u0ilhutuh84gs812lqcmh88ak (HEAD -> main)
Author: druvv <[email protected]>
Date:  Thu Jul 28 16:45:16 -0700 2022

        rename column phone to phone_number
        
$ dolt merge other
Updating rinom96u0ilhutuh84gs812lqcmh88ak..dmfftn1quviend96n8ml1qpi7u77b8ul
users | 1 +
1 tables changed, 1 rows added(+), 0 rows modified(*), 0 rows deleted(-)
$ dolt sql -q "SELECT * from users;"
+-------+--------------+
| name  | phone_number |
+-------+--------------+
| druvv | 4224224242   |
| vader | 9999999999   |
+-------+--------------+

Depending on the sequence of schema alterations that occur on separate branches. The tags of a column that should be mergeable diverge and the merge aborts. For example, let's continue from above:

$ dolt sql -q "ALTER TABLE users ADD COLUMN side VARCHAR(10) DEFAULT 'dark side'"
$ dolt sql -q "ALTER TABLE users MODIFY COLUMN phone_number CHAR(10);"
$ dolt commit -am "add side column and change phone number to char"
commit psdr3929odno8gl1sohbca6hst25rmsi (HEAD -> main)
Merge: rinom96u0ilhutuh84gs812lqcmh88ak dmfftn1quviend96n8ml1qpi7u77b8ul
Author: druvv <[email protected]>
Date:  Thu Jul 28 16:54:57 -0700 2022

        add side column and change phone number to char

$ dolt checkout -b other2
$ dolt sql -q "INSERT INTO users VALUES ('anakin', 1111111111);"
$ dolt sql -q "ALTER TABLE users MODIFY COLUMN phone_number CHAR(10);"
$ dolt commit -am "add anakin and modify column to char"
commit pu9vvg15elualdqve5rgltmlgdikrjes (HEAD -> other2)
Author: druvv <[email protected]>
Date:  Thu Jul 28 16:55:42 -0700 2022

        add anakin and modify column to char
$ dolt checkout main
$ dolt merge other2
Updating psdr3929odno8gl1sohbca6hst25rmsi..pu9vvg15elualdqve5rgltmlgdikrjes
Merge aborted due to error
cause: schema conflict found, merge aborted. Please alter schema to prevent schema conflicts before merging.
       schema conflicts for table users:
        two columns with the same name 'phone_number' have different tags

The statement that led to this error is: dolt sql -q "ALTER TABLE users ADD COLUMN side VARCHAR(10) DEFAULT 'dark side'"

The reason why the phone_number column's tag differs between the main branch other2 is due to the side column being added prior to the phone_number column's modification. Tags are generated in a psuedo-random manner and they are seeded with the underlying representation of the existing columns. Because the existing columns changed on one branch but not the other, the generated tags differed.

In the near future, we plan to remove the tag system and improve this unexpected behavior. We encourage you to leave feedback below or a reaction if you run into this issue.

druvv avatar Jul 29 '22 00:07 druvv

A simpler reproduction of this issue is the following:

dolt init
dolt sql -q "CREATE table t (pk int PRIMARY KEY);"
dolt commit -Am "create table with key"

dolt checkout -b "right"
dolt sql -q "alter table t add column col1 int;"
dolt sql -q "alter table t add column col2 int;"
dolt commit -Am "add columns in order"

dolt checkout main
dolt sql -q "alter table t add column col2 int;"
dolt sql -q "alter table t add column col1 int;"
dolt commit -Am "add columns in reverse order"

> dolt merge right
Updating b7mkl40n5706vev52gf8v8u6cgisppbr..3t83jotilrp15nc7rovdraff5lhimb03
Unable to stage changes: add and commit to finish merge
error: Failed to commit changes.
cause: schema conflict found, merge aborted. Please alter schema to prevent schema conflicts before merging.
       schema conflicts for table t:
       	two columns with the same name 'col2' have different tags. See https://github.com/dolthub/dolt/issues/3963
       	two columns with the same name 'col1' have different tags. See https://github.com/dolthub/dolt/issues/3963

druvv avatar Sep 19 '22 16:09 druvv

We added the ability to override the tag system when necessary. This will allow you to unblock yourself if a merge error like this occurs. The command is called dolt schema update-tag. Here is an example on how it can be used: https://github.com/dolthub/dolt/blob/1956ca903ba0f342a19eaf2dc766353586b07c9e/integration-tests/bats/column_tags.bats#L334

druvv avatar Nov 19 '22 00:11 druvv

I have a simpler ask for this: make tags a storage only detail that never impact the logic of diff and merge. Diff and merge always match by column name.

zachmu avatar Feb 11 '23 01:02 zachmu

@fulghum is going to fix the specific case of two similarly names and schema'ed tables added on different branches that end up with different tags should merge.

timsehn avatar May 10 '23 22:05 timsehn