dolt icon indicating copy to clipboard operation
dolt copied to clipboard

`dolt revert` violates `NOT NULL` constraint

Open jmeickle opened this issue 2 years ago • 7 comments

I wanted to upgrade Dolt to try out the new push/pull sql-server features. As part of this upgrade, I also ran dolt migrate to update the data format on-disk. I was able to run the migration successfully against our development instance (which has less and different data, but the same schema). In production, the migration runs for a while, but then hits this panic referring to a single record of data:

panic: cannot write NULL to non-NULL field [recovered]
        panic: Tuple(2453, "[REDACTED]", 14690, "[REDACTED]")

goroutine 23612790 [running]:
github.com/dolthub/dolt/go/libraries/doltcore/migrate.translator.TranslateTuple.func2()
        /src/libraries/doltcore/migrate/tuples.go:120 +0x78
panic({0x1608b80, 0x1d8be80})
        /usr/local/go/src/runtime/panic.go:884 +0x20c
github.com/dolthub/dolt/go/store/val.(*TupleBuilder).Build(0x1dad3a8?, {0x1da2348?, 0x2d5a200?})
        /src/store/val/tuple_builder.go:58 +0x90
github.com/dolthub/dolt/go/libraries/doltcore/migrate.translator.TranslateTuple({0x40246b5300, 0x4024018090, {0x1db2900, 0x40003fc7e0}, {0x1da2348, 0x2d5a200}}, {0x1dacb20?, 0x400a201d40?}, {{{0x1dad3a8, 0x400046e000}, ...}})
        /src/libraries/doltcore/migrate/tuples.go:124 +0x2b4
github.com/dolthub/dolt/go/libraries/doltcore/migrate.translateTuples({0x1dacb20, 0x400a201d40}, {0x40246b5200, 0x40180eb770, {0x1db2900, 0x40003fc7e0}, {0x1da2348, 0x2d5a200}}, {0x40246b5300, 0x4024018090, ...}, ...)
        /src/libraries/doltcore/migrate/transform.go:561 +0x1dc
github.com/dolthub/dolt/go/libraries/doltcore/migrate.migrateIndex.func2()
        /src/libraries/doltcore/migrate/transform.go:520 +0x90
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:57 +0x60
created by golang.org/x/sync/errgroup.(*Group).Go
        /go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:54 +0x8c

I repeated this migration attempt. The same panic occurs on the same record, with these values:

id	created_date	document_type_id	clause_type	title	description
[REDACTED]	2022-09-14 21:45:13.38	[REDACTED]	[REDACTED] NULL NULL	

(There's nothing special about this record; all of them look like this, including nulls, but just with different string values.)

The schema in both development and production is:

CREATE TABLE `clause_type` (
  `id` char(32) NOT NULL,
  `created_date` datetime NOT NULL,
  `document_type_id` char(32) NOT NULL,
  `clause_type` varchar(256) NOT NULL,
  `title` varchar(128),
  `description` varchar(256),
  PRIMARY KEY (`id`),
  UNIQUE KEY `uq_clause_type_clause_type` (`clause_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

I'm not sure if the NULL/non-NULL in the error message refers to SQL nulls, but title and description are the only two nulls for that record and I believe they've also always been nullable.

jmeickle avatar Oct 12 '22 13:10 jmeickle

Looking at the migration logs more closely, I tried a diff of the commits:

bash-4.2$ /usr/local/bin/dolt diff no6ahlgkrt08vdsk3gs7p1ss6q6om53b 32dnf48eo5e2v30jbc1ut27b5t8e4ra2
diff --dolt a/clause_type b/clause_type
--- a/clause_type @ h64qsgongd3hgititj3d3c9g9tll9sj9
+++ b/clause_type @ 1l08ar6hk1rkd6gcdggasj1sspuo631c
+---+----------------------------------+---------------------+----------------------------------+----------------------------------------------+-------+-------------+
|   | id                               | created_date        | document_type_id                 | clause_type                                  | title | description |
+---+----------------------------------+---------------------+----------------------------------+----------------------------------------------+-------+-------------+
| < | [REDACTED: ID] | 2022-08-01 00:00:00 | [REDACTED: ID] | [REDACTED: STR] | NULL  | NULL        |
| > | [REDACTED: ID] | NULL                | [REDACTED: ID] | [REDACTED: STR] | NULL  | NULL        |
+---+----------------------------------+---------------------+----------------------------------+----------------------------------------------+-------+-------------+

That's very interesting because we changed that field via Alembic migration quite some time ago:

def upgrade() -> None:
   [...]
    op.alter_column('clause_type', 'created_date',
               existing_type=mysql.DATETIME(),
               nullable=False)

So it seems it's actually not the nullable fields that are the problem, but a formerly nullable field that is now not. But it looks like there may be some issue with the table's current schema being applied to its historical records during the migration.

jmeickle avatar Oct 12 '22 13:10 jmeickle

Some more debugging info, starting from a dolt log of the failing commit (which turned out to be a revert commit):

/usr/local/bin/dolt log 32dnf48eo5e2v30jbc1ut27b5t8e4ra2

commit 32dnf48eo5e2v30jbc1ut27b5t8e4ra2 
Author: Dolt (clause-internal) <[email protected]>
Date:  Wed Sep 14 18:56:16 +0000 2022

        Revert "bad dates are bad"

commit no6ahlgkrt08vdsk3gs7p1ss6q6om53b 
Merge: uofdodaf4ep4m6pcl38msialdg14ojlg k85jmmil7tiun7oh8dhtklbt9lmoaec2
Author: Ben <[email protected]>
Date:  Wed Sep 14 18:38:11 +0000 2022

        forget which merge this was

commit k85jmmil7tiun7oh8dhtklbt9lmoaec2 
Author: Dolt (clause-internal) <[email protected]>
Date:  Wed Sep 07 04:08:43 +0000 2022

        Revert "this is bad, reverting"

commit kabejdimhpvnqi0fbkg39nuv1caa0t1j 
Author: Dolt (clause-internal) <[email protected]>
Date:  Wed Sep 07 04:05:32 +0000 2022

        this is bad, reverting

Diffs of the sequence of commits in the log (all as COMMIT~1 to COMMIT)

2022-10-12 12:25:29.804022662 +0000 UTC migrating commit kabejdimhpvnqi0fbkg39nuv1caa0t1j
bash-4.2$ /usr/local/bin/dolt diff --summary kabejdimhpvnqi0fbkg39nuv1caa0t1j~1 kabejdimhpvnqi0fbkg39nuv1caa0t1j
diff --dolt a/clause b/clause
--- a/clause @ dc2rh50lasi26beqjgji2vob6r6k4msh
+++ b/clause @ trslkvh8gio4qss5ckg032j3m7l1sq3d
prev size: 20301, new size: 20301, adds: 0, deletes: 0, modifications: 0
19,377 Rows Unmodified (95.45%)
0 Rows Added (0.00%)
0 Rows Deleted (0.00%)
924 Rows Modified (4.55%)
0 Cells Added (0.00%)
0 Cells Deleted (0.00%)
924 Cells Modified (0.30%)
(20,301 Row Entries vs 20,301 Row Entries)
2022-10-12 12:25:30.761369644 +0000 UTC migrating commit k85jmmil7tiun7oh8dhtklbt9lmoaec2

bash-4.2$ /usr/local/bin/dolt diff --summary k85jmmil7tiun7oh8dhtklbt9lmoaec2~1 k85jmmil7tiun7oh8dhtklbt9lmoaec2
diff --dolt a/clause b/clause
--- a/clause @ trslkvh8gio4qss5ckg032j3m7l1sq3d
+++ b/clause @ dc2rh50lasi26beqjgji2vob6r6k4msh
prev size: 20301, new size: 20301, adds: 0, deletes: 0, modifications: 0
19,377 Rows Unmodified (95.45%)
0 Rows Added (0.00%)
0 Rows Deleted (0.00%)
924 Rows Modified (4.55%)
0 Cells Added (0.00%)
0 Cells Deleted (0.00%)
924 Cells Modified (0.30%)
(20,301 Row Entries vs 20,301 Row Entries)
2022-10-12 12:25:31.701235787 +0000 UTC migrating commit no6ahlgkrt08vdsk3gs7p1ss6q6om53b

bash-4.2$ /usr/local/bin/dolt diff --summary no6ahlgkrt08vdsk3gs7p1ss6q6om53b~1 no6ahlgkrt08vdsk3gs7p1ss6q6om53b
diff --dolt a/alembic_version b/alembic_version
--- a/alembic_version @ qtghe362047en46mp1u3do58kmufvvkb
+++ b/alembic_version @ rbb8tabhu607qa4a4uve9cl1vkvqhd77
prev size: 1, new size: 1, adds: 0, deletes: 0, modifications: 0
0 Rows Unmodified (0.00%)
1 Row Added (100.00%)
1 Row Deleted (100.00%)
0 Rows Modified (0.00%)
1 Cell Added (100.00%)
1 Cell Deleted (100.00%)
0 Cells Modified (0.00%)
(1 Row Entry vs 1 Row Entry)

diff --dolt a/clause b/clause
--- a/clause @ ak71gn22267bk7rue7ubvsrhl3vnl2fn
+++ b/clause @ v4h9liaqugvfuki8lrjcc49omddac44r
prev size: 20490, new size: 20300, adds: 0, deletes: 96, modifications: 9904
1,322 Rows Unmodified (6.45%)
0 Rows Added (0.00%)
190 Rows Deleted (0.93%)
18,978 Rows Modified (92.62%)
0 Cells Added (0.00%)
2,850 Cells Deleted (0.93%)
20,160 Cells Modified (6.56%)
(20,490 Row Entries vs 20,300 Row Entries)

diff --dolt a/clause_type b/clause_type
--- a/clause_type @ gpesco08pe02hoba0if3gq0norrv91hk
+++ b/clause_type @ h64qsgongd3hgititj3d3c9g9tll9sj9
prev size: 1367, new size: 1367, adds: 0, deletes: 0, modifications: 0
No data changes. See schema changes by using -s or --schema.

bash-4.2$ /usr/local/bin/dolt diff -s no6ahlgkrt08vdsk3gs7p1ss6q6om53b~1 no6ahlgkrt08vdsk3gs7p1ss6q6om53b
diff --dolt a/alembic_version b/alembic_version
--- a/alembic_version @ qtghe362047en46mp1u3do58kmufvvkb
+++ b/alembic_version @ rbb8tabhu607qa4a4uve9cl1vkvqhd77
diff --dolt a/clause b/clause
--- a/clause @ ak71gn22267bk7rue7ubvsrhl3vnl2fn
+++ b/clause @ v4h9liaqugvfuki8lrjcc49omddac44r
 CREATE TABLE `clause` (
   `id` char(32) NOT NULL,
-  `created_date` datetime,
+  `created_date` datetime NOT NULL,
   `clause_type_id` char(32) NOT NULL,
   `parent_clause_id` char(32),
   `start_index` int,
   `end_index` int,
   `protoclause_id` char(32),
   `doc_hash` varchar(40),
   `document_run_result_id` char(32),
   `is_header` tinyint,
   `regularized_text` text,
-  `clause_status` enum('BASE','ISO','CHILD','BAD'),
+  `clause_status` enum('base','iso','child','bad'),
   `notes` text,
   `facts` json,
-  `standard` varchar(14),
+  `standard` enum('nvca_base','nvca_equiv','other_standard','nonstandard'),
   PRIMARY KEY (`id`),
   KEY `clause_type_id` (`clause_type_id`),
   UNIQUE KEY `id` (`id`),
   KEY `parent_clause_id` (`parent_clause_id`),
   CONSTRAINT `fk_clause_clause_type_id_clause_type` FOREIGN KEY (`clause_type_id`) REFERENCES `clause_type` (`id`),
   CONSTRAINT `fk_clause_parent_clause_id_clause` FOREIGN KEY (`parent_clause_id`) REFERENCES `clause` (`id`),
   CONSTRAINT `ck_clause_standardenum` CHECK ((`standard` IN ('nvca_base', 'nvca_equiv', 'other_standard', 'nonstandard')))
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
diff --dolt a/clause_type b/clause_type
--- a/clause_type @ gpesco08pe02hoba0if3gq0norrv91hk
+++ b/clause_type @ h64qsgongd3hgititj3d3c9g9tll9sj9
 CREATE TABLE `clause_type` (
   `id` char(32) NOT NULL,
-  `created_date` datetime,
+  `created_date` datetime NOT NULL,
   `document_type_id` char(32) NOT NULL,
   `clause_type` varchar(256) NOT NULL,
   `title` varchar(128),
   `description` varchar(256),
   PRIMARY KEY (`id`),
   UNIQUE KEY `id` (`id`),
   UNIQUE KEY `uq_clause_type_clause_type` (`clause_type`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
2022-10-12 12:25:36.658669065 +0000 UTC migrating commit 32dnf48eo5e2v30jbc1ut27b5t8e4ra2
panic: cannot write NULL to non-NULL field [recovered]
       panic: Tuple(2453, "[REDACTED: ID]", 14690, "[REDACTED: STRING]")
       
bash-4.2$ /usr/local/bin/dolt diff 32dnf48eo5e2v30jbc1ut27b5t8e4ra2~1 32dnf48eo5e2v30jbc1ut27b5t8e4ra2
diff --dolt a/clause_type b/clause_type
--- a/clause_type @ h64qsgongd3hgititj3d3c9g9tll9sj9
+++ b/clause_type @ 1l08ar6hk1rkd6gcdggasj1sspuo631c
+---+----------------------------------+---------------------+----------------------------------+----------------------------------------------+-------+-------------+
|   | id                               | created_date        | document_type_id                 | clause_type                                  | title | description |
+---+----------------------------------+---------------------+----------------------------------+----------------------------------------------+-------+-------------+
| < | [REDACTED: ID] | 2022-08-01 00:00:00 | [REDACTED: ID] | [REDACTED: STRING] | NULL  | NULL        |
| > | [REDACTED: ID] | NULL                | [REDACTED: ID] |  [REDACTED: STRING] | NULL  | NULL        |
+---+----------------------------------+---------------------+----------------------------------+----------------------------------------------+-------+-------------+

So basically, we applied a SQL-level schema migration, and then we applied a revert commit that should not have been valid (setting a now non-nullable field to null). This hasn't prevented using Dolt, but this sequence seems to break the data format migration.

It's a bit hard to tell if this is operator error in how we handled applying migrations and reverting, or if there's actually a data consistency bug here

jmeickle avatar Oct 12 '22 14:10 jmeickle

Thanks for sharing all the great details and clues here. @andrew-wm-arthur is probably the right person to take a look at this today.

fulghum avatar Oct 12 '22 15:10 fulghum

@Eronarn Thank you for the excellent detail here. I was able to quickly repro locally. The revert operation is indeed the source of the bug, and was able to complete despite violating the not-null constraint. We'll get a patch out for this bug.

The remaining question is what would you like to do with the existing row (and possible other rows) that violates the constraint. One option is to use dolt filter-branch to remove them from the history. Once they are deleted, the migration should proceed without issue.

andy-wm-arthur avatar Oct 12 '22 17:10 andy-wm-arthur

Great, glad it was helpful! In this case we don't care about losing the commit because we've updated it since then anyways. I'll try to come up with a filter-branch query to fix it and see if I can get the migration runnable.

jmeickle avatar Oct 12 '22 17:10 jmeickle

dolt filter-branch "delete from clause_type where clause_type is NULL" will remove the conflicting data from the history. However, this will delete every instance of these rows in the history, even in cases where there is not a NOT NULL constraint.

andy-wm-arthur avatar Oct 12 '22 19:10 andy-wm-arthur

dolt filter-branch "delete from clause_type where clause_type is NULL" will remove the conflicting data from the history. However, this will delete every instance of these rows in the history, even in cases where there is not a NOT NULL constraint.

A bug was encountered when attempting to use this filter-branch operation to cleanup the history. For NOT NULL fields with fixed-width types, Dolt makes aggressive assumptions when deserializing tuple values (for perf reasons). As mentioned above, the database gets into an invalid state (NULLS in NOT NULL fields) stemming from missing validation on dolt revert. Because of these erroneous NULLs, the fixed-width fast path makes a bad data access and the WHERE clause of the filter-branch query (clause_type is NULL) is incorrectly evaluated.

andy-wm-arthur avatar Oct 19 '22 16:10 andy-wm-arthur

I think we need a simple repro for this. Summarizing what I understand:

  1. Table has no null constraint.
  2. Make a commit that deletes a row that has a null value for column.
  3. Add a null constraint.
  4. Run dolt revert of commit in 2)

Row reverted now violates the not null constraint

zachmu avatar Feb 11 '23 01:02 zachmu

I think @andy-wm-arthur 's not null merge work will fix this but I'm going to make a repro anyway.

timsehn avatar May 10 '23 22:05 timsehn

This fails with a good error message now at least:

$ dolt init --fun
Successfully initialized dolt data repository.
$ dolt sql -q "create table t (pk int primary key, c1 int)"
$ dolt add t
$ dolt commit -am "added table"
commit 602ioc2trpnb8l22n6328i5i8hf6m2a8 (HEAD -> main) 
Author: timsehn <[email protected]>
Date:  Wed May 10 15:11:14 -0700 2023

        added table

$ dolt sql -q "insert into t values (0, NULL)"
Query OK, 1 row affected (0.00 sec)
$ dolt sql -q "select * from t"
+----+------+
| pk | c1   |
+----+------+
| 0  | NULL |
+----+------+

$ dolt commit -am "added 0, NULL"
commit 5hr3cgo0ml8ijq6opiclldr71kjtnugr (HEAD -> main) 
Author: timsehn <[email protected]>
Date:  Wed May 10 15:12:30 -0700 2023

        added 0, NULL

$ dolt sql -q "delete from t where pk=0"      
Query OK, 1 row affected (0.00 sec)
$ dolt commit -am "removed 0, NULL" 
commit a5a31im0gfd14bp0hrbk9p901u1in2jj (HEAD -> main) 
Author: timsehn <[email protected]>
Date:  Wed May 10 15:13:16 -0700 2023

        removed 0, NULL

$ dolt sql -q "alter table t modify column c1 int not null"
$ dolt diff
diff --dolt a/t b/t
--- a/t @ 7hvk88fdbbl2fcnmk6m59kllophtsgtr
+++ b/t @ le3bddjm86m9an37cf4utn5723phf2ev
 CREATE TABLE `t` (
   `pk` int NOT NULL,
-  `c1` int,
+  `c1` int NOT NULL,
   PRIMARY KEY (`pk`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
$ dolt commit -am "Added not null constraint"
commit gmo7e1u94s7v7ief6f61ujet3j9gf27i (HEAD -> main) 
Author: timsehn <[email protected]>
Date:  Wed May 10 15:14:11 -0700 2023

        Added not null constraint

$ dolt revert a5a31im0gfd14bp0hrbk9p901u1in2jj
table t can't be automatically merged.
To merge this table, make the schema on the source and target branch equal.

I'm going to resolve because we fail the revert. I think that's always the appropriate behavior here.

timsehn avatar May 10 '23 22:05 timsehn