dolt icon indicating copy to clipboard operation
dolt copied to clipboard

Three Way Merging can result in an incorrect column ordering

Open nicktobey opened this issue 1 year ago • 1 comments

Steps to reproduce:

git checkout nicktobey/schemamergetests
go test libraries/doltcore/merge/schema_merge_test.go -run "^\QTestSchemaMerge\E$/^\Qcolumn_add\E$/^\Qdrop_tests\E$/^\Qmerge_right_to_left\E$/^\Qleft_side_column_add_with_additional_column_after\E$/^\Qleft_side_adds_column_and_assigns_non-null_value,_extra_column_has_data_change_on_right\E$"

Output:

--- FAIL: TestSchemaMerge (0.00s)
    --- FAIL: TestSchemaMerge/column_add/drop_tests (0.00s)
        --- FAIL: TestSchemaMerge/column_add/drop_tests/merge_right_to_left (0.00s)
            --- FAIL: TestSchemaMerge/column_add/drop_tests/merge_right_to_left/left_side_column_add_with_additional_column_after (0.00s)
                --- FAIL: TestSchemaMerge/column_add/drop_tests/merge_right_to_left/left_side_column_add_with_additional_column_after/left_side_adds_column_and_assigns_non-null_value,_extra_column_has_data_change_on_right (0.00s)
                    schema_merge_test.go:934: 
                                Error Trace:    /Users/nick/Documents/dolt/go/libraries/doltcore/merge/schema_merge_test.go:934
                                                                        /Users/nick/Documents/dolt/go/libraries/doltcore/merge/schema_merge_test.go:956
                                Error:          Not equal: 
                                                expected: hash.Hash{0x49, 0xba, 0x43, 0xbc, 0x38, 0x51, 0x96, 0x39, 0x9a, 0x1b, 0xa6, 0x80, 0x5b, 0x95, 0xc0, 0xcb, 0x77, 0x44, 0x82, 0x1f}
                                                actual  : hash.Hash{0x20, 0x74, 0xe, 0xd4, 0x6f, 0xfe, 0xb6, 0x77, 0x6a, 0x17, 0xa7, 0x3f, 0xbf, 0x15, 0x28, 0xf3, 0x6e, 0xe, 0xb, 0xd6}
                                                
                                                Diff:
                                                --- Expected
                                                +++ Actual
                                                @@ -1,4 +1,4 @@
                                                 (hash.Hash) (len=20) {
                                                - 00000000  49 ba 43 bc 38 51 96 39  9a 1b a6 80 5b 95 c0 cb  |I.C.8Q.9....[...|
                                                - 00000010  77 44 82 1f                                       |wD..|
                                                + 00000000  20 74 0e d4 6f fe b6 77  6a 17 a7 3f bf 15 28 f3  | t..o..wj..?..(.|
                                                + 00000010  6e 0e 0b d6                                       |n...|
                                                 }
                                Test:           TestSchemaMerge/column_add/drop_tests/merge_right_to_left/left_side_column_add_with_additional_column_after/left_side_adds_column_and_assigns_non-null_value,_extra_column_has_data_change_on_right
                    schema_merge_test.go:937: expected rows: 
                            { key: 01000000 value: 02000000, 04000000 }
                    schema_merge_test.go:938: actual rows: 
                            { key: 01000000 value: 04000000, 02000000 }

What I suspect is going on:

IIUC, when we perform a three-way merge while there's a schema change, we generate a new tuple description for the merged table by taking the columns from the left side, removing any that were dropped by the schema change, and appending to the end any that were adding by the right side. This means that the resulting tuple description does not necessarily match the merged schema in regards to the ordering of the columns.

However, I suspect that some part of the logic assumes that the order of columns in the merged tuple matches the merged schema, which can cause the values to get swapped around in the output.

I don't know whether this order is what's ultimately written to disk or whether it gets fixed up later in the run.

nicktobey avatar Sep 28 '23 23:09 nicktobey

I can confirm that our schema merge code hasn't ever respected ordering of columns – new columns are added at the end of the merged schema. There are some TODOs in merge_schema.go around the column merging logic. Nothing goes back and fixes the ordering up. The same order is used to write the table to disk.

fulghum avatar Sep 29 '23 00:09 fulghum