Optimize SQL returned from `dolt_patch()`
Currently `dolt_patch()` is a very literal interpretation of the diff: for every row added, there is one insert statement. This optimizes for readability and parseability, but the resulting SQL can be slow to apply.
We should add an option to compress the SQL so that it is fast to apply. One simple step would be to group many insert statements into a single statement with many values. I'm sure there are other optimizations to explore beyond that.
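As a rough illustration of the grouping idea, here is a minimal Python sketch that post-processes a list of patch statements and merges consecutive single-row inserts that target the same table and column list. It is not how Dolt implements or would implement this. The function name `batch_inserts`, the `max_rows` cap, and the assumed statement shape (`INSERT INTO ... VALUES (...);`, one row per statement) are all hypothetical, and the regex is deliberately naive rather than a real SQL parser.

```python
import re
from typing import Iterable, List

# Assumed shape of a single-row insert: INSERT INTO <target> VALUES (<row>);
# This is a naive pattern for illustration, not a SQL parser.
_INSERT_RE = re.compile(
    r"^(INSERT INTO .+? VALUES)\s*(\(.*\));\s*$",
    re.IGNORECASE | re.DOTALL,
)


def batch_inserts(statements: Iterable[str], max_rows: int = 1000) -> List[str]:
    """Merge runs of consecutive single-row INSERTs that share a prefix.

    Statement order is preserved: any non-INSERT statement ends the
    current run, so inserts are never moved across other statements.
    """
    out: List[str] = []
    prefix = None          # "INSERT INTO ... VALUES" part of the current run
    rows: List[str] = []   # "(...)" row tuples collected for the current run

    def flush() -> None:
        nonlocal prefix, rows
        if rows:
            out.append(f"{prefix} {', '.join(rows)};")
        prefix, rows = None, []

    for stmt in statements:
        m = _INSERT_RE.match(stmt.strip())
        if m is None:
            # Not a recognizable insert: emit the pending run, then pass through.
            flush()
            out.append(stmt)
            continue
        stmt_prefix, row = m.group(1), m.group(2)
        # Start a new run on a different table/column list or when the cap is hit.
        if stmt_prefix != prefix or len(rows) >= max_rows:
            flush()
            prefix = stmt_prefix
        rows.append(row)

    flush()
    return out
```

Only adjacent inserts are merged, so the relative order of inserts, deletes, and schema changes in the patch stays the same. The `max_rows` cap is there because very large multi-row statements can run into statement size limits on the receiving server.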
Looking at how `dolt_patch()` represents ALTER statements that modify columns would be interesting, too. Today, Dolt doesn't detect that it's the same logical column before and after the ALTER is run, so `dolt_patch()` will show `ALTER TABLE t MODIFY COLUMN c1 TEXT;` as two separate drop/add column statements:
ALTER TABLE t DROP COLUMN c1;
ALTER TABLE t ADD COLUMN c1 TEXT;
Technically, this is a correctness issue, since applying those statements won't produce the same result as the original statement: dropping the column discards its existing data, and re-adding it does not bring that data back. This may be difficult to detect today with our current concept of column tags.
https://github.com/dolthub/dolt/pull/7771 fixed the issue where a column type change caused the generated patch to contain separate DROP COLUMN and ADD COLUMN statements, and also added support for generating ALTER TABLE t MODIFY COLUMN statements.