ideas icon indicating copy to clipboard operation
ideas copied to clipboard

SQL migration tool that morphs database according to DDL

Open StreetStrider opened this issue 5 years ago • 10 comments

Project description

Consider we have application with relational DB. It is often paired with migration tool which works according to the set of migrations. Every migration is a patch of DB structure or data changes that must be applied to migrate DB to the new state. The problem with such migrations is that their order is matter. It becomes a large issue when multiple developers introduce migrations in parallel feature-branches. Migrations are often enumerated or contain timestamps which makes it more difficult to apply them.

The idea: Implement SQL morph tool that can change DB state corresponding to SQL file. Instead of using patch-based migrations use state-based SQL file with full schema definition. Any time developer need to introduce some changes, he just modifies that SQL file. If many developers do changes, that SQL file shall be merged like any text or source code file. Then SQL morph tool morpes real DB according to recent full state in SQL file, creating required ALTER, CREATE and DROP commands if it possible, or stop if SQL file is not compatible with current DB state.

Such state-based approach simplify work with relational DB for multiple developers and makes migrating drastically simpler.

  • Can that SQL file be decomposed?: SQL file must contain full DDL for SQL morph tool to work, but it can be decomposed via some include mechanism. It would be possible to decompose by subsystems (store DDLs related to some models beside them).
  • Can ORM be used?: It is possible, but not required. If ORM is in use, it should generate full SQL which then will be passed to SQL morph tool. No patch-based migrations, only full schema definition just not in plain SQL, but in ORM DDL instructions.
  • What about data migrations?: It is a space for a good research. Some simple cases still may be done automatically. In more complex cases some additional SQL DML must be provided. However, it might be possible to implement it in some declarative way, just like main DDL file.

Relevant Technology

The implementation language doesn't matter. The only requirement for language is to have drivers for popular DBs.

Complexity and required time

Complexity

  • [ ] Beginner - This project requires no or little prior knowledge of the technolog(y|ies) specified to contribute to the project
  • [ ] Intermediate - The user should have some prior knowledge of the technolog(y|ies) to the point where they know how to use it, but not necessarily all the nooks and crannies of the technology
  • [x] Advanced - The project requires the user to have a good understanding of all components of the project to contribute

Required time (ETA)

  • [ ] Little work - A couple of days
  • [ ] Medium work - A week or two
  • [x] Much work - The project will take more than a couple of weeks and serious planning is required

StreetStrider avatar Dec 02 '18 17:12 StreetStrider

https://sqlalchemy-migrate.readthedocs.io/en/latest/ may be helpful

KOLANICH avatar Dec 02 '18 18:12 KOLANICH

@KOLANICH according to docs, it's based on changesets. My idea is about full-state declarative approach to migrations instead of changesets.

StreetStrider avatar Dec 02 '18 18:12 StreetStrider

Not sure that it is useful. If one changes a DB structure, one usually do it because app logic requires it.

From my experience: I had a DB id, date, fkey, where id is primary key and fkey is a foreign key. The first change was to convert date from string in iso format into integer counting number of days from 2011-01-01, because it turned out that DB engine (sqlite) has no built in date type and stores and processes them as strings with big overhead. Then to optimize search performance I have eliminated date and fkey and generated id as date << 23 | fkey. Of course it has broken the relations, but the search has became much faster.

Can your structure automatically derive this and generate the code doing the stuff?

KOLANICH avatar Dec 02 '18 18:12 KOLANICH

The idea for tool is to have diffing algorithm that can detect schema changes, like table/attributes addition and removal, and attribute altering where possible. How many this tool can make is a question of algorithm smartness, possible some heuristics included.

If you add one column with default it can add it as well, if you remove column it can remove it too. Much like human will act manually or writing something like addColumn('foo', 'string') in his ORM. Anything in relational model can be done pretty easy, things like optimizations and hacks harder, but possible.

StreetStrider avatar Dec 02 '18 18:12 StreetStrider

The problem is how to deal with renamings. All the columns need to have a persistent unique id. But for a text file it is inconvenient. Derivadion of these ids is unreliable.

For example in one file one column is called a, in another one it is called b. You wanna merge 2 files by 2 different devs, so how your system is going to distinguish between column renaming a -> b and addition of b followed by deletion of a?

KOLANICH avatar Dec 02 '18 18:12 KOLANICH

That is one of details to investigate.

StreetStrider avatar Dec 02 '18 18:12 StreetStrider

https://github.com/TEAM-OSTRICH/SYS-DIFFER

KOLANICH avatar Dec 14 '18 20:12 KOLANICH

@KOLANICH will look onto that. :+1:

StreetStrider avatar Dec 14 '18 20:12 StreetStrider

Nice, but not good as can be. This tool definitely should be command line first and also db-agnostic as much as possible.

StreetStrider avatar Dec 18 '18 12:12 StreetStrider

I've found two projects that met this conditions partially: https://github.com/eulerto/pgquarrel https://github.com/djrobstep/migra

However, they seem to work only with PgSQL and only with real schemas (which means having input ddl is not enough, you need to create real db from it and compare it with target db).

StreetStrider avatar Jan 06 '21 14:01 StreetStrider