Add something like GORM's AutoMigrate
First, we should provide some unified way to inspect the current database schema, e.g. by querying INFORMATION_SCHEMA.TABLES or something similar. Then we can sync the database by comparing the actual database schema with the available Go models.
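For illustration, here is a minimal sketch (not a proposed API) of listing tables from information_schema.tables with bun's raw-query support; the package name and tableInfo struct are assumptions for the example:
// Placeholder package name, for illustration only.
package dbinspect

import (
	"context"

	"github.com/uptrace/bun"
)

// tableInfo is a hypothetical helper struct for this example.
type tableInfo struct {
	Schema string `bun:"table_schema"`
	Name   string `bun:"table_name"`
}

// listTables returns all base tables visible to the current user.
func listTables(ctx context.Context, db *bun.DB) ([]tableInfo, error) {
	var tables []tableInfo
	err := db.NewRaw(
		"SELECT table_schema, table_name FROM information_schema.tables WHERE table_type = ?",
		"BASE TABLE",
	).Scan(ctx, &tables)
	return tables, err
}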
+1
@vmihailenco I've been reading through the issues and noticed this one had help wanted
on it. I only started to work with bun
recently, but this seems like something I could try to do. Would you like some help with it?
@bevzzz The help is welcome here, but you should be warned that this is not a small issue and it will probably require several weeks of work.
If you decide to work on this, my advice would be to learn how GORM AutoMigrate works and what we can learn from it. Writing your thoughts here so others can help would be the next step.
I did not have a chance to learn from GORM AutoMigrate yet, but I guess the plan should look like this:
- Improve Bun dialects with the ability to inspect the database schema; for example, pgdialect should be capable of returning a list of tables with all their columns. I would start with a single PostgreSQL dialect for now.
- Decide which operations we want to support; for example, we can start with 2 operations:
package pgdialect
func (Dialect) RenameColumn(...)
func (Dialect) ChangeColumnType(...)
- Use the database schema (step 1) to detect changes and apply the commands from step 2 to auto-migrate the schema. Don't get too ambitious and start with a single operation only, for example, renaming a column (a rough sketch follows below).
Each step can be done separately so we have a chance to give feedback at an earlier stage.
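To make step 3 a bit more concrete, here is a rough sketch of the compare-and-apply loop. All of the names below (DBColumn, ModelColumn, Migrator, syncColumns) are placeholders for discussion, not an existing bun API; note also that a rename cannot be distinguished from a drop+add by comparison alone, so the first iteration would likely need a user-provided hint for that case.
// All of the following types are placeholders for discussion, not an existing API.
package automigrate

import "context"

type DBColumn struct {
	Name    string
	SQLType string
}

type ModelColumn struct {
	Name    string
	SQLType string
}

type Migrator interface {
	ChangeColumnType(ctx context.Context, table, column, newType string) error
}

// syncColumns compares the inspected DB columns with the model columns and
// applies the (single) supported operation where the types differ.
func syncColumns(ctx context.Context, m Migrator, table string, dbCols []DBColumn, modelCols []ModelColumn) error {
	existing := make(map[string]DBColumn, len(dbCols))
	for _, c := range dbCols {
		existing[c.Name] = c
	}
	for _, want := range modelCols {
		got, ok := existing[want.Name]
		if !ok {
			// A column missing from the DB might be a rename or a new column;
			// comparison alone cannot tell, so this needs a user hint for now.
			continue
		}
		if got.SQLType != want.SQLType {
			if err := m.ChangeColumnType(ctx, table, want.Name, want.SQLType); err != nil {
				return err
			}
		}
	}
	return nil
}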
Also pinging @dizzyfool and @LdDl who have done some excellent work on https://github.com/dizzyfool/genna and https://github.com/LdDl/bungen and might be interested in helping design the API for step 1.
@vmihailenco Thanks for getting back in touch so quickly!
Yes, you're right, bringing this feature will take some time and research, so I really appreciate the steps that you've suggested.
I will familiarize myself with the way it's done in GORM and get back with a proposed solution for the 2 operations you've mentioned.
P.S. I plan to start working on it next weekend; I guess we are not in a rush here, right?
P.S. I plan to start working on it next weekend; I guess we are not in a rush here, right?
No rush for sure. Even making some progress on step 1) will make a difference in the long run.
TL;DR: just a bunch of SQL queries to extract reference information about Postgres tables, and my first-look thoughts.
My POV:
- Reasons to start with PostgreSQL:
  - SQLite will be tough since it has no sugar-ish syntax to ALTER columns. A hard copy has to be done, something like this:
    CREATE TABLE a_copy (...new columns, ...old columns);
    INSERT INTO a_copy SELECT ... FROM a; -- everything, but skip new columns
    DROP TABLE a;
    ALTER TABLE a_copy RENAME TO a;
  - Not sure about MySQL, I do not remember whether it allows altering columns. It could be done in the same way as with SQLite, I guess (with almost the same pseudocode, but for columns);
  - MSSQL - I haven't touched it for a thousand years;
  - PostgreSQL - I guess there are no obvious difficulties, are there?
- Straightforward SQL queries to extract PostgreSQL schema/table information:
  - Basic query to extract all tables with their parent schema:
select table_schema, table_name
from information_schema.tables
where table_type = 'BASE TABLE'
-- Ignore table_type == 'VIEW'
-- Add (table_schema, table_name) or table_name filters if needed
  - Query to extract relations between tables. Each relation is represented by a constraint (a foreign key, basically), the source schema.table, a set of source columns (which do the referencing), the target schema.table, and a set of target columns (which are referenced). The sets (both source and target) should be properly aggregated - I believe the array_agg function can do the job without an additional order by call inside that function.
with schemas as (
    select nspname, oid from pg_namespace
), tables as (
    select oid, relnamespace, relname, relkind from pg_class
), columns as (
    select attrelid, attname, attnum from pg_attribute a where a.attisdropped = false
)
select distinct
    co.conname as constraint_name,
    ss.nspname as schema_name,
    s.relname as table_name,
    array_agg(sc.attname) as columns,
    ts.nspname as target_schema,
    t.relname as target_table,
    array_agg(tc.attname) as target_columns
from pg_constraint co
    left join tables s on co.conrelid = s.oid
    left join schemas ss on s.relnamespace = ss.oid
    left join columns sc on s.oid = sc.attrelid and sc.attnum = any (co.conkey)
    left join tables t on co.confrelid = t.oid
    left join schemas ts on t.relnamespace = ts.oid
    left join columns tc on t.oid = tc.attrelid and tc.attnum = any (co.confkey)
where co.contype = 'f'
    and co.conrelid in (select oid from pg_class c where c.relkind = 'r')
    and array_position(co.conkey, sc.attnum) = array_position(co.confkey, tc.attnum)
group by constraint_name, schema_name, table_name, target_schema, target_table;
-- Add (ss.nspname, s.relname) filter if needed
- And the least obvious part (in terms of the SQL query): columns.
  Each column is represented by: schema location (either 'public' or not), table name, column name, column type (in string representation), position of the column in the table (note: indexing starts at 1), whether the column is part of the PK, whether the column is an FK, whether the column can be null, the default value of the column (in string representation) if one has been provided, the size of a varchar (it is null for columns of non-text types), and whether the column is of an enum type. Note: I'm not that Postgres enum-type guy, so I'm not sure if enum_values is the correct way to get the needed information;
with enums as (
    select distinct
        true as is_enum,
        sch.nspname as table_schema,
        tb.relname as table_name,
        col.attname as column_name,
        array_agg(e.enumlabel) as enum_values
    from pg_class tb
        left join pg_namespace sch on sch.oid = tb.relnamespace
        left join pg_attribute col on col.attrelid = tb.oid
        inner join pg_enum e on e.enumtypid = col.atttypid
    group by 1, 2, 3, 4
), arrays as (
    select
        sch.nspname as table_schema,
        tb.relname as table_name,
        col.attname as column_name,
        col.attndims as array_dims
    from pg_class tb
        left join pg_namespace sch on sch.oid = tb.relnamespace
        left join pg_attribute col on col.attrelid = tb.oid
    where col.attndims > 0
), info as (
    select distinct
        kcu.table_schema as table_schema,
        kcu.table_name as table_name,
        kcu.column_name as column_name,
        array_agg((
            select constraint_type::text
            from information_schema.table_constraints tc
            where tc.constraint_name = kcu.constraint_name
                and tc.constraint_schema = kcu.constraint_schema
                and tc.constraint_catalog = kcu.constraint_catalog
            limit 1
        )) as constraint_types
    from information_schema.key_column_usage kcu
    group by kcu.table_schema, kcu.table_name, kcu.column_name
)
select distinct
    c.table_schema = 'public' as is_public,
    c.table_schema as schema_name,
    c.table_name as table_name,
    c.column_name as column_name,
    c.ordinal_position as ordinal,
    case when i.constraint_types is null then false else 'PRIMARY KEY' = any (i.constraint_types) end as is_pk,
    'FOREIGN KEY' = any (i.constraint_types) as is_fk,
    c.is_nullable = 'YES' as nullable,
    c.data_type = 'ARRAY' as is_array,
    coalesce(a.array_dims, 0) as dims,
    case when e.is_enum = true then 'varchar' else ltrim(c.udt_name, '_') end as type,
    c.column_default as def,
    c.character_maximum_length as len,
    e.enum_values as enum
from information_schema.tables t
    left join information_schema.columns c using (table_name, table_schema)
    left join info i using (table_name, table_schema, column_name)
    left join arrays a using (table_name, table_schema, column_name)
    left join enums e using (table_name, table_schema, column_name)
where t.table_type = 'BASE TABLE'
order by 1 desc, 2, 3, 5 asc, 6 desc nulls last
-- add (t.table_schema, t.table_name) filter if needed
- I do think it will be easier NOT to rewrite those queries in an ORM way, since that is hard to maintain for people who are bad with ORMs (even ones who are good at raw SQL).
- At my job I ESPECIALLY face the need for AddColumn/DropColumn/RenameColumn functions. Case: we have a table with columns id, some_name, some_field_a, created_at, deleted_at and we need to add some_field_b right before created_at, which leads us to: add the column, add copies of the audit fields, drop the old audit fields, rename the new audit fields - quite a struggle for such a task. Indices are what I use least in terms of migrations (since at my job I always try to think about them only once and do not change them very often).
- I believe the automigrate tool must have an option to turn off auto migration for given tables (or even columns?) if needed.
- Something must be done about partitions (which have INHERITS (schema_name.parent_table) in their DDL). It is a real pain for me sometimes when I work with other tools (diesel-rs). Maybe some placeholders/regexps? I have no idea.
p.s. Recently I was impressed by this project https://github.com/kyleconroy/sqlc. Could be very cool to add something like this for ORM purposes based on SQL migration files... but I guess this sounds like a new topic?
:+1: We need to start with something small and grow it so that it works at least for SQLite and Postgres, which have a lot in common.
Most likely we should start by familiarizing ourselves with the API already provided by database/sql, e.g. https://pkg.go.dev/database/sql/driver#RowsColumnTypeDatabaseTypeName
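For reference, here is a minimal sketch of reading column metadata through the standard library; drivers that implement RowsColumnTypeDatabaseTypeName expose the type name via ColumnType.DatabaseTypeName. The "users" table and the helper name are assumptions for the example (a bun.DB works here too, since it embeds *sql.DB):
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// inspectColumns prints basic column metadata for the given query's result set.
func inspectColumns(ctx context.Context, db *sql.DB) error {
	rows, err := db.QueryContext(ctx, "SELECT * FROM users LIMIT 1")
	if err != nil {
		return err
	}
	defer rows.Close()

	types, err := rows.ColumnTypes()
	if err != nil {
		return err
	}
	for _, ct := range types {
		nullable, _ := ct.Nullable()
		length, hasLength := ct.Length()
		fmt.Printf("column=%s dbType=%s nullable=%v length=%d (known=%v)\n",
			ct.Name(), ct.DatabaseTypeName(), nullable, length, hasLength)
	}
	return rows.Err()
}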
p.s. Recently I was impressed by this project https://github.com/kyleconroy/sqlc. Could be very cool to add something like this for ORM purposes based on SQL migration files... but I guess this sounds like a new topic?
Yeah, projects like sqlc and Ents open a whole new world of code generation for you :)
@vmihailenco I wasn't able to look into that yet because I had a lot of work in the past couple of months.
I am still very interested in developing this feature -- I will be taking some time off soon to work on it :v:
@bevzzz take your time :)
Here are my thoughts on how this feature could be implemented. It felt like a lot of ground to cover, so if some concepts need more clarification/discussion, please let me know.
I am planning to follow the steps suggested above, starting with Postgres:
1. Improve Bun dialects with the ability to inspect the database schema
For this step I think we'll need the following:
package inspect
type Inspector interface {
Tables() []*Table
HasTable(table interface{}) bool
HasColumn(table interface{}, column string) bool
}
// Not sure if there should be an interface for that, can't think of different implementations
type Table struct {
ParentSchema string
Name string
Columns []Column
}
// Each dialect can implement Column to handle the specifics
type Column interface {
Type() *sql.ColumnType
// Is the database column equivalent to the column definition in the Go-model?
// Do we need to alter this column in migration?
Equivalent(schema.Field) bool
// Additional metadata that's not available in sql.ColumnType
IsPK() bool
IsAutoincrement() bool
}
- We should define separate Inspector and Migrator interfaces, as that would allow reusing the Inspector interface in contexts other than migration.
- pgdialect (and other dialects) will implement Inspector and Column to handle the specifics of each DBMS. E.g. attributes such as isEnum, isArray, arrayDims would be specific to Postgres.
- Table can be extended to include indices and constraints.
- sql.ColumnType contains a lot of information about table columns and is readily available in the standard lib's API. For existing tables, it can be accessed via db.NewSelect().Model(m).Limit(1).Rows().ColumnTypes(), which is how GORM does it (bun also uses ColumnTypes(), but for a different reason, see mapModel).
I would use what the standard lib provides us with and get additional metadata using the queries that @LdDl has so helpfully shared above (thanks a lot!).
2. Support RenameColumn and ChangeColumnType
Since Create/DropTable, Create/DropColumn, and Create/DropIndex are already part of the IDB interface, I think the new methods for altering these objects would need to be added to that interface for the sake of consistency. Doing an ALTER TABLE query is not auto-migration per se, so I wouldn't separate that into its own package.
package bun
type IDB interface {
...
NewAlterTable() *AlterTableQuery
NewAlterIndex() *AlterIndexQuery
}
type AlterTableQuery struct {}
func (q *AlterTableQuery) RenameColumn(...) {}
func (q *AlterTableQuery) ChangeColumnType(...) {}
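A hypothetical usage sketch of the proposed builder (the method names, signatures, and the User model are placeholders for discussion, not an existing API); whether ChangeColumnType calls may be chained while RenameColumn may not is discussed below:
// Hypothetical: rename one column, then change two column types in one statement.
_, err := db.NewAlterTable().
	Model((*User)(nil)).
	RenameColumn("old_name", "new_name").
	Exec(ctx)

_, err = db.NewAlterTable().
	Model((*User)(nil)).
	ChangeColumnType("price", "numeric(10,2)").
	ChangeColumnType("qty", "bigint").
	Exec(ctx)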
Some related research and thoughts that I would like to get your opinion on:
- Renaming a column:
  - Cannot be chained, i.e. every renaming requires a separate ALTER TABLE ... RENAME COLUMN query.
  - After renaming a column, the user might want to rename the related index; I think we should provide a way to do that via a new AlterIndexQuery and leave maintaining consistent naming to the user.
  - Postgres will update the column name in constraints, foreign keys, index definitions, etc.
- Changing column type:
  - Can be chained, i.e. changing the type of every column in a table can be done with a single query. Is it OK if we allow chaining ChangeColumnType() calls while only doing one RenameColumn() per query?
  - Should have a USING clause.
Next steps: actual auto-migration
Regardless of how migration code is structured in GORM, I understand "auto-migration" to be this:
// Lives under migrate/auto
package auto
type Migrator interface {
// SyncModels detects differences between Go models and the current database schema and resolves them
SyncModels(ctx context.Context, models ...interface{})
}
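Hypothetical usage, assuming the Migrator above were implemented for the Postgres dialect (the constructor and models here are placeholders):
// Hypothetical: sync the schema for the given models on service start.
migrator := auto.NewMigrator(db) // placeholder constructor, does not exist yet
migrator.SyncModels(ctx, (*User)(nil), (*Order)(nil))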
Secondly, while this feature is useful in and of itself (for instance, to "initialize" the database on service start), the trouble I see with such a "runtime" migration is that the history is not recorded and reverting it is not possible. So far my experience with auto-migrations has been limited to Django's migrations, where running makemigrations results in a number of migration files (up/down) being generated.
Once we've got auto-migrations figured out, we could leverage the fact that all of bun's queries implement Stringer and create "up" and "down" SQL-migration files for the user.
Meanwhile, should these auto-migrations be somehow recorded, or do we just treat them as being multiple ALTER/DROP/CREATE queries packaged in one method?
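A minimal sketch of the file-generation idea, assuming the queries we produce expose their SQL via String(); the directory layout and the timestamped up/down naming are assumptions, not bun's established convention:
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// writeMigrationFiles writes the generated statements to timestamped up/down files.
// upSQL and downSQL would come from the query builders' String() output.
func writeMigrationFiles(dir, name string, upSQL, downSQL []string) error {
	prefix := fmt.Sprintf("%s_%s", time.Now().Format("20060102150405"), name)
	files := map[string][]string{
		prefix + ".up.sql":   upSQL,
		prefix + ".down.sql": downSQL,
	}
	for fname, stmts := range files {
		var buf []byte
		for _, stmt := range stmts {
			buf = append(buf, stmt...)
			buf = append(buf, ";\n"...)
		}
		if err := os.WriteFile(filepath.Join(dir, fname), buf, 0o644); err != nil {
			return err
		}
	}
	return nil
}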
P.S.: I prefer SyncModels() over AutoMigrate() because it is more descriptive. Does that name sound fine to you, or do you think we should stick with the already-existing AutoMigrate?
On a side note:
I couldn't find a convenient mechanism in bun's API for specifying the schema to which a certain table belongs. Without it, it seems, our auto-migration would end up always creating new tables in the public schema (which is what dbfixture does).
It could be worked around with, e.g. BeforeCreateTableHook, but I feel like we should have a simpler solution (like adding a schema key in the bun: tag) for something so commonplace. What's your take on that? Should I create a separate issue for this feature?
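Something along these lines (the schema key is exactly the hypothetical part being proposed here; bun does not support it today):
// Hypothetical struct tag; the "schema" key is the proposed addition.
type Book struct {
	bun.BaseModel `bun:"table:books,schema:inventory"`

	ID    int64  `bun:"id,pk,autoincrement"`
	Title string `bun:"title,notnull"`
}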
Started working on the ALTER TABLE query; it should cover a lot of the functionality we're missing for auto-migration.
As suggested above, we can introduce it as a separate feature before moving on with this topic.
Each step can be done separately so we have a chance to give feedback at an earlier stage.
The way I see it, the scope of the ALTER TABLE functionality that we will need to support is defined by what users can specify in the model definition:
- [x] Rename column
- [ ] Change column type ("autoincrement" belongs here)
- [x] Rename table
- [ ] Change table's PK
- [ ] Drop column's default value
- [ ] Change column's default value
- [ ] Add/Drop NOT NULL constraint
- [ ] Add/Drop UNIQUE check
- [x] Add/Drop column (see below)
(Have I missed anything?*)
While it's not the first priority, I was thinking about the NewAddColumn() and NewDropColumn() methods that are already part of the bun.IDB API. Since bun is an SQL-first ORM, would it not make sense for those to be part of the AlterTableQuery rather than standalone statements? E.g.:
db.NewAlterTable().Model().AddColumn() // ALTER TABLE model ADD COLUMN col
db.NewAlterTable().Model().DropColumn() // ALTER TABLE model DROP COLUMN col
*There are also table relations, but I haven't gotten to that yet. I don't suppose bun creates FK constraints when it sees a relationship defined between 2 models (or does it?). I will need to look at that later.
Thanks for writing so many details :+1: I agree that we should start with adding a table inspector - perhaps just an ability to list columns in a table should be enough.
For this step I think we'll need the following:
Looks reasonable, and it is hard to tell more without writing some code and getting your hands dirty. I would suggest naming the package sqlschema instead of inspect, though.
Regarding the API, I expect to have something like:
import "github.com/uptrace/bun/sqlschema"
var db *bun.DB
inspector := sqlschema.NewInspector(db)
columns, err := inspector.Columns(ctx, tableName)
// err can be sqlschema.ErrNotExists if the table is missing
// columns is []sqlschema.Column
would it not make sense for those to be part of the AlterTableQuery rather than standalone statements? E.g.:
I think we need something like this to build SQL queries:
db.NewAlterTable().Model(...).AddColumn("").Opt1().Opt2().Exec(ctx)
db.NewAlterTable().Model(...).DropColumn("").Opt1().Opt2().Exec(ctx)
But it will get too complex very soon with RDBMS-specific features/options, so we should also support something much simpler just for SyncModel:
var db *bun.DB
migrator := sqlschema.NewMigrator(db)
err := migrator.AddColumn(&somepkg.Column{
Schema: "", // optional
Table: "table_name",
Column: "column_name",
Type: ...,
})
err := migrator.DropColumn(col)
err := migrator.ChangeType(col, newType)
Underneath, it may use Bun API to generate the query or just generate and execute SQL directly.
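For example, a simple migrator method could be implemented either way; here is a rough sketch of the raw-SQL variant, reusing the hypothetical Migrator struct and Column type from the snippet above (a real implementation would quote identifiers via the dialect and handle the optional empty Schema):
// Rough sketch: generate and execute the DDL directly, bypassing the query builder.
// Identifier quoting and the optional Schema are skipped for brevity.
func (m *Migrator) AddColumn(ctx context.Context, col *Column) error {
	query := fmt.Sprintf(
		"ALTER TABLE %s.%s ADD COLUMN %s %s",
		col.Schema, col.Table, col.Column, col.Type,
	)
	_, err := m.db.ExecContext(ctx, query)
	return err
}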
It could be worked around with, e.g. BeforeCreateTableHook, but I feel like we should have a simpler solution (like adding a schema key in the bun: tag) for something so commonplace. What's your take on that? Should I create a separate issue for this feature?
It probably can be supported via a tag like you suggested, bun:"schema:name", but I think we can safely ignore that detail for now. It should not affect the design.
I have created a draft PR (it is very much WIP) so that we could exchange ideas with specific code at hand. I'm also not sure if some of my code breaks the project's code style, so if it does, please let me know how I could change it.
It only includes ALTER TABLE functionality for now, just in case we want to ship that independently of auto-migration.
Once the remaining tasks here are finished and we agree on the API (it should accommodate the differences between the DBMS, and I haven't looked into that yet), I will add the documentation for this too.
Appreciate your suggestions a lot, and looking forward to hearing your feedback on the progress so far!
Just left a code review and some suggestions. Most of them are just thoughts from the top of my head so please be patient with me :)
And thanks for working on this!
Found this bug while working on it, so I guess I'll do a PR for that first, because I'd like to use some of the underlying functionality for ALTER TABLE too.
I opened a PR which adds schema inspection and RenameTable auto-migration to the Postgres dialect.
I took a look at how GORM, ent, and django handle this and tried to come up with a solution that fits bun's existing APIs.
The scope of changes we want to support is pretty much what can be described in a bun model:
- Create/Drop/Rename tables, move table to a different schema
- Add/Drop/Rename columns, alter column type + its constraints (data type, PK, default, unique, nullable)
- Change FK relations (Have I missed anything?)
The tricky part will be resolving dependencies between different migrations (e.g. create table before referencing it in an FK constraint).
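To illustrate the dependency problem, here is a small self-contained sketch of ordering operations with a topological sort, so that e.g. CREATE TABLE users runs before the FK constraint that references it; the Operation type and the DependsOn relation are illustrative assumptions, not a proposed API:
// Placeholder package name for this sketch.
package automigrate

import "fmt"

// Operation is an illustrative stand-in for a single migration step.
type Operation struct {
	Name      string
	DependsOn []string // names of operations that must run first
}

// sortOperations returns ops in an order that respects DependsOn,
// or an error if the dependencies contain a cycle.
func sortOperations(ops []Operation) ([]Operation, error) {
	byName := make(map[string]Operation, len(ops))
	for _, op := range ops {
		byName[op.Name] = op
	}
	const (
		unvisited = iota
		visiting
		done
	)
	state := make(map[string]int, len(ops))
	var ordered []Operation

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle involving %q", name)
		}
		state[name] = visiting
		for _, dep := range byName[name].DependsOn {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		ordered = append(ordered, byName[name])
		return nil
	}
	for _, op := range ops {
		if err := visit(op.Name); err != nil {
			return nil, err
		}
	}
	return ordered, nil
}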
Additionally, it would be nice to provide some options like:
- Disable DROP (ent calls it "append-only" mode)
- Control how FK constraints are named if the related entities are renamed
- Control which tables are included in the database inspection (by default all)