Add something like GORM's AutoMigrate
First, we should provide some unified way to inspect the current database schema, e.g. by querying INFORMATION_SCHEMA.TABLES or something similar. Then we can sync the database by comparing the actual database schema with the available Go models.
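For illustration, here is a minimal sketch (not a proposed API) of listing tables from information_schema.tables with bun's raw-query support; the package name and tableInfo struct are assumptions for the example:
// Placeholder package name, for illustration only.
package dbinspect

import (
	"context"

	"github.com/uptrace/bun"
)

// tableInfo is a hypothetical helper struct for this example.
type tableInfo struct {
	Schema string `bun:"table_schema"`
	Name   string `bun:"table_name"`
}

// listTables returns all base tables visible to the current user.
func listTables(ctx context.Context, db *bun.DB) ([]tableInfo, error) {
	var tables []tableInfo
	err := db.NewRaw(
		"SELECT table_schema, table_name FROM information_schema.tables WHERE table_type = ?",
		"BASE TABLE",
	).Scan(ctx, &tables)
	return tables, err
}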
+1
@vmihailenco I've been reading through the issues and noticed this one had help wanted
on it. I only started to work with bun
recently, but this seems like something I could try to do. Would you like some help with it?
@bevzzz The help is welcome here, but you should be warned that this is not a small issue and it will probably require several weeks of work.
If you decide to work on this, my advice would be to learn how GORM AutoMigrate works and what we can learn from it. Writing your thoughts here so others can help would be the next step.
I did not have a chance to learn from GORM AutoMigrate yet, but I guess the plan should look like this:
- Improve Bun dialects with the ability to inspect the database schema; for example, pgdialect should be capable of returning a list of tables with all their columns. I would start with a single PostgreSQL dialect for now.
- Decide which operations we want to support; for example, we can start with 2 operations:
package pgdialect
func (Dialect) RenameColumn(...)
func (Dialect) ChangeColumnType(...)
- Use the database schema (step 1) to detect changes and apply the commands from step 2 to auto-migrate the schema. Don't get too ambitious and start with a single operation only, for example, renaming a column (a rough sketch follows below).
Each step can be done separately so we have a chance to give feedback at an earlier stage.
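To make step 3 a bit more concrete, here is a rough sketch of the compare-and-apply loop. All of the names below (DBColumn, ModelColumn, Migrator, syncColumns) are placeholders for discussion, not an existing bun API; note also that a rename cannot be distinguished from a drop+add by comparison alone, so the first iteration would likely need a user-provided hint for that case.
// All of the following types are placeholders for discussion, not an existing API.
package automigrate

import "context"

type DBColumn struct {
	Name    string
	SQLType string
}

type ModelColumn struct {
	Name    string
	SQLType string
}

type Migrator interface {
	ChangeColumnType(ctx context.Context, table, column, newType string) error
}

// syncColumns compares the inspected DB columns with the model columns and
// applies the (single) supported operation where the types differ.
func syncColumns(ctx context.Context, m Migrator, table string, dbCols []DBColumn, modelCols []ModelColumn) error {
	existing := make(map[string]DBColumn, len(dbCols))
	for _, c := range dbCols {
		existing[c.Name] = c
	}
	for _, want := range modelCols {
		got, ok := existing[want.Name]
		if !ok {
			// A column missing from the DB might be a rename or a new column;
			// comparison alone cannot tell, so this needs a user hint for now.
			continue
		}
		if got.SQLType != want.SQLType {
			if err := m.ChangeColumnType(ctx, table, want.Name, want.SQLType); err != nil {
				return err
			}
		}
	}
	return nil
}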
Also pinging @dizzyfool and @LdDl who have done some excellent work on https://github.com/dizzyfool/genna and https://github.com/LdDl/bungen and might be interested in helping design the API for step 1.
@vmihailenco Thanks for getting back in touch so quickly!
Yes, you're right, bringing this feature will take some time and research, so I really appreciate the steps that you've suggested.
I will familiarize myself with the way it's done in GORM and get back with a proposed solution for the 2 operations you've mentioned.
P.S. I plan to start working on it next weekend; I guess we are not in a rush here, right?
P.S. I plan to start working on it next weekend; I guess we are not in a rush here, right?
No rush for sure. Even making some progress on step 1) will make a difference in the long run.
TL;DR: just a bunch of SQL queries to extract reference information about Postgres tables, and my first-look thoughts.
My POV:
- Reasons to start with PostgreSQL:
  - SQLite will be tough since it has no sugar-ish syntax to ALTER columns. A hard copy has to be done, something like this:
    CREATE TABLE a_copy (...new columns, ...old columns);
    INSERT INTO a_copy SELECT ... FROM a; -- everything, but skip new columns
    DROP TABLE a;
    ALTER TABLE a_copy RENAME TO a;
  - Not sure about MySQL, I do not remember whether it allows altering columns. It could be done in the same way as with SQLite, I guess (with almost the same pseudocode, but for columns);
  - MSSQL - I haven't touched it for a thousand years;
  - PostgreSQL - I guess there are no obvious difficulties, are there?
- Straightforward SQL queries to extract PostgreSQL schema/table information:
  - Basic query to extract all tables with their parent schema:
select table_schema, table_name
from information_schema.tables
where table_type = 'BASE TABLE'
-- Ignore table_type == 'VIEW'
-- Add (table_schema, table_name) or table_name filters if needed
  - Query to extract relations between tables. Each relation is represented by a constraint (a foreign key, basically), the source schema.table, a set of source columns (which do the referencing), the target schema.table, and a set of target columns (which are referenced). The sets (both source and target) should be properly aggregated - I believe the array_agg function can do the job without an additional order by call inside that function.
with schemas as (
    select nspname, oid from pg_namespace
), tables as (
    select oid, relnamespace, relname, relkind from pg_class
), columns as (
    select attrelid, attname, attnum from pg_attribute a where a.attisdropped = false
)
select distinct
    co.conname as constraint_name,
    ss.nspname as schema_name,
    s.relname as table_name,
    array_agg(sc.attname) as columns,
    ts.nspname as target_schema,
    t.relname as target_table,
    array_agg(tc.attname) as target_columns
from pg_constraint co
    left join tables s on co.conrelid = s.oid
    left join schemas ss on s.relnamespace = ss.oid
    left join columns sc on s.oid = sc.attrelid and sc.attnum = any (co.conkey)
    left join tables t on co.confrelid = t.oid
    left join schemas ts on t.relnamespace = ts.oid
    left join columns tc on t.oid = tc.attrelid and tc.attnum = any (co.confkey)
where co.contype = 'f'
    and co.conrelid in (select oid from pg_class c where c.relkind = 'r')
    and array_position(co.conkey, sc.attnum) = array_position(co.confkey, tc.attnum)
group by constraint_name, schema_name, table_name, target_schema, target_table;
-- Add (ss.nspname, s.relname) filter if needed
- And the least obvious part (in terms of the SQL query): columns.
  Each column is represented by: schema location (either 'public' or not), table name, column name, column type (in string representation), position of the column in the table (note: indexing starts at 1), whether the column is part of the PK, whether the column is an FK, whether the column can be null, the default value of the column (in string representation) if one has been provided, the size of a varchar (it is null for columns of non-text types), and whether the column is of an enum type. Note: I'm not that Postgres enum-type guy, so I'm not sure if enum_values is the correct way to get the needed information;
with enums as (
    select distinct
        true as is_enum,
        sch.nspname as table_schema,
        tb.relname as table_name,
        col.attname as column_name,
        array_agg(e.enumlabel) as enum_values
    from pg_class tb
        left join pg_namespace sch on sch.oid = tb.relnamespace
        left join pg_attribute col on col.attrelid = tb.oid
        inner join pg_enum e on e.enumtypid = col.atttypid
    group by 1, 2, 3, 4
), arrays as (
    select
        sch.nspname as table_schema,
        tb.relname as table_name,
        col.attname as column_name,
        col.attndims as array_dims
    from pg_class tb
        left join pg_namespace sch on sch.oid = tb.relnamespace
        left join pg_attribute col on col.attrelid = tb.oid
    where col.attndims > 0
), info as (
    select distinct
        kcu.table_schema as table_schema,
        kcu.table_name as table_name,
        kcu.column_name as column_name,
        array_agg((
            select constraint_type::text
            from information_schema.table_constraints tc
            where tc.constraint_name = kcu.constraint_name
                and tc.constraint_schema = kcu.constraint_schema
                and tc.constraint_catalog = kcu.constraint_catalog
            limit 1
        )) as constraint_types
    from information_schema.key_column_usage kcu
    group by kcu.table_schema, kcu.table_name, kcu.column_name
)
select distinct
    c.table_schema = 'public' as is_public,
    c.table_schema as schema_name,
    c.table_name as table_name,
    c.column_name as column_name,
    c.ordinal_position as ordinal,
    case when i.constraint_types is null then false else 'PRIMARY KEY' = any (i.constraint_types) end as is_pk,
    'FOREIGN KEY' = any (i.constraint_types) as is_fk,
    c.is_nullable = 'YES' as nullable,
    c.data_type = 'ARRAY' as is_array,
    coalesce(a.array_dims, 0) as dims,
    case when e.is_enum = true then 'varchar' else ltrim(c.udt_name, '_') end as type,
    c.column_default as def,
    c.character_maximum_length as len,
    e.enum_values as enum
from information_schema.tables t
    left join information_schema.columns c using (table_name, table_schema)
    left join info i using (table_name, table_schema, column_name)
    left join arrays a using (table_name, table_schema, column_name)
    left join enums e using (table_name, table_schema, column_name)
where t.table_type = 'BASE TABLE'
order by 1 desc, 2, 3, 5 asc, 6 desc nulls last
-- add (t.table_schema, t.table_name) filter if needed
- I do think it will be easier NOT to rewrite those queries in an ORM way, since that is hard to maintain for people who are bad with ORMs (even ones who are good at raw SQL).
- At my job I ESPECIALLY face the need for AddColumn/DropColumn/RenameColumn functions. Case: we have a table with columns id, some_name, some_field_a, created_at, deleted_at and we need to add some_field_b right before created_at, which leads us to: add the column, add copies of the audit fields, drop the old audit fields, rename the new audit fields - quite a struggle for such a task. Indices are what I use least in terms of migrations (since at my job I always try to think about them only once and do not change them very often).
- I believe the automigrate tool must have an option to turn off auto migration for given tables (or even columns?) if needed.
- Something must be done about partitions (which have INHERITS (schema_name.parent_table) in their DDL). It is a real pain for me sometimes when I work with other tools (diesel-rs). Maybe some placeholders/regexps? I have no idea.
p.s. Recently I was impressed by this project https://github.com/kyleconroy/sqlc. Could be very cool to add something like this for ORM purposes based on SQL migration files... but I guess this sounds like a new topic?
:+1: We need to start with something small and grow it so that it works at least for SQLite and Postgres, which have a lot in common.
Most likely we should start by familiarizing ourselves with the API already provided by database/sql, e.g. https://pkg.go.dev/database/sql/driver#RowsColumnTypeDatabaseTypeName
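For reference, here is a minimal sketch of reading column metadata through the standard library; drivers that implement RowsColumnTypeDatabaseTypeName expose the type name via ColumnType.DatabaseTypeName. The "users" table and the helper name are assumptions for the example (a bun.DB works here too, since it embeds *sql.DB):
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// inspectColumns prints basic column metadata for the given query's result set.
func inspectColumns(ctx context.Context, db *sql.DB) error {
	rows, err := db.QueryContext(ctx, "SELECT * FROM users LIMIT 1")
	if err != nil {
		return err
	}
	defer rows.Close()

	types, err := rows.ColumnTypes()
	if err != nil {
		return err
	}
	for _, ct := range types {
		nullable, _ := ct.Nullable()
		length, hasLength := ct.Length()
		fmt.Printf("column=%s dbType=%s nullable=%v length=%d (known=%v)\n",
			ct.Name(), ct.DatabaseTypeName(), nullable, length, hasLength)
	}
	return rows.Err()
}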
p.s. Recently I was impressed by this project https://github.com/kyleconroy/sqlc. Could be very cool to add something like this for ORM purposes based on SQL migration files... but I guess this sounds like a new topic?
Yeah, projects like sqlc and Ents open a whole new world of code generation for you :)
@vmihailenco I wasn't able to look into that yet because I had a lot of work in the past couple of months.
I am still very interested in developing this feature -- I will be taking some time off soon to work on it :v:
@bevzzz take your time :)
Here are my thoughts on how this feature could be implemented. It felt like a lot of ground to cover, so if some concepts need more clarification/discussion, please let me know.
I am planning to follow the steps suggested above, starting with Postgres:
1. Improve Bun dialects with the ability to inspect the database schema
For this step I think we'll need the following:
package inspect
type Inspector interface {
Tables() []*Table
HasTable(table interface{}) bool
HasColumn(table interface{}, column string) bool
}
// Not sure if there should be an interface for that, can't think of different implementations
type Table struct {
ParentSchema string
Name string
Columns []Column
}
// Each dialect can implement Column to handle the specifics
type Column interface {
Type() *sql.ColumnType
// Is the database column equivalent to the column definition in the Go-model?
// Do we need to alter this column in migration?
Equivalent(schema.Field) bool
// Additional metadata that's not available in sql.ColumnType
IsPK() bool
IsAutoincrement() bool
}
- We should define separate Inspector and Migrator interfaces, as that would allow reusing the Inspector interface in contexts other than migration.
- pgdialect (and other dialects) will implement Inspector and Column to handle the specifics of each DBMS. E.g. attributes such as isEnum, isArray, arrayDims would be specific to Postgres.
- Table can be extended to include indices and constraints.
- sql.ColumnType contains a lot of information about table columns and is readily available in the standard lib's API. For existing tables, it can be accessed via db.NewSelect().Model(m).Limit(1).Rows().ColumnTypes(), which is how GORM does it (bun also uses ColumnTypes(), but for a different reason, see mapModel).
I would use what the standard lib provides us with and get additional metadata using the queries that @LdDl has so helpfully shared above (thanks a lot!).
2. Support RenameColumn and ChangeColumnType
Since Create/DropTable, Create/DropColumn, and Create/DropIndex are already part of the IDB interface, I think the new methods for altering these objects would need to be added to that interface for the sake of consistency. Doing an ALTER TABLE query is not auto-migration per se, so I wouldn't separate that into its own package.
package bun
type IDB interface {
...
NewAlterTable() *AlterTableQuery
NewAlterIndex() *AlterIndexQuery
}
type AlterTableQuery struct {}
func (q *AlterTableQuery) RenameColumn(...) {}
func (q *AlterTableQuery) ChangeColumnType(...) {}
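A hypothetical usage sketch of the proposed builder (the method names, signatures, and the User model are placeholders for discussion, not an existing API); whether ChangeColumnType calls may be chained while RenameColumn may not is discussed below:
// Hypothetical: rename one column, then change two column types in one statement.
_, err := db.NewAlterTable().
	Model((*User)(nil)).
	RenameColumn("old_name", "new_name").
	Exec(ctx)

_, err = db.NewAlterTable().
	Model((*User)(nil)).
	ChangeColumnType("price", "numeric(10,2)").
	ChangeColumnType("qty", "bigint").
	Exec(ctx)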
Some related research and thoughts that I would like to get your opinion on:
- Renaming a column:
  - Cannot be chained, i.e. every renaming requires a separate ALTER TABLE ... RENAME COLUMN query.
  - After renaming a column, the user might want to rename the related index; I think we should provide a way to do that via a new AlterIndexQuery and leave maintaining consistent naming to the user.
  - Postgres will update the column name in constraints, foreign keys, index definitions, etc.
- Changing column type:
  - Can be chained, i.e. changing the type of every column in a table can be done with a single query. Is it OK if we allow chaining ChangeColumnType() calls while only doing one RenameColumn() per query?
  - Should have a USING clause.
Next steps: actual auto-migration
Regardless of how migration code is structured in GORM, I understand "auto-migration" to be this:
// Lives under migrate/auto
package auto
type Migrator interface {
// SyncModels detects differences between Go models and the current database schema and resolves them
SyncModels(ctx context.Context, models ...interface{})
}
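Hypothetical usage, assuming the Migrator above were implemented for the Postgres dialect (the constructor and models here are placeholders):
// Hypothetical: sync the schema for the given models on service start.
migrator := auto.NewMigrator(db) // placeholder constructor, does not exist yet
migrator.SyncModels(ctx, (*User)(nil), (*Order)(nil))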
Secondly, while this feature is useful in and of itself (for instance, to "initialize" the database on service start), the trouble I see with such a "runtime" migration is that the history is not recorded and reverting it is not possible. So far my experience with auto-migrations has been limited to Django's migrations, where running makemigrations results in a number of migration files (up/down) being generated.
Once we've got auto-migrations figured out, we could leverage the fact that all of bun's queries implement Stringer and create "up" and "down" SQL-migration files for the user.
Meanwhile, should these auto-migrations be somehow recorded, or do we just treat them as being multiple ALTER/DROP/CREATE queries packaged in one method?
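A minimal sketch of the file-generation idea, assuming the queries we produce expose their SQL via String(); the directory layout and the timestamped up/down naming are assumptions, not bun's established convention:
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// writeMigrationFiles writes the generated statements to timestamped up/down files.
// upSQL and downSQL would come from the query builders' String() output.
func writeMigrationFiles(dir, name string, upSQL, downSQL []string) error {
	prefix := fmt.Sprintf("%s_%s", time.Now().Format("20060102150405"), name)
	files := map[string][]string{
		prefix + ".up.sql":   upSQL,
		prefix + ".down.sql": downSQL,
	}
	for fname, stmts := range files {
		var buf []byte
		for _, stmt := range stmts {
			buf = append(buf, stmt...)
			buf = append(buf, ";\n"...)
		}
		if err := os.WriteFile(filepath.Join(dir, fname), buf, 0o644); err != nil {
			return err
		}
	}
	return nil
}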
P.S.: I prefer SyncModels() over AutoMigrate() because it is more descriptive. Does that name sound fine to you, or do you think we should stick with the already-existing AutoMigrate?
On a side note:
I couldn't find a convenient mechanism in bun's API for specifying the schema to which a certain table belongs. Without it, it seems, our auto-migration would end up always creating new tables in the public schema (which is what dbfixture does).
It could be worked around with, e.g. BeforeCreateTableHook, but I feel like we should have a simpler solution (like adding a schema key in the bun: tag) for something so commonplace. What's your take on that? Should I create a separate issue for this feature?
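Something along these lines (the schema key is exactly the hypothetical part being proposed here; bun does not support it today):
// Hypothetical struct tag; the "schema" key is the proposed addition.
type Book struct {
	bun.BaseModel `bun:"table:books,schema:inventory"`

	ID    int64  `bun:"id,pk,autoincrement"`
	Title string `bun:"title,notnull"`
}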
Started working on the ALTER TABLE query; it should cover a lot of the functionality we're missing for auto-migration.
As suggested above, we can introduce it as a separate feature before moving on with this topic.
Each step can be done separately so we have a chance to give feedback at an earlier stage.
The way I see it, the scope of the ALTER TABLE functionality that we will need to support is defined by what users can specify in the model definition:
- [x] Rename column
- [ ] Change column type ("autoincrement" belongs here)
- [x] Rename table
- [ ] Change table's PK
- [ ] Drop column's default value
- [ ] Change column's default value
- [ ] Add/Drop NOT NULL constraint
- [ ] Add/Drop UNIQUE check
- [x] Add/Drop column (see below)
(Have I missed anything?*)
While it's not the first priority, I was thinking about the NewAddColumn() and NewDropColumn() methods that are already part of the bun.IDB API. Since bun is an SQL-first ORM, would it not make sense for those to be part of the AlterTableQuery rather than standalone statements? E.g.:
db.NewAlterTable().Model().AddColumn() // ALTER TABLE model ADD COLUMN col
db.NewAlterTable().Model().DropColumn() // ALTER TABLE model DROP COLUMN col
*There are also table relations, but I haven't gotten to that yet. I don't suppose bun creates FK constraints when it sees a relationship defined between 2 models (or does it?). I will need to look at that later.
Thanks for writing so many details :+1: I agree that we should start with adding a table inspector - perhaps just an ability to list columns in a table should be enough.
For this step I think we'll need the following:
Looks reasonable, and it is hard to tell more without writing some code and getting your hands dirty. I would suggest naming the package sqlschema instead of inspect, though.
Regarding the API, I expect to have something like:
import "github.com/uptrace/bun/sqlschema"
var db *bun.DB
inspector := sqlschema.NewInspector(db)
columns, err := inspector.Columns(ctx, tableName)
// err can be sqlschema.ErrNotExists if the table is missing
// columns is []sqlschema.Column
would it not make sense for those to be part of the AlterTableQuery rather than standalone statements? E.g.:
I think we need something like this to build SQL queries:
db.NewAlterTable().Model(...).AddColumn("").Opt1().Opt2().Exec(ctx)
db.NewAlterTable().Model(...).DropColumn("").Opt1().Opt2().Exec(ctx)
But it will get too complex very soon with RDBMS-specific features/options, so we should also support something much simpler just for SyncModel:
var db *bun.DB
migrator := sqlschema.NewMigrator(db)
err := migrator.AddColumn(&somepkg.Column{
Schema: "", // optional
Table: "table_name",
Column: "column_name",
Type: ...,
})
err := migrator.DropColumn(col)
err := migrator.ChangeType(col, newType)
Underneath, it may use Bun API to generate the query or just generate and execute SQL directly.
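For example, a simple migrator method could be implemented either way; here is a rough sketch of the raw-SQL variant, reusing the hypothetical Migrator struct and Column type from the snippet above (a real implementation would quote identifiers via the dialect and handle the optional empty Schema):
// Rough sketch: generate and execute the DDL directly, bypassing the query builder.
// Identifier quoting and the optional Schema are skipped for brevity.
func (m *Migrator) AddColumn(ctx context.Context, col *Column) error {
	query := fmt.Sprintf(
		"ALTER TABLE %s.%s ADD COLUMN %s %s",
		col.Schema, col.Table, col.Column, col.Type,
	)
	_, err := m.db.ExecContext(ctx, query)
	return err
}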
It could be worked around with, e.g. BeforeCreateTableHook, but I feel like we should have a simpler solution (like adding a schema key in the bun: tag) for something so commonplace. What's your take on that? Should I create a separate issue for this feature?
It probably can be supported via a tag like you suggested, bun:"schema:name", but I think we can safely ignore that detail for now. It should not affect the design.
I have created a draft PR (it is very much WIP) so that we could exchange ideas with specific code at hand. I'm also not sure if some of my code breaks the project's code style, so if it does, please let me know how I could change it.
It only includes ALTER TABLE functionality for now, just in case we want to ship that independently of auto-migration.
Once the remaining tasks here are finished and we agree on the API (it should accommodate the differences between the DBMS, and I haven't looked into that yet), I will add the documentation for this too.
Appreciate your suggestions a lot, and looking forward to hearing your feedback on the progress so far!
Just left a code review and some suggestions. Most of them are just thoughts from the top of my head so please be patient with me :)
And thanks for working on this!
Found this bug while working on it, so I guess I'll do a PR for that first, because I'd like to use some of the underlying functionality for ALTER TABLE too.
I opened a PR which adds schema inspection and RenameTable auto-migration to the Postgres dialect.
I took a look at how GORM, ent, and django handle this and tried to come up with a solution that fits bun's existing APIs.
The scope of changes we want to support is pretty much what can be described in a bun model:
- Create/Drop/Rename tables, move table to a different schema
- Add/Drop/Rename columns, alter column type + its constraints (data type, PK, default, unique, nullable)
- Change FK relations (Have I missed anything?)
The tricky part will be resolving dependencies between different migrations (e.g. create table before referencing it in an FK constraint).
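To illustrate the dependency problem, here is a small self-contained sketch of ordering operations with a topological sort, so that e.g. CREATE TABLE users runs before the FK constraint that references it; the Operation type and the DependsOn relation are illustrative assumptions, not a proposed API:
// Placeholder package name for this sketch.
package automigrate

import "fmt"

// Operation is an illustrative stand-in for a single migration step.
type Operation struct {
	Name      string
	DependsOn []string // names of operations that must run first
}

// sortOperations returns ops in an order that respects DependsOn,
// or an error if the dependencies contain a cycle.
func sortOperations(ops []Operation) ([]Operation, error) {
	byName := make(map[string]Operation, len(ops))
	for _, op := range ops {
		byName[op.Name] = op
	}
	const (
		unvisited = iota
		visiting
		done
	)
	state := make(map[string]int, len(ops))
	var ordered []Operation

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle involving %q", name)
		}
		state[name] = visiting
		for _, dep := range byName[name].DependsOn {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		ordered = append(ordered, byName[name])
		return nil
	}
	for _, op := range ops {
		if err := visit(op.Name); err != nil {
			return nil, err
		}
	}
	return ordered, nil
}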
Additionally, it would be nice to provide some options like:
- Disable DROP (ent calls it "append-only" mode)
- Control how FK constraints are named if the related entities are renamed
- Control which tables are included in the database inspection (by default all)