feat: create the embedmd datastore type
part of #1018
Description
Please note that EmbedMD is a temporary name and is up for debate. :sweat_smile:
In this PR, we add a new datastore type called EmbedMD. It is a simple, embedded metadata service that (in the future) will be used to store metadata for the Model Registry.
This PR includes all the migrations needed to recreate the MLMD database schema, which should make the new datastore backward-compatible with an existing MLMD database.
Two new flags have been added to the proxy command:
- embedmd-database-type: The type of database to use for the EmbedMD datastore.
- embedmd-database-dsn: The DSN for the EmbedMD database.
For example, to run the proxy with an EmbedMD datastore backed by a MySQL database, you can run the following command:
./model-registry proxy --hostname=localhost --port=8080 --datastore-type=embedmd --embedmd-database-dsn="root:test@tcp(127.0.0.1:3306)/metadb?charset=utf8mb4&parseTime=True&loc=Local" --embedmd-database-type=mysql
How Has This Been Tested?
I tried two things in a local cluster:
- Deployed the usual model registry with MLMD, scaled the deployment to 0 replicas, and ran the command above. The migrations were applied successfully; after scaling the deployment back to 1 replica, the model registry started successfully.
- Started with an empty database and ran the command above. The migrations were applied successfully, and the resulting database worked as expected with the usual MLMD-based model registry deployment.
I also added two new integration tests for the migrations (up and down), run with the usual command:
make test
Merge criteria:
- [ ] All the commits have been signed off (to pass the DCO check).
- [ ] The commits have meaningful messages; the author will squash them after approval or in case of manual merges will ask to merge with squash.
- [ ] Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
- [ ] The developer has manually tested the changes and verified that the changes work.
- [ ] Code changes follow the kubeflow contribution guidelines.
- [ ] For first time contributors: Please reach out to the Reviewers to ensure all tests are being run, ensuring the label ok-to-test has been added to the PR.
so is this like a temporary-migration holding period just to be off of MLMD?
Why start with SQL directly, why not start with tagged ORM structs that we'll use with Gorm anyway?
> so is this like a temporary-migration holding period just to be off of MLMD?
Hey @syntaxsdev,
This is more than a temporary solution. It's the second step towards getting off MLMD. The plan is as follows:
- Create an abstraction layer for metadata stores (because the code was bound to MLMD). This has been done in #425
- Create a new metadata store type (included in this PR)
- Migrate an existing or empty DB to the same schema as MLMD for backwards compatibility reasons (included in this PR)
- Replace MLMD's functionality by implementing the datastore interface using GORM
- Test/document
- Remove MLMD
> Why start with SQL directly, why not start with tagged ORM structs that we'll use with Gorm anyway?
:wave: @dhirajsb,
Two reasons (in my opinion):
- Decoupling: I wanted to keep the dependencies from the migration and ORM libraries to a minimum so that, if one or both fail to meet our expectations, they can be easily replaced independently of each other. Also, ORM structs should only reflect the currently used schema version. If we need to add columns to one of those tables in the future, we would need a struct for the older type in the older migration and a newer one for the current schema.
- Control: I wanted more control over the migrations because it is often sensible to run them with direct SQL. This ensures that complex data changes are handled exactly right, and we can use any specific database command we need.
Thanks @Al-Pragliola for the detailed response. See comments below:
> Decoupling: I wanted to keep the dependencies from the migration and ORM libraries to a minimum so that, if one or both fail to meet our expectations, they can be easily replaced independently of each other.
That's what the DB interface is for, to abstract the DB implementation. We don't need to further abstract out the ORM layer, and it feels like a YAGNI situation.
> Also, ORM structs should only reflect the currently used schema version. If we need to add columns to one of those tables in the future, we would need a struct for the older type in the older migration and a newer one for the current schema.
That is sort of correct. The reason for using the ORM is to avoid writing DB-specific DDL/SQL, to make it as portable as possible to all GORM-supported DB types. We can actually handle schema changes with older and updated structs without much overhead. See this code example for an RH control plane: there is a starting DB schema, which is in sync with the main DB schema to start, and only changes for updated types need to be in migrations from that point on. We can also use raw SQL if needed. IMHO, since the MLMD DB schema is so generic, there shouldn't :crossed_fingers: be that many changes to the schema in the future.
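For context, the Gormigrate-style pattern described above looks roughly like the sketch below (assuming go-gormigrate/gormigrate/v2; the Artifact model and migration ID are illustrative, not taken from the linked example):

```go
package migrations

import (
	"github.com/go-gormigrate/gormigrate/v2"
	"gorm.io/gorm"
)

// All returns the ordered migration list. Each step declares its own copy of
// the model, so later changes to the "live" struct don't silently rewrite an
// already-released migration.
func All() []*gormigrate.Migration {
	// Artifact is a hypothetical table, not the actual schema.
	type Artifact struct {
		ID   int64  `gorm:"primaryKey"`
		Name string `gorm:"size:255"`
	}
	return []*gormigrate.Migration{
		{
			ID: "202401010001_create_artifacts",
			Migrate: func(tx *gorm.DB) error {
				return tx.Migrator().CreateTable(&Artifact{})
			},
			Rollback: func(tx *gorm.DB) error {
				return tx.Migrator().DropTable(&Artifact{})
			},
		},
	}
}

// RunAll applies all pending steps in order.
func RunAll(db *gorm.DB) error {
	return gormigrate.New(db, gormigrate.DefaultOptions, All()).Migrate()
}
```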
> Control: I wanted more control over the migrations because it is often sensible to run them with direct SQL. This ensures that complex data changes are handled exactly right, and we can use any specific database command we need.
As shown in the sample migrations strategy above, we still retain control over types and have access to raw SQL if needed. So, we aren't necessarily sacrificing anything by using Gorm types to create the initial schema.
BTW, the sample migration implementation I shared was for a continuously released SaaS. So, we had to be able to turn around changes asap if we ran into bugs/feature fixes, etc.
Thanks for your detailed feedback and for sharing the example, @dhirajsb ! I appreciate you taking the time to explain your perspective on using GORM for schema management.
You've made some good points about GORM's capabilities for portability and handling schema evolution with structs. My initial reasoning for choosing golang-migrate with SQL files stemmed from a few key considerations, and perhaps I can clarify how they relate to the lifecycle of migrations themselves.
One of the primary motivations for using a dedicated tool like golang-migrate was to leverage its robust, out-of-the-box functionalities for managing the entire migration process. This includes critical aspects like:
- Database Locking: golang-migrate drivers (e.g., for MySQL, PostgreSQL) typically implement database-level locking. This ensures that if multiple instances of our application (like multiple pods in Kubernetes) try to run migrations simultaneously at startup, only one proceeds, preventing race conditions and ensuring migrations apply cleanly and sequentially. (addressing https://github.com/kubeflow/model-registry/pull/1107#discussion_r2087482000)
From the Gormigrate README:
> Who is Gormigrate for? Gormigrate was born to be a simple and minimalistic migration tool for small projects that uses Gorm. You may want to take a look at more advanced solutions like golang-migrate/migrate if you plan to scale. Be aware that Gormigrate has no builtin lock mechanism, so if you're running it automatically and have a distributed setup (i.e. more than one executable running at the same time), you might want to use a distributed lock/mutex mechanism to prevent race conditions while running migrations.
- Atomic Version Tracking: It maintains a dedicated schema_migrations table to reliably track which versioned migrations have been applied, and whether the database is in a "dirty" state.
- Explicit Up and Down Migrations: For every schema change (.up.sql), there's a corresponding .down.sql script. This provides clear, explicit, and testable rollback paths for each versioned change, which is invaluable for operational stability.
And mainly because it's the tool the team approved to use in RHOAIENG-22102
The decoupling aspect I mentioned earlier also ties into this: golang-migrate handles the "how and when" of migration execution (locking, versioning, up/down), while the SQL (or even GORM DDL within a Go migration called by such a system) defines the "what." If the "how and when" is tied too closely to the ORM's schema definition tools, it feels like we might lose some of this specialized process machinery.
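To make the "how and when" concrete, here is a rough sketch (not the PR's actual code) of how golang-migrate is typically wired up with embedded SQL files and its MySQL driver; the migrations/ path and function name are placeholders:

```go
package embedmd

import (
	"database/sql"
	"embed"
	"errors"

	"github.com/golang-migrate/migrate/v4"
	"github.com/golang-migrate/migrate/v4/database/mysql"
	"github.com/golang-migrate/migrate/v4/source/iofs"
)

// Hypothetical embed of the versioned .up.sql/.down.sql files.
//
//go:embed migrations/*.sql
var migrationsFS embed.FS

func runMigrations(db *sql.DB) error {
	src, err := iofs.New(migrationsFS, "migrations")
	if err != nil {
		return err
	}
	// The MySQL driver acquires a database-level lock while migrating,
	// so concurrent pods starting at the same time can't race each other.
	drv, err := mysql.WithInstance(db, &mysql.Config{})
	if err != nil {
		return err
	}
	m, err := migrate.NewWithInstance("iofs", src, "mysql", drv)
	if err != nil {
		return err
	}
	// Up applies every pending versioned migration and records it in the
	// schema_migrations table; ErrNoChange simply means we're up to date.
	if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
		return err
	}
	return nil
}
```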
I'm definitely open to discussing this further and seeing if there's an approach that gives us the best of both worlds! Perhaps there's a way to leverage GORM for defining the schema state within a migration step while still using golang-migrate's core engine for the overall process management.
But in my initial research I didn't find an easy way to integrate the two tools without major rewriting of golang-migrate functionalities.
Thanks again for the discussion! :wave:
+1 for keeping both DDL and GORM decoupled. For me,
- It separates concerns effectively.
- The schema is explicit and easy to understand, especially from a management perspective. No need to read the code to understand what the schema is.
- If we need to make any database-specific changes for performance or maintainability, this approach makes it easier to do so.
- While we're currently using Go, if we ever need to switch to Python or support Python libraries directly (as MLMD did), this design will allow us to adapt quickly.
If the team has made the final decision on going with golang migrate and using direct DDL, that's fine by me. I do have some questions:
- Are we still able to support different DB types like Gorm does, i.e. MySQL, PostgreSQL, SQLite, SQL Server, or is the DDL tied to a DB dialect?
- Is there a way we can check whether the Gorm structs that we'll add later are in sync with the current schema from go-migrate?
Thanks for these excellent follow-up questions, @dhirajsb . They touch upon important aspects of choosing a migration strategy. Let me try to address them:
> Are we still able to support different DB types like Gorm does, i.e. MySQL, PostgreSQL, SQLite, SQL Server, or is the DDL tied to a DB dialect?
You're right to point out GORM's strength in generating DDL that's portable across the database types it supports. With the current approach of using direct DDL in SQL files with golang-migrate, the migration scripts are indeed tied to a specific SQL dialect, but the code is ready to be extended to other DB types, as there's already a layer of abstraction on top of it. Here's the list of DB types supported by golang-migrate:
- PostgreSQL
- PGX v4
- PGX v5
- Redshift
- Ql
- Cassandra / ScyllaDB
- SQLite
- SQLite3
- SQLCipher
- MySQL / MariaDB
- Neo4j
- MongoDB
- CrateDB
- Shell
- Google Cloud Spanner
- CockroachDB
- YugabyteDB
- ClickHouse
- Firebird
- MS SQL Server
- rqlite
> Is there a way we can check whether the Gorm structs that we'll add later are in sync with the current schema from go-migrate?
A key aspect that golang-migrate brings to the table here is explicit schema versioning. It establishes a versioned "source of truth" for the database schema through its migration history, which is tracked in the schema_migrations table. Our GORM structs are then intended to reflect this specific, versioned state of the schema (typically the latest version deployed with the application).
We can use https://gorm.io/gen/database_to_structs.html, much like we use OpenAPI to generate the types and methods of the REST API: the DB at schema version X serves as the source of truth from which we generate the GORM structs. If this fails to meet our expectations, we can write the structs by hand and build a solid unit/integration test suite to ensure everything stays in sync.
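A minimal sketch of what that generation step could look like (the DSN and output paths are placeholders, not the PR's actual layout):

```go
package main

import (
	"gorm.io/driver/mysql"
	"gorm.io/gen"
	"gorm.io/gorm"
)

func main() {
	// Point this at a database already migrated to schema version X by
	// golang-migrate; the DSN below is only an example.
	db, err := gorm.Open(mysql.Open("root:test@tcp(127.0.0.1:3306)/metadb?parseTime=True"))
	if err != nil {
		panic(err)
	}

	g := gen.NewGenerator(gen.Config{
		OutPath:      "./internal/db/query", // hypothetical output paths
		ModelPkgPath: "./internal/db/models",
	})
	g.UseDB(db)

	// Emit one GORM model struct per table found in the migrated schema.
	g.ApplyBasic(g.GenerateAllTable()...)
	g.Execute()
}
```

If we go the hand-written route instead, an integration test could compare the hand-written structs against the migrated database (for example via GORM's Migrator().ColumnTypes) to catch any drift.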
> With the current approach of using direct DDL in SQL files with golang-migrate, the migration scripts are indeed tied to a specific SQL dialect, but the code is ready to be extended to other DB types, as there's already a layer of abstraction on top of it. Here's the list of DB types supported by golang-migrate:
Impressive list; just noting that, from a requirements perspective, we're being asked to support SQLite, PostgreSQL, and MySQL/MariaDB.
> we're being asked to support SQLite, PostgreSQL, and MySQL/MariaDB.
Yes
> the migration scripts are indeed tied to a specific SQL dialect, but the code is ready to be extended to other DB types, as there's already a layer of abstraction on top of it.
I am confused about this abstraction. Can you share any details about it, any ADR doc etc, please, to learn more about it? Or an example. The immediate question I have is whether we maintain separate DDL scripts for each database or a common ANSI and then vendor-specific extensions? On the DML side, I am less concerned, as most can be ANSI.
> I am confused about this abstraction. Can you share any details about it, any ADR doc etc, please, to learn more about it? Or an example. The immediate question I have is whether we maintain separate DDL scripts for each database or a common ANSI and then vendor-specific extensions? On the DML side, I am less concerned, as most can be ANSI.
@rareddy
It's nothing fancy; the short answer is yes: we maintain separate DDL/DML scripts for each database type, as you can see in the folder structure.
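Roughly, the layout looks like the tree below (the directory and file names here are illustrative, not copied from the PR):

```
embedmd/                      # illustrative layout only
├── mysql/
│   └── migrations/
│       ├── 000001_initial_schema.up.sql
│       └── 000001_initial_schema.down.sql
└── postgresql/
    └── migrations/
        ├── 000001_initial_schema.up.sql
        └── 000001_initial_schema.down.sql
```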
I made the DBMigrator an interface
type DBMigrator interface {
Migrate() error
Up(steps *int) error
Down(steps *int) error
}
and the concrete types that implement it
type MySQLMigrator struct {
migrator *migrate.Migrate
}
type PostgreSQLMigrator struct {
migrator *migrate.Migrate
}
With a factory method
func NewDBMigrator(dbType string, db *gorm.DB) (DBMigrator, error) {
switch dbType {
case "mysql":
return mysql.NewMySQLMigrator(db)
case "postgresql":
return postgresql.NewPostgreSQLMigrator(db)
}
return nil, fmt.Errorf("unsupported database type: %s", dbType)
}
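For completeness, a caller would use the factory roughly like this (a sketch of a hypothetical startup path, not the PR's exact wiring):

```go
// runStartupMigrations is a hypothetical helper: apply all pending
// migrations before the proxy starts serving.
func runStartupMigrations(dbType string, db *gorm.DB) error {
	migrator, err := NewDBMigrator(dbType, db)
	if err != nil {
		return err
	}
	// Migrate() applies everything pending; Up/Down accept an explicit
	// number of steps when finer-grained control is needed.
	return migrator.Migrate()
}
```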
Let me know if you need more information. I can write a formal document with a diagram and a more in-depth explanation
Thank you for the explanation @Al-Pragliola, that makes sense. So one needs to keep all supported database schemas in sync. Also, is API --> DML also per database?
Thanks for the detailed response @Al-Pragliola
> We can use https://gorm.io/gen/database_to_structs.html, much like we use OpenAPI to generate the types and methods of the REST API: the DB at schema version X serves as the source of truth from which we generate the GORM structs. If this fails to meet our expectations, we can write the structs by hand and build a solid unit/integration test suite to ensure everything stays in sync.
Yeah, if we can either generate the ORM schema or validate it, we should be good! :+1:
> Thank you for the explanation @Al-Pragliola, that makes sense. So one needs to keep all supported database schemas in sync. Also, is API --> DML also per database?
@rareddy For API --> DML this is not necessary, as we leverage GORM's capabilities to handle different SQL dialects.
I created yet another interface, with concrete types and a factory method, to connect to the different database types. All of them expose a gorm.DB; this type has database-independent methods that we will use for the API --> DML logic.
type Connector interface {
Connect() (*gorm.DB, error)
DB() *gorm.DB
}
type MySQLDBConnector struct {
DSN string
db *gorm.DB
}
func NewMySQLDBConnector(dsn string) *MySQLDBConnector {
return &MySQLDBConnector{
DSN: dsn,
}
}
func NewConnector(dbType string, dsn string) (Connector, error) {
switch dbType {
case "mysql":
return mysql.NewMySQLDBConnector(dsn), nil
}
return nil, fmt.Errorf("unsupported database type: %s", dbType)
}
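To illustrate the API --> DML point, downstream code only ever sees the *gorm.DB, so the same query code runs against any supported dialect. A hypothetical example (the Artifact model here is a placeholder, not the PR's actual struct):

```go
// Artifact is a placeholder model for illustration only.
type Artifact struct {
	ID     int64
	TypeID int64
	Name   string
}

// listArtifacts shows that the repository layer never touches raw,
// dialect-specific SQL: GORM renders the right statement for whichever
// database the Connector was built for.
func listArtifacts(dbType, dsn string) ([]Artifact, error) {
	connector, err := NewConnector(dbType, dsn)
	if err != nil {
		return nil, err
	}
	db, err := connector.Connect()
	if err != nil {
		return nil, err
	}

	var artifacts []Artifact
	if err := db.Where("type_id = ?", 1).Limit(10).Find(&artifacts).Error; err != nil {
		return nil, err
	}
	return artifacts, nil
}
```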
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: rareddy