kysely
kysely copied to clipboard
Introspection with Multiple Schemas in Mysql
Hi, I am really liking this library - thank you to all the supporters.
I am adding Kysely to an existing project. The database is MySQL and the tables have been organized into several different schemas/databases in MySQL. I am using kysely-codegen
to generate the db.d.ts
file.
I have not been able to get kysely-codegen
to generate the types for all the different schemas. It generates only the types for the database specified in the DATABASE_URL. If I remove the database name from the DATABASE_URL, it generates no types.
I traced this back to the MysqlIntrospector
class inside kysely
.
export class MysqlIntrospector implements DatabaseIntrospector {
readonly #db: Kysely<any>
...
async getTables(
options: DatabaseMetadataOptions = { withInternalKyselyTables: false }
): Promise<TableMetadata[]> {
let query = this.#db
.selectFrom('information_schema.columns as columns')
.innerJoin('information_schema.tables as tables', (b) =>
b
.onRef('columns.TABLE_CATALOG', '=', 'tables.TABLE_CATALOG')
.onRef('columns.TABLE_SCHEMA', '=', 'tables.TABLE_SCHEMA')
.onRef('columns.TABLE_NAME', '=', 'tables.TABLE_NAME')
)
.select([
'columns.COLUMN_NAME',
'columns.COLUMN_DEFAULT',
'columns.TABLE_NAME',
'columns.TABLE_SCHEMA',
'tables.TABLE_TYPE',
'columns.IS_NULLABLE',
'columns.DATA_TYPE',
'columns.EXTRA',
])
---> .where('columns.TABLE_SCHEMA', '=', sql`database()`)
.orderBy('columns.TABLE_NAME')
.orderBy('columns.ORDINAL_POSITION')
.$castTo<RawColumnMetadata>()
On line 54 of mysql-introspector.ts
, the where
statement, limits the tables to the current database .where('columns.TABLE_SCHEMA', '=', sql'database()')
.
For testing purposes, I removed the where clause in my local node_modules
and re-ran kysely-codegen
, it generated the types for all the schemas! I used the --exclude-pattern
command line arg to exclude the MySQL built in schemas.
For info, I am using this to export this command to export the types:
kysely-codegen --dialect mysql --schema --exclude-pattern="{mysql,information_schema,performance_schema,sys,test,temp,tmp}.*" --camel-case --out-file "./db.d.ts"
What would be the best way to solve this? I want to get input on what the best solution would be. I am happy to submit a PR for this. I can see several options, but I am not sure which way to go.
-
Remove the where clause,
getTables()
returns all tables in the server. This could introduce bugs to existing code. -
Add an option to
DatabaseMetadataOptions
to specify which schema(s) to get the tables for. The option could be a string or an array of strings. and the SQL would becolumns.TABLE_SCHEMA in(?)
or something similar. -
Add an option to
DatabaseMetadataOptions
to specify to return all schema tables. If this option was true, it would not limit the tables to a specific schema
I think one or both of the last two options would work. Thanks
Hey 👋
Thank you for the detailed issue. 🙏 And for offering to submit a PR! let's wait for greenlight on this one.
Given the following:
CREATE SCHEMA is a synonym for CREATE DATABASE. https://dev.mysql.com/doc/refman/8.0/en/create-database.html
I think we shouldn't treat MySQL differently than PostgreSQL when it comes to schemas. Since kysely-codegen is the only codegen option for MySQL outside of Prisma, and we can't expect it to align quickly, need to make sure we don't break interop.
In an ideal world, we should do:
- remove
= database()
from where clause. - add
tables.TABLE_TYPE in ('BASE_TABLE', 'VIEW')
to where clause.
TABLE_TYPE BASE TABLE for a table, VIEW for a view, or SYSTEM VIEW for an INFORMATION_SCHEMA table. The TABLES table does not list TEMPORARY tables. https://dev.mysql.com/doc/refman/8.0/en/information-schema-tables-table.html
I don't think we need any exclusion options given the described steps - it's in parity with PostgreSQL's introspector.
What kind of bugs would this introduce to existing codebases?
We could offer an immediately deprecated "current database only" type flag that adds = database()
to the where
clause to give consumers some time to migrate if required.
@koskimas WDYT?
We originally didn't have the = database()
filter but we added it based on user request(s). I don't think people treat MySQL schemas the same way as Postgres schemas. Especially locally, people might have all their projects' databases in a single MySQL server. On Postgres schemas live under a single database and are always a part of that database.
Hi @igalklebanov and @koskimas,
Thank you for your insights. I concur that in many MySQL projects, all database objects are typically housed in one schema or database. However, in the context of this project, there's a unique organization across about seven databases or schemas.
Removing the = database()
filter could introduce regressions, as all tables/views across all databases—including MySQL's built-in ones—might be returned. We need to strike a balance between flexibility and backward compatibility.
While kysely-codegen
offers the flexibility of including or excluding patterns, not every use case may involve its introspection:
bash kysely-codegen --dialect mysql --schema --exclude-pattern="{mysql,information_schema,performance_schema,sys,test,temp,tmp}.*" --camel-case --out-file "./db.d.ts"
A few considerations and solutions:
-
Intention of
getTables()
:- Is the goal to retrieve all tables in a database or across the server? Depending on this, we might need to reevaluate the use of
= database()
. What does this do on Postgres for example? Should we make it consistent?
- Is the goal to retrieve all tables in a database or across the server? Depending on this, we might need to reevaluate the use of
-
Exclude Built-in Databases:
- We could modify the
= database()
filter tonot in('mysql','information_schema','performance_schema','sys','test','temp','tmp')
to return user-created tables/views server-wide.
- We could modify the
-
Flexible
DatabaseMetadataOptions
:- To prevent regression, we could retain
= database()
as the default. However, we can enhance the flexibility ofDatabaseMetadataOptions
by introducing an option or options. I can see three needs:- default: keeps existing
= database()
filter - Option to have no filter.
- Option to pass an array: it filters by the given schemas using
in ('array0', 'array1', 'arrayn')
.
- default: keeps existing
- To prevent regression, we could retain
-
Filter by Table Type:
- I agree that adding the
tables.TABLE_TYPE in ('BASE_TABLE', 'VIEW')
filter would be beneficial.
- I agree that adding the
Thanks again for discussing this.