ragtime
ragtime copied to clipboard
Optimize performance of `ragtime.next-jdbc/ensure-migrations-table-exists`
Intro
ragtime.next-jdbc
executes ensure-migrations-table-exists
on each fn in ragtime.protocols/DataStore
:
-
add-migration-id
-
remove-migration-id
-
applied-migration-ids
Problem
ensure-migrations-table-exists
is implemented by looking up tables in database metadata (code). The problem is that this is not a constant time operation. E.g. in Postgres, a JDBC driver executes an SQL (code) to retrieve the necessary information.
The problem usually manifests if one implements multi-tenancy by using separate schemas for each customer. E.g. suppose 1000 customers, each with 100 tables in a dedicated schema. That's how many tables the JDBC driver needs to fetch only to find out if customer_a.ragtime_migration
table exists or not. The query can easily take 200+ ms to execute. When running migrations for each customer, even the basic check if migrations need to run at all (applied-migration-ids
) can take a long time to execute (sequentially).
Solutions
See below for options. They should all be backwards compatible. I can create a PR to solve this issue. However, there are multiple ways to solve this. Which option would you prefer?
Option A: allow skipping the check
Add option to ragtime.next-jdbc/SqlDatabase
to skip executing ensure-migrations-table-exists
. In addition, expose a new public function that creates the table. So the library consumer library will have to create the table at their convenience first. This might also make sense if some other department (e.g. IT) needs to execute DDL SQLs. Or if one needs a special role for executing DDL vs. DML SQLs.
Option B: allow overriding the CREATE TABLE SQL
Add option to ragtime.next-jdbc/SqlDatabase
to execute an SQL different from the one in the source code. E.g. if one were to execute CREATE TABLE IF NOT EXISTS ...
then there is no need for table-exists?
.
Option C: add IF NOT EXISTS
Add IF NOT EXISTS
to the create SQL. The problem is that IF NOT EXISTS
is not supported by every DBMS.
Option D: allow overriding table-exists?
Allow own implementation of ragtime.next-jdbc/table-exists?
. So that consumers of the library can reimplement it in a more performant way. E.g. by executing SELECT 1 FROM ragtime_migrations
in a try/catch.
Option E: reimplement table-exist?
in terms of try/catch
Reimplement ragtime.next-jdbc/table-exists?
by executing an equivalent to SELECT 1 FROM ragtime_migrations
in a try/catch. This is at least a constant time operation vs. going through metadata. The problem is that SELECT 1
or LIMIT 1
aren't a universal SQL syntax.
Notes
The issue was observed using ragtime.next-jdbc
, but it's likely that the same problem exists in ragtime.jdbc
.
My thought would be to extend the DataStore
protocol to include an init-datastore
method. Or perhaps it would be to add a new protocol with that method (though I'm not sure what the protocol would be called), and then check to see if the datastore satisfies that protocol.
Option F: Replace pro-active check with retro-active
Another option is to replace the current check with try-catch and only check for the table existence when it fails:
- (add-migration-id [_ id]
- (ensure-migrations-table-exists datasource migrations-table)
- (sql/insert! datasource migrations-table
- {:id (str id)
- :created_at (format-datetime (Date.))}))
+ (add-migration-id [_ id]
+ (try
+ (sql/insert! datasource migrations-table
+ {:id (str id)
+ :created_at (format-datetime (Date.))})
+ (catch SQLException e
+ (ensure-migrations-table-exists datasource migrations-table)
+ (throw e))))
@weavejester
Or perhaps it would be to add a new protocol with that method (though I'm not sure what the protocol would be called), and then check to see if the datastore satisfies that protocol.
@weavejester You mean something like this? (Ignore naming for the moment)
(defprotocol BeforeUsingMigrations
(before-using-migrations [_ datasource migrations-table]))
(defrecord SqlDatabase [datasource migrations-table]
BeforeUsingMigrations
(before-using-migrations [_ datasource migrations-table]
(ensure-migrations-table-exists datasource migrations-table))
p/DataStore
(add-migration-id [this id]
(before-using-migrations this datasource migrations-table)
(sql/insert! datasource migrations-table
{:id (str id)
:created_at (format-datetime (Date.))}))
(remove-migration-id [this id]
(before-using-migrations this datasource migrations-table)
(sql/delete! datasource migrations-table ["id = ?" id]))
(applied-migration-ids [this]
(before-using-migrations this datasource migrations-table)
(let [sql [(str "SELECT id FROM " migrations-table " ORDER BY created_at")]]
(->> (sql/query datasource sql {:builder-fn rs/as-unqualified-lower-maps})
(map :id)))))
There is also this which decouples what-happens-before from datastore.
(defprotocol BeforeUsingMigrations
(before-using-migrations [_ datasource migrations-table]))
(defrecord EnsureMigrationsTableExists []
BeforeUsingMigrations
(before-using-migrations [_ datasource migrations-table]
(ensure-migrations-table-exists datasource migrations-table)))
(defrecord SqlDatabase [datasource migrations-table before-migrations]
p/DataStore
(add-migration-id [_ id]
(before-using-migrations before-migrations datasource migrations-table)
(sql/insert! datasource migrations-table
{:id (str id)
:created_at (format-datetime (Date.))}))
(remove-migration-id [_ id]
(before-using-migrations before-migrations datasource migrations-table)
(sql/delete! datasource migrations-table ["id = ?" id]))
(applied-migration-ids [_]
(before-using-migrations before-migrations datasource migrations-table)
(let [sql [(str "SELECT id FROM " migrations-table " ORDER BY created_at")]]
(->> (sql/query datasource sql {:builder-fn rs/as-unqualified-lower-maps})
(map :id)))))
(defn sql-database
([datasource]
(sql-database datasource {}))
([datasource options]
(->SqlDatabase datasource
(:migrations-table options "ragtime_migrations")
(:before-migrations options (->EnsureMigrationsTableExists)))))
Close, but not exactly what I had in mind. My initial thought was to add a new core Ragtime protocol:
(ns ragtime.protocols)
(defprotocol Initialize
(initialize [store]))
Add to SqlDatabase
, and then call that from the functions in ragtime.core
as necessary. However, now that I've thought about this problem a little further, I think I favour another approach.
Currently we check the migration table, then make a reasonable assumption that it should still exist a few milliseconds after we check. Someone could drop the table in-between, especially for databases with limited transaction support, but if that happens I feel as though raising an exception would be the correct thing to do.
In this case, we want to lengthen the assumption time, so we check the table less often. One way would be memoize the function for a set period of time, say, 1 second. However, I think it would be more useful to tie the assumption with the lifetime of the data store. So:
(defn- initialize-database [datastore migrations-table initialized?]
(locking initialized?
(when-not @initialized?
(ensure-migrations-table-exists datasource migrations-table)
(vreset! initialized? true))))
(defrecord SqlDatabase [datasource migrations-table initialized?]
p/DataStore
(add-migration-id [_ id]
(initialize-database datastore migrations-table initialized?)
(sql/insert! datasource migrations-table
{:id (str id)
:created_at (format-datetime (Date.))}))
...
Where initialized?
is a volatile:
(defn sql-database
([datasource]
(sql-database datasource {}))
([datasource options]
(->SqlDatabase datasource
(:migrations-table options "ragtime_migrations")
(volatile! false))))
Does this seem reasonable?
I understand the solution. Unfortunately I don't think that would solve the performance problem. At least not completely. :smile:
The performance problem manifests when there are many, many tables. That usually happens in a multi-tenancy solution whereby tenants have their own dedicated DB schema. For each schema there is a ragtime_migrations
table.
This also means that there are as many data stores as there are tenants. Not necessarily because there are so many javax.sql.DataSource
es, but because a schema prefix is the simplest way to configure the data store:
(ragtime.next-jdbc/sql-database datasource {:migrations-table "customer_42.ragtime_migrations"})
Coming back to the performance problem. When the service starts up, it's natural to try to apply the needed migrations, if any. Even when nothing changed at all, ragtime still needs to check by executing applied-migration-ids
, which would try to initialize the database. That takes time. E.g. if it takes 600ms (we clocked it at 650+ms, and it's only going to get worse) to check via metadata (current implementation), executing for 1000 tenants sequentially would take 10 minutes to apply no changes to the DB.
If ragtime would have some migrations to apply then that's where the initialized?
solution would show some performance benefit. But the NOOP migration run taking 10+ minutes is what sparked the issue in the first place. :smile:
For the sake of completeness and for the benefit of others if they happen to find themselves in the same situation, here is the (hopefully temporary :smile:) solution that fixes the performance in Postgres:
(defn- ensure-ragtime-migrations-table-exists [datasource migrations-table]
;; Replacement for `ragtime.next-jdbc/ensure-migrations-table-exists` that is
;; much more performant in Postgres.
(let [sql (format "CREATE TABLE IF NOT EXISTS %s (id VARCHAR(255), created_at VARCHAR(32));"
migrations-table)]
(jdbc/execute! datasource [sql])))
(with-redefs [ragtime.next-jdbc/ensure-migrations-table-exists ensure-ragtime-migrations-table-exists]
;; now you can use normal ragtime fns
(ragtime/migrate-all ...)
(ragtime/migrate ...)
(ragtime/rollback ...)
;; etc.
)
Ah, I believe I understand now. It's been a while since I touched Ragtime's code, and I misinterpreted your initial explanation of the problem. I'd completely forgotten that in order to support all databases, Ragtime grabs a list of all tables from the database's metadata, then searches through them itself.
In which case, I believe my preference would be to go for option B. Something like: :migrations-table-exists-sql
.
Though we could also have a :ensure-migrations-table-exists?
option as well (i.e. option A). Either or both would be fine.
Thanks for the input. I'll create the PR for this. Hopefully in the next 2 weeks. :smile: