unison
unison copied to clipboard
Proposal: prepared statement cache
- [x] blocked on #2718
Motivation
In looking over some runtime profiles of ucm
starting up, I notice we spend a lot of time opening and finalizing sqlite statements. Because we perform so many queries, but have relatively few unique queries, I think we would benefit quite a lot from a prepared statement cache.
Prepared statement background
The lifetime of our sqlite queries currently is thus:
- Open statement.
- Bind parameters.
- Read rows one at a time.
- Finalize statement.
However, we could instead reuse an open statement like so:
- Open statement.
- Bind parameters.
- Read rows one at a time.
- Reset statement.
- Clear bindings.
- From here, can goto (2) or goto (7)
- Finalize statement.
This lets us to reuse a statement an arbitrary number of times before finalizing it, which (among other things) lets sqlite avoid parsing the query over and over. The compiled representation of the query is kept cached in the connection object.
API design
Query
variant
We need a Query
variant (in sqlite-simple
, just a Text
newtype), which can additionally be cheaply identified and disambiguated from others. An Int
will suffice for this. I'll call this type Sql
.
-- same as `Database.SQLite.Simple.Query`, but paired with a unique `Int`
data Sql
= Sql Int Text
-- example usage
myQuery :: Sql
myQuery =
Sql 1 "select foo from bar where baz = ?"
Since that 1
must never be reused, we can just hide the constructor of Sql
and expose a quasi-quoter that generates fresh ints each time it's used.
Below would be the actual interface.
data Sql
sql :: QuasiQuoter
-- example usage
myQuery :: Sql
myQuery =
[sql| select foo from bar where baz = ? |]
Prepared statement cache
Next, we need to pass a prepared statement cache into every place where we submit a prepared statement to sqlite.
If we've prepared a statement before, we clear its bindings and reuse it. Otherwise, we open a new statement, and put it into the cache after use.
data PreparedStatementCache
withPreparedStatementCache :: (PreparedStatementCache -> IO r) -> IO r
-- only opens on first `with`, otherwise returns from cache
withStatement :: PreparedStatementCache -> Sql -> (Sqlite.Statement -> IO r) -> IO r
-- for completeness; I suspect `unison` may need not ever evict from this small cache
finalizeStatement :: PreparedStatementCache -> Sql -> IO ()
~12:01am thought: this API isn't thread-safe. Needs tinkering.~ Edited it to be thread-safe.
query
variants
Finally, we need a variant of query
(and execute
, etc) which additionally take this PreparedStatementCache
.
-- from sqlite-simple, for comparison
query
:: (ToRow params, FromRow result)
=> Connection
-> Query
-> params
-> IO [result]
pquery
:: (ToRow params, FromRow result)
=> Connection
-> PreparedStatementCache
-> Sql
-> params
-> IO [result]
Note: the Connection
and PreparedStatementCache
always go together, since the connection itself is "where" the statements are prepared, so we would probably want to tuck the PreparedStatementCache
into a connection wrapper thing to prevent them from floating apart.
Implementation plan
newtype PreparedStatementCache =
PreparedStatementCache (IORef (IntMap (MVar Sqlite.Statement)))
openPreparedStatementCache :: IO PreparedStatementCache
closePreparedStatementCache :: PreparedStatementCache -> IO ()
withPreparedStatementCache :: (PreparedStatementCache -> IO r) -> IO r
withPreparedStatementCache = bracket openPreparedStatementCache closePreparedStatementCache
-- Assume these come from a sqlite library
openStatement :: Text -> IO Sqlite.Statement
closeStatement :: Sqlite.Statement -> IO ()
withStatement :: PreparedStatementCache -> Sql -> (Sqlite.Statement -> IO r) -> IO r
withStatement (PreparedStatementCache cacheRef) (Sql queryId query) callback = do
cache <- readIORef cacheRef
statementVar <-
case IntMap.lookup queryId cache of
Nothing -> do
-- As written, this is only safe if we do not expose a
-- `finalizeStatement` action that can be called at any time
mask \restore -> do
statement <- restore (openStatement query)
statementVar <- newMVar statement
atomicModifyIORef' cacheRef (IntMap.insert i statementVar)
pure statementVar
Just statementVar -> pure statementVar
withStatementVar statementVar callback
I think this seems super reasonable, however I'm realizing that archived namespaces #1340 may make this optimization less important. Like, yes it is good to make loading things from the codebase faster, but another thing we can do is avoid having to load things at all.
#1340 would be a win for performance (it will generally improve performance of anything that depends on the size of the namespace), but also usability, since it is confusing / a footgun having so many names be simultaneously active.
@pchiusano Interesting.I think they do cover different enough use cases for both of them to potentially be important. Archived namespaces wouldn't help load an Ed Kmett root with dozens of actively maintained packages, for example.
But even just loading .base/
today takes 3-4 seconds on my machine, and I'd be curious to know how much of that is preparing/finalizing statement overhead. It's a little hard to tell from benchmarks, but it does look like quite a bit.
I love this; I'm not sure how I missed it until now.
@mitchellwrosen Not sure if you already saw, but we have an existing Connection
wrapper (for debugging purposes) in U.Codebase.Sqlite.Connection
that's already used everywhere, so the cache could be just another field there.