unison icon indicating copy to clipboard operation
unison copied to clipboard

Proposal: prepared statement cache

Open mitchellwrosen opened this issue 3 years ago • 3 comments

  • [x] blocked on #2718

Motivation

In looking over some runtime profiles of ucm starting up, I notice we spend a lot of time opening and finalizing sqlite statements. Because we perform so many queries, but have relatively few unique queries, I think we would benefit quite a lot from a prepared statement cache.

Prepared statement background

The lifetime of our sqlite queries currently is thus:

  1. Open statement.
  2. Bind parameters.
  3. Read rows one at a time.
  4. Finalize statement.

However, we could instead reuse an open statement like so:

  1. Open statement.
  2. Bind parameters.
  3. Read rows one at a time.
  4. Reset statement.
  5. Clear bindings.
  6. From here, can goto (2) or goto (7)
  7. Finalize statement.

This lets us to reuse a statement an arbitrary number of times before finalizing it, which (among other things) lets sqlite avoid parsing the query over and over. The compiled representation of the query is kept cached in the connection object.

API design

Query variant

We need a Query variant (in sqlite-simple, just a Text newtype), which can additionally be cheaply identified and disambiguated from others. An Int will suffice for this. I'll call this type Sql.

-- same as `Database.SQLite.Simple.Query`, but paired with a unique `Int`
data Sql
  = Sql Int Text

-- example usage
myQuery :: Sql
myQuery =
  Sql 1 "select foo from bar where baz = ?"

Since that 1 must never be reused, we can just hide the constructor of Sql and expose a quasi-quoter that generates fresh ints each time it's used.

Below would be the actual interface.

data Sql
sql :: QuasiQuoter

-- example usage
myQuery :: Sql
myQuery =
  [sql| select foo from bar where baz = ? |]

Prepared statement cache

Next, we need to pass a prepared statement cache into every place where we submit a prepared statement to sqlite.

If we've prepared a statement before, we clear its bindings and reuse it. Otherwise, we open a new statement, and put it into the cache after use.

data PreparedStatementCache

withPreparedStatementCache :: (PreparedStatementCache -> IO r) -> IO r

-- only opens on first `with`, otherwise returns from cache
withStatement :: PreparedStatementCache -> Sql -> (Sqlite.Statement -> IO r) -> IO r

-- for completeness; I suspect `unison` may need not ever evict from this small cache
finalizeStatement :: PreparedStatementCache -> Sql -> IO ()

~12:01am thought: this API isn't thread-safe. Needs tinkering.~ Edited it to be thread-safe.

query variants

Finally, we need a variant of query (and execute, etc) which additionally take this PreparedStatementCache.

-- from sqlite-simple, for comparison
query 
  :: (ToRow params, FromRow result) 
  => Connection 
  -> Query 
  -> params 
  -> IO [result]

pquery 
  :: (ToRow params, FromRow result) 
  => Connection 
  -> PreparedStatementCache 
  -> Sql 
  -> params 
  -> IO [result]

Note: the Connection and PreparedStatementCache always go together, since the connection itself is "where" the statements are prepared, so we would probably want to tuck the PreparedStatementCache into a connection wrapper thing to prevent them from floating apart.

Implementation plan

newtype PreparedStatementCache = 
  PreparedStatementCache (IORef (IntMap (MVar Sqlite.Statement)))

openPreparedStatementCache :: IO PreparedStatementCache
closePreparedStatementCache :: PreparedStatementCache -> IO ()

withPreparedStatementCache :: (PreparedStatementCache -> IO r) -> IO r
withPreparedStatementCache = bracket openPreparedStatementCache closePreparedStatementCache

-- Assume these come from a sqlite library
openStatement :: Text -> IO Sqlite.Statement
closeStatement :: Sqlite.Statement -> IO ()

withStatement :: PreparedStatementCache -> Sql -> (Sqlite.Statement -> IO r) -> IO r
withStatement (PreparedStatementCache cacheRef) (Sql queryId query) callback = do
  cache <- readIORef cacheRef
  statementVar <-
    case IntMap.lookup queryId cache of
      Nothing -> do
        -- As written, this is only safe if we do not expose a 
        -- `finalizeStatement` action that can be called at any time
        mask \restore -> do
          statement <- restore (openStatement query)
          statementVar <- newMVar statement
          atomicModifyIORef' cacheRef (IntMap.insert i statementVar)
          pure statementVar
    Just statementVar -> pure statementVar
  withStatementVar statementVar callback

mitchellwrosen avatar Oct 21 '21 03:10 mitchellwrosen

I think this seems super reasonable, however I'm realizing that archived namespaces #1340 may make this optimization less important. Like, yes it is good to make loading things from the codebase faster, but another thing we can do is avoid having to load things at all.

#1340 would be a win for performance (it will generally improve performance of anything that depends on the size of the namespace), but also usability, since it is confusing / a footgun having so many names be simultaneously active.

pchiusano avatar Oct 21 '21 04:10 pchiusano

@pchiusano Interesting.I think they do cover different enough use cases for both of them to potentially be important. Archived namespaces wouldn't help load an Ed Kmett root with dozens of actively maintained packages, for example.

But even just loading .base/ today takes 3-4 seconds on my machine, and I'd be curious to know how much of that is preparing/finalizing statement overhead. It's a little hard to tell from benchmarks, but it does look like quite a bit.

mitchellwrosen avatar Oct 24 '21 13:10 mitchellwrosen

I love this; I'm not sure how I missed it until now.

@mitchellwrosen Not sure if you already saw, but we have an existing Connection wrapper (for debugging purposes) in U.Codebase.Sqlite.Connection that's already used everywhere, so the cache could be just another field there.

aryairani avatar Oct 27 '21 02:10 aryairani