SQLite.swift
Allow concurrent reads and writes (connection pool)
Hi, my question is:
Can I create two connections, one read-only and the other read-write? I tried to do that, but an error is thrown because the database is locked.
What I need is this: my app writes data frequently, and I don't want to block the read thread when I need to update the UI.
Can you give me an example or a solution for this?
Thanks!
Hello @bytegh. SQLite.swift manages SQLite concurrency for you, assuming you are OK with a single connection to the database, and serialized accesses.
If you want several connections, and perform concurrent read and writes, then this library won't prevent you from doing it, but won't help either: you'll have to open your SQLite database in the WAL mode and manage your own pool of connections.
This is hard. You can also use a Swift SQLite library that has already done this work for you, such as GRDB.swift. See its DatabasePool class, because it may just suit your needs already: https://github.com/groue/GRDB.swift#database-pools.
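For illustration only, here is a minimal sketch of that DIY route with SQLite.swift itself: one read-write connection switched into WAL mode, plus a separate read-only connection (the path is a placeholder, and this is not a pool):

import SQLite

// Read-write connection that owns all writes, switched into WAL mode.
let writer = try Connection("/path/to/db.sqlite3")
try writer.execute("PRAGMA journal_mode = WAL")

// Separate read-only connection for queries. With WAL enabled, reads on this
// connection are not blocked while the writer is in the middle of a write.
let reader = try Connection("/path/to/db.sqlite3", readonly: true)

Scheduling access to these connections (one writer, several readers, no sharing of a single connection across threads) is the part that remains up to you, which is what the pool discussion below is about.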
(To maintainers: I indulge myself linking to GRDB, because I don't think SQLite.swift has any ambition at providing database scheduling that goes beyond the simple serialization of a single connection. Yet this natural feature request can have a better answer than a plain "DIY").
@groue SQLite.swift has a PR that's been open for a while that addresses this: https://github.com/stephencelis/SQLite.swift/pull/289
It needs a shepherd to help bring it to completion, though.
@stephencelis. And a pretty good one :-) I hope @kdubb will eventually fix the "few problems" he has found.
@groue @stephencelis thank you guys. :)
A connection pool is relatively simple to implement, assuming you put most of the serialisation responsibilities onto the programmer (don't do re-entrant transactions, etc.):
Here is one we're using internally:
import Foundation
import SQLite

internal final class DatastoreConnectionPool {

    let url: NSURL
    let maximumReadersCount: Int

    // A single writer connection; all writes are serialized through it.
    private let writer: DatastoreConnection

    // Read-only connections, guarded by `readersCondition`, which serves both as
    // the lock for the array and as the signal for waiting checkouts.
    private var readers: [DatastoreConnection]
    private let readersCondition = NSCondition()

    internal init(url: NSURL, maximumReadersCount: Int = 3) throws {
        self.url = url
        self.maximumReadersCount = maximumReadersCount
        self.writer = try DatastoreConnection(url.path!, readonly: false)

        var readers: [DatastoreConnection] = []
        for _ in 0..<maximumReadersCount {
            let reader = try DatastoreConnection(url.path!, readonly: true)
            readers.append(reader)
        }
        self.readers = readers

        // Put the database into WAL mode so readers are not blocked by the writer.
        try self.writer.execute("PRAGMA journal_mode = WAL")
    }

    internal func readWriteTransaction<T>(kind: DatastoreConnectionTransactionKind, handler: (DatastoreConnection) throws -> T) throws -> T {
        var result: T!
        do {
            try self.writer.transaction(kind) {
                result = try handler(self.writer)
            }
        } catch {
            throw DatastoreError(reason: "Read/write transaction failed", underlying: error)
        }
        return result
    }

    internal func readTransaction<T>(handler: (DatastoreConnection) throws -> T) throws -> T {
        let reader = self.checkoutReader()
        defer { self.checkinReader(reader) }

        var result: T!
        do {
            try reader.transaction {
                result = try handler(reader)
            }
        } catch {
            throw DatastoreError(reason: "Read transaction failed", underlying: error)
        }
        return result
    }

    // Blocks until a reader is available, then removes it from the pool.
    private func checkoutReader() -> DatastoreConnection {
        self.readersCondition.lock()
        defer { self.readersCondition.unlock() }
        while self.readers.isEmpty {
            self.readersCondition.wait()
        }
        return self.readers.removeLast()
    }

    // Returns a reader to the pool and wakes one waiting checkout.
    private func checkinReader(reader: DatastoreConnection) {
        self.readersCondition.lock()
        self.readers.insert(reader, atIndex: 0)
        self.readersCondition.signal()
        self.readersCondition.unlock()
    }
}
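For context, here is a hypothetical usage of the pool above. `DatastoreConnection` and `DatastoreConnectionTransactionKind` are the poster's internal types; the `.immediate` kind and the `run`/`scalar` calls are assumptions, made on the guess that they mirror SQLite.swift's Connection API:

// Hypothetical usage; identifiers below are illustrative.
let databaseURL = NSURL(fileURLWithPath: "/path/to/db.sqlite3")
let pool = try DatastoreConnectionPool(url: databaseURL)

// Writes are serialized through the single writer connection.
_ = try pool.readWriteTransaction(.immediate) { db in
    try db.run("INSERT INTO users (name) VALUES (?)", "alice")
}

// Reads borrow one of the pooled read-only connections; the connection is
// checked back in when the block returns.
let count = try pool.readTransaction { db in
    try db.scalar("SELECT count(*) FROM users") as! Int64
}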
A connection pool is relatively simple to implement, assuming you put most of the serialisation responsibilities onto the programmer (don't do re-entrant transactions, etc.):
@robertjpayne: This is exactly where raw pools don't quite shine: easy to implement, but difficult to use properly, even by advanced users who know about potential concurrency issues. ccgus/FMDB didn't go far enough in its FMDatabasePool implementation, leaving too much to the developer, and eventually had to advise against it because it was too dangerous (and it didn't even implement WAL support). Moreover, pools increase the concurrency level, and with it the chances of inconsistencies that can happen when one thread modifies the database between two statements executed by another: https://medium.com/@gwendal.roue/four-different-ways-to-handle-sqlite-concurrency-db3bcc74d00e.
@groue I'd argue a lot of developers abstract away the raw SQL logic anyway, unless you have extreme needs (we actually do in our application, and still abstract all of it into a Request enum), and thus it's very simple to obey concurrency rules. Furthermore, this pool follows the FMDB model in that it forces you to run all your logic inside a single read or read/write block, so unless you're really aiming to shoot yourself in the foot by storing a reference to the connection or performing async code with it, you should be OK.
This very simple pool implementation also is sort of exactly what you're talking about? It ensures 1 writer always, and multiple readers?
This very simple pool implementation also is sort of exactly what you're talking about? It ensures 1 writer always, and multiple readers?
Yes it is.
it's very simple to obey concurrency rules.
As soon as you let concurrent writes sneak in between subsequent statements, it's much less easy. How do you prevent most users from assuming that those two values are the same, unless they provide their own isolation code?
let count1 = try db.scalar(users.count)
let count2 = try db.scalar(users.count)
@groue That sort of thing is going to happen regardless, a developer working with any SQL database should assume reads not inside a transaction can differ between multiple executions.
But wrapping them in a transaction is a pretty moot point; at that point they likely have to be run inside a block anyway, and it's rare that in a single synchronous read-only transaction you'd run the same query twice…
That sort of thing is going to happen regardless, a developer working with any SQL database should assume reads not inside a transaction can differ between multiple executions.
I beg to disagree :-) Two reasons. One: you do not need a transaction when you only need isolation (a transaction implies isolation, but isolation does not require a transaction, since serialization, for example, provides isolation as well). Two: a sane API can provide isolation by default, without requiring the developer to think about it (all credit due to ccgus/FMDB, even if GRDB applies the same principles), by wrapping database accesses in isolated blocks by default (serialized blocks, or deferred transactions for pool readers).
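To make that concrete, here is a minimal sketch of such an isolated read block on top of SQLite.swift's Connection.transaction; the isolatedRead helper and the users table are illustrative, not part of any library:

import SQLite

// SQLite.swift's default transaction mode is deferred: the read snapshot is
// taken at the first read, so every statement in the block sees the same
// database state.
func isolatedRead<T>(_ db: Connection, _ body: (Connection) throws -> T) throws -> T {
    var result: T!
    try db.transaction {
        result = try body(db)
    }
    return result
}

let db = try Connection("/path/to/db.sqlite3", readonly: true)
let users = Table("users")

// Both counts come from the same snapshot, even if another connection writes
// in between the two statements.
let (count1, count2) = try isolatedRead(db) { db in
    (try db.scalar(users.count), try db.scalar(users.count))
}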
But wrapping them in a transaction is a pretty moot point; at that point they likely have to be run inside a block anyway, and it's rare that in a single synchronous read-only transaction you'd run the same query twice…
The failure scenario is when the easiest way to use the library does not provide isolation, and when the threading issues of running two subsequent statements are overlooked by the developer (overlooking can be the consequence of a lack of skill, or a simple oversight by an experienced developer).
But I'm sorry: my points are that "That sort of thing is going to happen regardless" (talking about isolation issues) is just false, and that "assuming you put most of the serialisation responsibilities onto the programmer" is OK, of course, but one can do better. GRDB is much safer on that topic than SQLite.swift. I'm done polluting this thread, sorry.
@groue I wouldn't say it's polluting this thread, it's a good conversation to have when it comes to implementing a write + read pool.
The only way to ensure two reads always return the same result is with a SQLite read transaction; you can isolate the DB as much as you like in code, but you cannot guarantee another process/connection isn't reading and writing to the same database file.
I've also updated my connection pool code to ensure reads are run in a deferred transaction. Beyond that, while it's a simple implementation, the only thing that could help resolve programmer errors is to fire proper warnings if they try to nest transactions.
The only way to ensure two reads always return the same result is with a SQLite read transaction; you can isolate the DB as much as you like in code, but you cannot guarantee another process/connection isn't reading and writing to the same database file.
I agree. We have implicitly been talking about app-centric SQLite libraries here, the ones that can say "those concurrency guarantees hold as long as there is no other connection to the database".
I've also updated my connection pool code to ensure reads are run in a deferred transaction
Your future self will thank you :-)
@kdubb has reopened the following PR: https://github.com/stephencelis/SQLite.swift/pull/411 I encourage anyone that wants to use a connection pool to check it out.
@stephencelis We were discussing "naive" pools, the ones that require care (and skills) from their users. As far as I can see, @kdubb's is a naive one.
I disagree with a lot of what has been said here...
Most of your talk seems to center around an abstract SQL API and does not take into account the fact that this library (and FMDB, and GRDB) targets a single API, SQLite, which for all intents and purposes has considered concurrency in almost every case. Its concurrency support/API is not well understood, but regardless, one should always use the concurrency the DB supports rather than not.
@robertjpayne, @groue The example shown in this thread is "naive" in that it attempts to provide concurrency on top of the SQLite library. This works, except, again, that the database knows nothing about it. Depending on what flags you've passed to SQLite during its compilation and/or when opening a database connection, this code will either increase concurrency slightly or not at all. The reason is that SQLite can be built to default to "multi-threaded" mode (forgive me if I get their terms backward, because SQLite's terminology is awkward). This means that all connections in the process are locked from each other, allowing them to be used safely from any thread, but it effectively serializes access to the DB, regardless of connection. To ensure a pool implementation like the one shown in this thread actually increases concurrency, you'd have to pass SQLITE_OPEN_NOMUTEX, unless you can be sure the library was built with the equivalent flag and that is the default mode.
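For reference, a sketch of that check and flag using the raw C API rather than SQLite.swift (the path is a placeholder):

import SQLite3

// 0 means SQLite was compiled single-threaded; no open flag can make
// concurrent use safe in that case.
precondition(sqlite3_threadsafe() != 0, "SQLite was compiled single-threaded")

// Open a read-only connection without the per-connection mutex, so this
// connection is not serialized against others. The caller then has to confine
// it to one thread at a time, which is what the pool's checkout does.
var handle: OpaquePointer?
let flags = SQLITE_OPEN_READONLY | SQLITE_OPEN_NOMUTEX
guard sqlite3_open_v2("/path/to/db.sqlite3", &handle, flags, nil) == SQLITE_OK else {
    fatalError("unable to open database")
}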
@groue FMDB's FMDatabasePool failed not because it left too much to the developer but because it was, and still is, the wrong API for SQLite. The only mechanism SQLite provides to increase concurrency is WAL mode, and that provides a single writer and multiple readers. This means the developer must, on some level, understand this. FMDatabasePool failed because, as far as I can tell, ccgus didn't previously understand SQLite's API (and trust me, that was after many long-winded discussions with him on a few topics related to FMDB). FMDatabasePool attempts to allow reads and writes "from a pool", which will never work with SQLite because it explicitly doesn't support that; you'll get exactly what ccgus wrestled with, "database busy" errors. To overcome this, the only "sane" thing to do is to force the programmer to distinguish between read and write connections. This is the way GRDB does it, without issues, and interestingly enough it's the way our internal FMDatabaseReadWritePool has done it for years. ccgus has been extremely hostile to changing this class because of the "issues" he couldn't solve, so we stopped offering changes to FMDB that increased concurrency; we do have a public FMDB fork I can publish our pool to, although this SQLite.swift implementation is fairly similar to it.
@robertjpayne On the topic of read transactions with SQLite (and almost any other DB): you need no such thing if you are doing a single read, as it reads from a single implicit point in time. If you are attempting multiple separate reads that should return a stable set of information, you need to worry about a bit more than "starting a transaction". SQLite's default mode in transactions is "read committed" (again, most DBs have this default). This means you accomplish nothing by starting a transaction just to read data unless you change your serialization mode to "serialized"; honestly, I'm not even sure SQLite properly supports this mode, as I've never investigated; almost nobody needs it.
So, if by "naive" implementation you mean one that was built using the concurrency facilities of the library it wraps, and that comes from a couple of years' experience using that library from similar languages and building wrappers in those other languages, then yes... our implementation is "naive" 😄
I do want to be clear that I'm suggesting the principles our pool implementation was built on are the proper ones. I am in no way suggesting it's a bug-free implementation or that its API is currently "the best". Our internal FMDB has been used for many years and closely resembles the GRDB API. SQLite.swift has a somewhat different API, and I tried to implement the pool in a new way that was more familiar to this library; that may have introduced problems I'm not aware of, simply because it hasn't been tested extensively either internally or externally.
@kdubb SQLite by default wraps all reads in an implicit transaction, so as far as I know from experience there is no penalty for running deferred transactions for single reads, and I have seen huge performance gains from running multiple reads in a single transaction.
I believe SQLite's default isolation mode is actually SERIALIZABLE, and you have to manually set it to read uncommitted if you want potentially varied reads in a single transaction.
Generally speaking, the SQLite threading mode does make a huge difference, but SQLite.swift links against Apple's compiled SQLite and opens the database connections itself, so it should be safe to assume it's multi-thread safe.
Looking over your implementation (I didn't see it before), I don't think it's naive, except that maybe it could create way too many readers!
The only other thing about the pool in #411 is that I really wouldn't allow nested transactions. I generally shy away from them, and it's probably worth the pain of manual savepoints to avoid the insane headache that re-entrant transactional SQL can be.
@robertjpayne You are correct, SQLite is SERIALIZABLE in most cases! I have too many DB APIs in my head... apologies for the misinformation.
Again, #411 probably has a lot of implementation issues/bugs. Nested transactions/executions are a godsend when implementing certain things (like graph-style object saving), as they reduce the requirement of carrying a connection throughout your API.
Although, your point does pinpoint a specific deficiency of this type of API compared to the GRDB-style API. In our FMDatabaseReaderWriterPool we detect nested transactions and automatically convert them to savepoints; we can do this because the transactions are blocks/closures. In the #411 implementation you are free to use the borrowed connection as you wish, which means we cannot easily detect and handle this situation. Might be an argument for going towards the blocks/closures style.
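A rough sketch of that detection on top of SQLite.swift's block-based transaction and savepoint APIs; the TransactionRunner type and its depth counter are illustrative, and assume all calls arrive on a single serial queue:

import SQLite

final class TransactionRunner {
    private let db: Connection
    private var depth = 0

    init(_ db: Connection) { self.db = db }

    // The outermost call opens a real transaction; nested calls are converted
    // to savepoints, which SQLite allows inside an open transaction.
    func inTransaction(_ block: () throws -> Void) throws {
        depth += 1
        defer { depth -= 1 }
        if depth == 1 {
            try db.transaction { try block() }
        } else {
            try db.savepoint("nested_\(depth)") { try block() }
        }
    }
}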
Just to clarify: by "naive" I mean "keen to let developers shoot themselves in the foot by giving a false sense of "thread-safety" when actually subsequent statements are not isolated by default":
// You expect those to be equal, you fool:
let count1 = try db.scalar(users.count)
let count2 = try db.scalar(users.count)
Isolation for single statements, and making sure they perform well regardless of concurrent updates, is one thing. That is not too hard to achieve, even if we have to take care of flags, yes. But real-life developers sometimes want to execute several statements in a row. When those statements are not isolated, there is an opportunity for other threads to sneak in and make a mess. This is the danger I'm trying to open your eyes to, the one that was initially solved by FMDB and inherited by GRDB. SQLite.swift, with its direct access to the database, is exposed to those isolation troubles. #411 as well.
maybe it could create way too many readers
Limiting the number of concurrent readers is an expected service, for sure.
#411 doesn't limit total readers. This was supported by the delegate that was originally in the PR. @stephencelis suggested some better ideas than the delegate and I agreed. I overlooked the fact I was essentially removing the connection limiting from the code. That needs to be fixed.
Also, to be clear, I was being snarky with my quoted use of naive. I wasn't offended or suggesting you were incorrect in using that term!
I understand... SQLite concurrency is very difficult, and we put much energy in it. Discovering an unexpected issue is always a tough moment :-)
@kdubb I don't disagree that nested transactions are super handy, but they're far better taken care of with manual savepoints. SQLite's nested transaction support is so weird that I find nested transactions give a total false sense of security, and proliferating them through code gives a sense that you're in a transaction when really you're inside a savepoint, which may be inside another savepoint, which may then be inside a transaction.
I'd also agree with @groue that giving out unfettered DB connections probably isn't the best; it'd be far better to wrap them in a transaction and ensure that for the duration of that "checkout" the view of data across the connection is consistent.
Re:
Might be an argument for going towards the blocks/closures style.
And:
it'd be far better to wrap them in a transaction and ensure that for the duration of that "checkout" the view of data across the connection is consistent.
I'm open to these directions in general, especially for the database pool. GRDB and Realm offer nice constraints in these departments.
The "hand me a connection and I'll do with it as I want" model is attractive in a few scenarios. Although, again, our FMDatabaseReaderWriterPool uses the blocks model; we have rarely, if ever, found an issue with that model in practice.
@kdubb The block-based model also ensures connections are immediately checked in after the block exits, so you can't have any situation where a connection is held onto forever (though nothing stops someone from holding a reference and continuing to use it). Luckily, most of the time those would be read-only connections, and the worst case is that you deadlock.
Another interesting point to note in this thread is that the "unix-excl" VFS is great for concurrency but will fail miserably if you try to use the DB file from multiple processes. We previously used that mode in all iOS-derived variants. Unfortunately, with iOS 9, we ran into a fun issue when we implemented share sheets, which are out-of-process extensions that were writing to our database.
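For reference, selecting that VFS goes through the C API, where the fourth argument to sqlite3_open_v2 names the VFS; a minimal sketch with a placeholder path:

import SQLite3

// "unix-excl" holds an exclusive lock on the database file, which helps
// concurrency within one process but breaks as soon as a second process
// (such as an app extension) needs the same file.
var handle: OpaquePointer?
let flags = SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE
guard sqlite3_open_v2("/path/to/db.sqlite3", &handle, flags, "unix-excl") == SQLITE_OK else {
    fatalError("unable to open database")
}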
#411 should manage that expectation as well. Currently it assumes exclusive access.
I receive this error. Is this normal, and how do I fix it?