sqlite_orm
sqlite_orm copied to clipboard
[Feature brainstorm] Unlock-notify or retry logic
I need to write to the same database from several threads. Sometimes, the threads encounter SQLITE_LOCKED and a std::system_error is thrown, but this is a recoverable issue and I would like to simply retry the step, perhaps with a backoff.
As per the sqlite3_unlock_notify page, perhaps it is possible to work the retry logic into this.
Currently, most of the code involving sqlite3_step() looks like this:
if(sqlite3_step(stmt) == SQLITE_DONE) {
// done..
} else {
throw std::system_error(std::error_code(sqlite3_errcode(db), get_sqlite_error_category()), sqlite3_errmsg(db));
}
Perhaps it can be changed into something like this.
We set a bool flag in storage_base to indicate if we should block on SQLITE_LOCKED.
int rc = 0;
if (blocking) {
while ((rc = sqlite3_step(stmt)) == SQLITE_LOCKED) {
rc = wait_for_unlock_notify(sqlite3_db_handle(stmt));
}
} else {
rc = sqlite3_step(stmt);
}
if (rc != SQLITE_DONE) {
throw std::system_error(std::error_code(sqlite3_errcode(db), get_sqlite_error_category()), sqlite3_errmsg(db));
}
Where wait_for_unlock_notify() is similar to the one described in the page above.
However, not all sqlite3 compilations have the sqlite3_unlock_notify() function. In that case, we can somehow fall back to a regular blocking loop:
#include <chrono>
using namespace std::chrono_literals;
int rc = 0;
if (blocking) {
while ((rc = sqlite3_step(stmt)) == SQLITE_LOCKED) {
std::this_thread::sleep_for(1s);
}
} else {
rc = sqlite3_step(stmt);
}
if (rc != SQLITE_DONE) {
throw std::system_error(std::error_code(sqlite3_errcode(db), get_sqlite_error_category()), sqlite3_errmsg(db));
}
Of course, sleep_for can be hidden inside wait_for_unlock_notify (with a better name I guess) and the existence of sqlite3_unlock_notify is detected with SFINAE.
What do you think?
As for the implementation of unlock_notify object, it can be achieved through std::condition_variable and std::mutex to keep it cross-platform, instead of relying on pthread.
Hello. This is a good case. Actually the only sqlite3 call in this code is sqlite3_unlock_notify. What if we add sqlite3_unlock_notify API to storage and other will be added on your side? Why so? Cause different people may use different ways and different code in callback. It will look like this:
struct UnlockNotification {
int fired = 0; /* True after unlock event has occurred */
pthread_cond_t cond; /* Condition variable to wait on */
pthread_mutex_t mutex; /* Mutex to protect structure */
};
UnlockNotification unlockNotification;
storage.unlock_notify([](UnlockNotification &unlockNotification){
pthread_mutex_lock(&unlockNotification.mutex);
unlockNotification.fired = 1;
pthread_cond_signal(&unlockNotification.cond);
pthread_mutex_unlock(&unlockNotification.mutex);
}, unlockNotification);
And also unlock_notify function will be guarded with SQLITE_ENABLE_UNLOCK_NOTIFY macro to make it available where it is available and reverse.
What do you think?
The minor problem here is that sqlite3_unlock_notify() can be applied on a per-step basis, i.e. different statements and steps can have different callbacks.
When you write it as storage.unlock_notify( ... ) it looks like a connection-wide setting.
It would be nice if we can have a tag that goes into the statement execution, for example
storage.select(columns(&MarvelHero::name, length(&MarvelHero::name)), /* callback object here */);
or
storage.select<CallbackObj>( ... );
so that it is overloaded, and doesn't break backward compatibility.
Actually, on second thoughts, I don't have a problem with a connection-wide unlock-notify setter, although I imagine that eventually there can be a per-statement unlock-notify handler.
It has to be clearly documented to avoid confusion, though.
After thinking about it for a while longer, my opinion is that all implementations of unlock-waiting (i.e. the one provided in the sqlite3 webpage) will more or less look the same. We can call that the "reference" implementation. The only places to innovate are the predicate conditions of the conditional_variable:
int wait(sqlite3 *db) noexcept {
struct unlock_notify_t {
std::atomic_bool fired = false;
std::conditional_variable cv;
std::mutex mutex;
} un;
int rc = sqlite3_unlock_notify(db,
[](void **apArg, int nArg) {
for (int i = 0; i < nArg; ++i) {
/* not much room for changes here, can be the default implementation */
auto *un = reinterpret_cast<unlock_notify_t*>(nArg[i]);
std::lock_guard<std::mutex> lock(un->mutex);
un->fired.store(true);
un->cv.notify_one();
}
},
&un);
if (rc == SQLITE_OK) {
std::unique_lock<std::mutex> lock(un.mutex);
un.cv.wait(lock, [&un] () -> bool {
/* user defined things (maybe set a timeout) */
/* but it must eventually depend on the "fired" status of unlock_notify_t */
return un.fired.load();
});
}
return rc;
}
I suppose we can take a bool-returning function to allow user-defined options.
what if this kind of callback can be set right in storage just like storage.on_open already exists? So you write:
storage.unlock_notify_callback = [] {
// do whatever you want
};
and when storage knows that unlock_notify_callback is not null then it will run in all its' queries a blocking scenario from your first comment.
@undisputed-seraphim one more question. Can you solve your issue with sqlite3_busy_handler?
@undisputed-seraphim are you there?
Sorry, I had a few things to do over the last few days.
Busy handler does not solve my problem (it solves another problem though 👍 )
I still need a retry function and it definitely should not throw an exception on a recoverable issue. Currently I see that it throws an exception as soon as ret != SQLITE_OK is encountered, which is... less than desirable.
I actually quite like the latest idea you proposed. Let's go with that. Additionally, some default retry backoff strategies will be nice also :)
(And a new release tag soon!... I hope!)
btw you can remove busy_handler from the TODO list now :D
Ok I see that my idea is not enough. It will be easier to implement it of I know how to reproduce database to return SQLITE_BUSY. Do you know how to do it for sure?
SQLITE_BUSY is a conflict between 2 different DB connections, whereas SQLITE_LOCKED is a conflict within the same DB connection. For my use case, both happens uncommonly, but when it does happen it simply needs to try until it succeeds.
For SQLITE_BUSY it is hard, because the unit test has to create two connections to the same DB on two different processes. Not really familiar with Catch2 to know how to write it.
For a simple non-automatic test, I suppose we can just write a program that sleeps during a sqlite3_step so it retains a lock on the table, then run another process that also tries to write to it while the lock is held.
Off topic I was looking through the code when I noticed that some places can be improved or refactored. Let me know if you're interested in these changes. https://github.com/fnc12/sqlite_orm/compare/dev...undisputed-seraphim:simplify
SQLITE_BUSY is a conflict between 2 different DB connections, whereas SQLITE_LOCKED is a conflict within the same DB connection. For my use case, both happens uncommonly, but when it does happen it simply needs to try until it succeeds.
Can we simulate it using two threads with two storages?
Off topic I was looking through the code when I noticed that some places can be improved or refactored. Let me know if you're interested in these changes. dev...undisputed-seraphim:simplify
That looks great. I want to refactor it that way. One problem exists there: you lost static_asserts in some prepare calls. static_asserts are very important - it is one of main features. If you like make a PR but I shall review with changes request) Or it is better to make small PRs with a single function per PR. E.g. perform_step, then perform_simple_query etc. It will be quicker to merge
(And a new release tag soon!... I hope!)
Yes it will be very soon (in a week). For updates please follow twitter account of this lib https://twitter.com/sqlite_orm
Sorry for being missing again. I'll be making a series of pull requests for the simplify branch changes now.
For the past few days I have unsuccessfully tried to simulate a locked database in a unit test. Still trying to figure out how to do it in a non-probabilistic way.
Some updates from me:
I have worked around this issue in my code, so I no longer run into this issue. I no longer need this particular feature. So, please close it if you prefer.
How did you do it? I am curious.
My problem was that I was trying to write to the same table from different threads. All I did was to avoid that be re-organizing the code.
I also adjusted my sqlite3 commands so that the query planner does not detect a deadlock, hence the busy timeout is respected (reference) (Although in practice it will not cause a deadlock, it just takes a bit more time to complete. Query planner is not smart enough to detect this)
@undisputed-seraphim is this issue actual?
@undisputed-seraphim is this issue actual?