Questions about scalability, and doing backups
Hi There,
I had a quick look in the documentation, and I could not find any reference about how the DB scales.
I would like to know how I should expect it to behave performance wise as the number of records increases.
Consider the query:
SELECT * FROM invoices WHERE customerid = $1;
With an index:
CREATE INDEX invoicesLookup ON invoices(customerid);
What kind of performance can I 'expect' in the average case and worst case? (no guarantees, ofc). O(n)? O(log(n))? O(1)?
Likewise for a range query with dates:
SELECT * FROM invoices WHERE creationdate < now();
With an index:
CREATE INDEX invoicesDateLookup ON invoices(creationdate);
What kind of performance?
Moreover, what kind of performance can I expect for ORDER BY? O(n^2)?
My tinkering and blooping around suggests that ql is relatively stable, as I haven't been able to cause corruption no matter what I do. However, I haven't been able to work out a good way of 'dumping'/backing up the database. Other database engines such as postgres have mechanisms designed to dump a database to a file such that it can be reconstructed, but I haven't been able to work out how to dump ql without closing the db and re-opening it.
So my second question is: how should one do backups of a live database in ql?
db.Flush() added, so I can use that.
Reopening b/c the intent is to answer once the new DB backend is finished. No promises on timeline, but it is an active WIP.
Any update on this? I've found myself wanting to use ql again :)
I cannot recall what the situation was in 2014, but ATM I think:

- SELECT * FROM invoices WHERE customerid = $1;: O(n).
- With CREATE INDEX invoicesLookup ON invoices(customerid);: O(log(n)+m), where m is the cardinality of the result set.
- SELECT * FROM invoices WHERE creationdate < now();: O(n).
- With CREATE INDEX invoicesDateLookup ON invoices(creationdate);: O(log(n)+m).
I suggest verifying the above by writing benchmarks* - it may also be the case that the logarithmic behavior works only for some types.
Wrt dumps/backups: No progress. I've never finished the new backend, but PRs are always welcome :wink:
Maybe @gernest has some ideas?
(*) They would also make for a nice contribution probably.
@cznic is the new backend you are referring to related to the HTTPFS?
@twitchyliquid64 regarding backup, since ql uses a single file don't you think you can just copy&gzip it? Or do you want something like pg_dump dumping SQL to recreate the database state?
is the new backend you are referring to related to the HTTPFS?
No, the new backend should have replaced cznic/lldb.
regarding backup, since ql uses a single file don't you think you can just copy&gzip it?
I think he wants to make a dump/backup while mutating the DB. Then it's not that easy anymore.
I see, probably an SQL dump should be possible, since all ops relate to the single database. It is okay to say you can restore by recreating the database file content.
This is hard.
Thinking about it, maybe the solution is to
- Wait for any pending R/W transaction to finish to have a consistent file image.
- Enter backup mode which keeps the WAL growing during the dump/backup which then becomes a simple file copy.
- Flush the WAL while coordinating with any pending R/W transaction.
- Exit dump mode.
Wait for any pending R/W transaction to finish to have a consistent file image.

Tricky. Do we keep count of pending transactions?
Do we keep count of pending transactions?
No, but IIRC DB.rw reflects if there is any R/W transaction executing. If so, it executes in isolation.
That still won't tell us when all pending transactions are done.
The DB state machine knows when it turns the rw field to false: here and here. In those places we can check for a pending dump/backup request and enter the according state.
I see, that makes sense. I haven't grokked the FSM yet!
offtopic: I have been running go-fuzz against db.Run for 1h20m no crashes yet!
// Flush ends the transaction collecting window, if applicable. IOW, if the DB
// is dirty, it schedules a 2PC (WAL + DB file) commit on the next outer most
// DB.Commit or performs it synchronously if there's currently no open
// transaction.
//
// The collecting window is an implementation detail and future versions of
// Flush may become a no operation while keeping the operation semantics.
By any chance, was there any point in time when this was implemented? The explanation looks like it is what we will need to do when dumping.
I haven't grokked the FSM yet!
Here's its State Transition Table.
I have been running go-fuzz against db.Run for 1h20m no crashes yet!
Interesting experiment and glad to hear that!
By any chance, was there any point in time when this was implemented? The explanation looks like it is what we will need to do when dumping.
It's a nop because currently the WAL flush is performed unconditionally on COMMIT of the outermost R/W transaction. The WAL cannot actually be flushed before that, because we can get ROLLBACK instead of COMMIT in the end.
// The collecting window is an implementation detail and future versions of
// Flush may become a no operation while keeping the operation semantics.
There has been some point in time when we were collecting transactions for 1 sec before flushing the WAL for performance reasons. You're right, this is similar to what dump/backup needs to do.
While we are on this subject: I faced a challenge when trying to enforce canceling queries. From my understanding, when execution is happening in the FSM, this is the expected behavior when we cancel the context:
- If we are entering rw state, we just halt and rollback.
- If we are inside rw state, what happens?
Or can we just limit the scope to when we are evaluating the result set? I have a feeling that for it to make sense we need db.run1ctx, which will also accept a context.Context and handle the cancel signal well.
If we are inside rw state, what happens?
In such case we probably/maybe need to effectively "inject" a ROLLBACK into the executing statement list.
Or can we just limit the scope to when we are evaluating the result set? I have a feeling that for it to make sense we need db.run1ctx, which will also accept a context.Context and handle the cancel signal well.
I think you're right. As we don't yet have SELECT INTO, cancelling a R/W transaction is probably not really useful.
I see. One strategy is: properly cancel result sets (which is trivial), then make sure we don't step into rw mode when the query is cancelled. When the cancel signal is sent and we are already in rw mode, we leave it to the gods 😃
select {
case <-ctx.Done():
return // no op
default:
db.run1()
}
Sounds like a good plan to me ;-)
While the idea seems good, I tried it and it doesn't work well with the database/sql interfaces.
For canceling queries, the database/sql QueryContext expects cancellation at all levels, i.e. execution and evaluation of the result set. But with ql we delay the evaluation until we call the Do method of the returned multi-row result set.
rs, err := db.QueryContext(ctx, "select sleep(5)")
err is expected not to be nil when ctx is canceled while we are still sleeping, but the sleep function is not evaluated/called until you start scanning the rows.
What I'm saying is: to properly support cancellation, it must happen at the execution level; there is no way for the gods to say stop otherwise. Since the execution FSM is way over my head, there is nothing I can do about #172 at the moment.
WRT backups: Can I just gzip the database file? Will there be a time where this is in a 'corrupt' state? I would expect the semantics of db.Flush() to mean that any successful commits made prior to Flush() would be captured in the db file. I understand Flush() is currently a no-op; is my expectation still correct?
I have considered contributing, however I get very lost in the code. Is there an unofficial HACKING file/guide or similar?
Can I just gzip the database file?
Yes.
Will there be a time where this is in a 'corrupt' state?
The DB file is in an inconsistent state while the WAL file is being committed to the DB file - that being a result of the outermost COMMIT of a R/W transaction.
I would expect the semantics of db.Flush() to mean that any successful commits made prior to Flush() would be captured in the db file.
Correct.
I understand Flush() is currently a no-op, is my expectation still correct?
Yes, and it's valid until a new R/W transaction's WAL starts to be committed back to the DB file.
Is there a unofficial HACKING file/guide or similar?
Unfortunately not, sorry.
It sounds like if I start gzipping the database while other mutating db ops are in flight, the file could become corrupted (failed backup). Is there a way to 'take a lock' on mutating the database for the sake of a backup? If this is as simple as holding an internal lock, it might be a nice first contribution.
Untested (error handling omitted):
Direct API
ctx := ql.NewRWCtx()
db.Run(ctx, "begin transaction")
backup()
db.Run(ctx, "rollback")
sql API
tx, _ := db.Begin()
backup()
tx.Rollback()
@twitchyliquid64 Please let us know if you have tried something and what were the results, if any, thank you.
Sorry for the delay, I've been blocked on other features so haven't done the backup yet, but I will update this issue once I circle back.
Cheers & <3 Tom
No problem, thanks.
FYI, the current plan is to build a package that uses the __Table and __Column systables, so a user just has to call backup(). Slated for implementation after I revamp the planner.