go-dqlite
driver: Detect EOF in driverError
Fixes https://github.com/canonical/go-dqlite/issues/182
Signed-off-by: Mathieu Borderé [email protected]
@MathieuBordere once this is merged, please can you create a new release version and tag for this so we can update LXD's dependencies ready for LXD 5.1?
With this change you would be able to re-merge the dqlite change that was reverted, right?
Yes indeed.
But maybe I should test for EOF at the end of the other error handling? I'm just a bit confused by `err` being reassigned a couple of times, e.g. do we get the original `err` back here? https://github.com/canonical/go-dqlite/blob/5724311830ff770590d89ae48b2ea40c6566dc95/driver/driver.go#L775
I think that it is fine as it is, since it will find EOF anywhere in the error chain (it will call `Unwrap` internally).
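For reference, a minimal standalone demo (not go-dqlite code) of the point about the error chain: `errors.Is` walks wrapped errors via `Unwrap`, so a TLS-wrapped `io.EOF` is still detected even though a plain comparison would miss it.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

func main() {
	// Wrap io.EOF the way the stdlib can since Go 1.13 (the %w verb).
	wrapped := fmt.Errorf("tls: read failed: %w", io.EOF)

	// errors.Is calls Unwrap internally, so it finds io.EOF anywhere
	// in the error chain, not just at the top level.
	fmt.Println(errors.Is(wrapped, io.EOF)) // true
	fmt.Println(wrapped == io.EOF)          // false: a direct comparison misses it
}
```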
@MathieuBordere can you fix up the test golint issues please (looks like just comment annotation issues)
https://github.com/canonical/go-dqlite/pull/186/files
Done. The `expected type, found '='` error on `type NodeInfo = client.NodeInfo` doesn't show up locally when running golint, and is maybe related to https://github.com/golang/lint/issues/335
You have a good point. Yes, I do think it's a good idea to place this at the end of the chain, in the default statement, and yes, you should be fine with the `err` object returned by `root.Unwrap()`.
So hopefully something like:
```go
default:
	// When using a TLS connection, the underlying error might get
	// wrapped by the stdlib itself with the new errors wrapping
	// conventions available since go 1.13. In that case we check
	// the underlying error with Unwrap() instead of Cause().
	if root, ok := err.(unwrappable); ok {
		err = root.Unwrap()
	}
	// Note: io.EOF is a sentinel error value, not a type, so it can't
	// be matched with a `case *io.EOF` in a type switch; check it with
	// errors.Is (or a direct comparison) instead.
	if _, ok := err.(*net.OpError); ok || errors.Is(err, io.EOF) {
		log(client.LogDebug, "network connection lost: %v", err)
		return driver.ErrBadConn
	}
```
should work.
One thing that concerns me is that I believe the proxy (both the LXD one and the go-dqlite/app one) might actually depend on getting back an `io.EOF` for graceful stopping and cleanup. It might be just speculation, but the whole mechanism seems to have gotten a bit wild and might need some attention: designing and implementing a good system for leadership and network-related error handling and propagation, as we were saying in the other PR too.
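To make the concern concrete, here is a hedged sketch of the pattern being described; none of this is the actual LXD or go-dqlite/app proxy code, `proxy` is just a hypothetical stand-in that treats `io.EOF` as the graceful-shutdown signal:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// proxy copies src to dst until EOF; a clean io.EOF means the peer
// closed gracefully, anything else is treated as a real failure.
func proxy(dst io.Writer, src io.Reader) error {
	buf := make([]byte, 4096)
	for {
		n, err := src.Read(buf)
		if n > 0 {
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil // graceful stop and cleanup
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	var out strings.Builder
	fmt.Println(proxy(&out, strings.NewReader("hello")), out.String())
}
```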
I suppose if the lxd tests pass OK then it should be fine
Suite is running now, will report back.
There are also tests in this repo that check that the result set iterators return `io.EOF` at the end of the rows, so if they pass that will be encouraging. But moving the check to the `default:` section shouldn't hurt either, as it gives the more specific error handlers an opportunity to check first.
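For context, the convention those tests exercise comes from `database/sql/driver`: a `Rows` implementation signals exhaustion by returning `io.EOF` from `Next`. A toy sketch (not go-dqlite's actual implementation; the `rows` type here is made up):

```go
package main

import (
	"database/sql/driver"
	"fmt"
	"io"
)

// rows is a minimal driver.Rows; database/sql relies on Next returning
// io.EOF at the end of the result set, so that value must flow through
// the driver's error handling untouched.
type rows struct {
	data [][]driver.Value
	pos  int
}

func (r *rows) Columns() []string { return []string{"n"} }
func (r *rows) Close() error      { return nil }

func (r *rows) Next(dest []driver.Value) error {
	if r.pos >= len(r.data) {
		return io.EOF // end of rows, by convention
	}
	copy(dest, r.data[r.pos])
	r.pos++
	return nil
}

func main() {
	var _ driver.Rows = (*rows)(nil) // compile-time interface check
	r := &rows{data: [][]driver.Value{{int64(1)}, {int64(2)}}}
	dest := make([]driver.Value, 1)
	for r.Next(dest) != io.EOF {
		fmt.Println(dest[0])
	}
}
```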
I will mark this as draft for now and investigate further before quickly patching up the issue uncovered by that dqlite PR; I don't want to add to the mess.
I will handle the dqlite leadership loss in a different way than closing the connection.
Which way are you thinking?
It will close the SQLite leader connection to the database, eliminating the `Assert` you see in the lxd testsuite. It will finish the ongoing SQL request with a `LEADERSHIP_LOST`, and all future db queries on that connection will return `LEADERSHIP_LOST`, causing the driver to mark the connection as bad, see https://github.com/canonical/go-dqlite/blob/5724311830ff770590d89ae48b2ea40c6566dc95/driver/driver.go#L752
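A hedged sketch of the mapping being described (the real logic lives at the linked driver.go line; `protocolError` and the code value here are illustrative assumptions, not go-dqlite's actual types):

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// protocolError stands in for the wire error type; the constant's
// value is made up for illustration only.
type protocolError struct {
	Code    uint64
	Message string
}

func (e protocolError) Error() string { return e.Message }

const codeLeadershipLost uint64 = 1 // assumed value

// driverError turns a leadership-lost protocol error into
// driver.ErrBadConn, so database/sql discards the connection and
// retries the query on a fresh one.
func driverError(err error) error {
	if e, ok := err.(protocolError); ok && e.Code == codeLeadershipLost {
		return driver.ErrBadConn
	}
	return err
}

func main() {
	err := protocolError{Code: codeLeadershipLost, Message: "leadership lost"}
	fmt.Println(driverError(err) == driver.ErrBadConn) // true
}
```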
Looking again at the description of #354, I feel there might be a bit of confusion about what `LEADERSHIP_LOST` is supposed to mean/do.
Essentially, `LEADERSHIP_LOST` should be returned in case the client asks for a commit and that commit can't be executed because the node lost leadership while waiting for a quorum on the frames command log entry that it had produced as soon as it received the commit request from the client.
In that case the client can't make any assumption about whether the commit was (or will be) actually successful, because there's no way to know if a quorum of nodes had received the entry. This is basically the situation described in section 6.3 (Implementing linearizable semantics) of the Raft dissertation, and the solution indicated there would be to implement client sessions and request IDs in the FSM. With that in place, the client would retry the commit request as soon as it finds a new leader; using sessions and request IDs, the new leader would know whether it is a brand new commit request or actually a duplicated one (since the original one got committed even though the client received a `LEADERSHIP_LOST` error). This design is also described in the dqlite docs (see the "Client sessions" section at the end of the page), but it is still to be implemented.
Now, after having described what `LEADERSHIP_LOST` was meant to be, let's look at the problematic sequence of events described in #354:
- Node 1 is leader, Node 2 is follower
- Node 1 receives BEGIN - QUERY XXXX - ... (notice that it doesn't receive a COMMIT message)
- Node 1 transfers leadership to Node 2, or loses leadership in some way
^^^ at this point what I believe should happen is:
- For the client: as soon as it tries to send the COMMIT message it will get back a `NOT_LEADER` error, since the gateway will notice that it's not the leader anymore. The Go driver should already handle that and propagate a `driver.ErrBadConn`, so the connection gets closed by the client itself (both the network connection and the associated internal SQLite object/connection), with no need to do anything special from the server.
- For the server: as soon as it detects that the node is no longer the leader (using the `monitor_cb`), it rolls back any uncommitted transactions in the VFS, using `VfsAbort()`.
And then, proceeding in the problematic sequence of events:
- Node 2 writes some data to the db
- Node 1 receives the frames from Node 2, tries to apply them to the DB, and triggers an Assert due to the pending transaction in the leader connection from step 2 that is holding locks.
^^^ at this point the assertion shouldn't be triggered since the server had rolled back that pending transaction.
What I described should result in relatively small changes in the `monitor_cb` to issue the relevant `VfsAbort()` calls, and nothing else. In the future we might want to tackle the much larger problem of proper `LEADERSHIP_LOST` handling using client sessions and request IDs.
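For illustration, a toy sketch of the sessions/request-IDs idea (all names hypothetical, nothing dqlite-specific): the FSM remembers the highest request ID applied per session, so a commit retried after `LEADERSHIP_LOST` is recognized as a duplicate instead of being applied twice.

```go
package main

import "fmt"

// command is a hypothetical log entry carrying session bookkeeping.
type command struct {
	Session uint64 // assigned when the client registers a session
	Request uint64 // monotonically increasing per session
	SQL     string
}

// fsm tracks the highest request ID applied for each session.
type fsm struct {
	applied map[uint64]uint64
}

func (f *fsm) Apply(c command) {
	if last, ok := f.applied[c.Session]; ok && c.Request <= last {
		fmt.Printf("session %d request %d: duplicate, skipping\n", c.Session, c.Request)
		return
	}
	f.applied[c.Session] = c.Request
	fmt.Printf("session %d request %d: applying %q\n", c.Session, c.Request, c.SQL)
}

func main() {
	f := &fsm{applied: map[uint64]uint64{}}
	c := command{Session: 1, Request: 7, SQL: "COMMIT"}
	f.Apply(c) // committed, but the client saw LEADERSHIP_LOST
	f.Apply(c) // client retry against the new leader: deduplicated
}
```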
I understand your reasoning, but `VfsAbort` doesn't get rid of the Assert; it's a read lock that is still being held in the test, it's just a read transaction.
```
integration-test: src/vfs.c:2372: vfsInvalidateWalIndexHeader: Assertion 'shm->shared[4] == 0' failed.
```
while `VfsAbort` basically just releases an exclusive lock: https://github.com/canonical/dqlite/blob/bc1cb2d51cb7e0c3c72d3233c3dfa5c1453d33a8/src/vfs.c#L2449
I thought that closing the sqlite connection would be a foolproof way to get rid of all the state introduced by the start of the transaction.
Yeah, I was going to follow up since I just realized that. Indeed `VfsAbort` should be used in the future for the `LEADERSHIP_LOST` error, because in that case there will be a write lock.
In this case, instead of using `VfsAbort`, I'd suggest using a normal "ROLLBACK" SQL statement, e.g. `sqlite3_exec(conn, "ROLLBACK", NULL, NULL, NULL)`. That should do it.
Closing the connection altogether probably works too, but we'd end up in a bit of a mixed state in the gateway, which feels potentially a bit fragile, or will always need some care (i.e. we have to know that there are cases where the leader object or SQLite object is not there because it was closed elsewhere). Having a single path for releasing resources usually simplifies things.
Yeah, I thought about this too, but there is a case where this can lead to inconsistent data, I think (sketched below):
- Client issues BEGIN
- Client issues some queries
- Node to which client is connected loses leadership
- We abort the transaction with ROLLBACK
- SQLite is now in autocommit mode
- Node to which client is connected regains leadership
- Client continues queries (these will autocommit, while the queries from earlier will not be committed)
- Client issues COMMIT (this will fail, as there is no BEGIN).
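A toy model of that sequence (plain Go tracking only SQLite's autocommit flag; nothing here is real dqlite or SQLite code), showing why a server-side ROLLBACK silently changes the meaning of the client's later statements:

```go
package main

import "fmt"

// conn models only whether an explicit transaction is open.
type conn struct{ inTxn bool }

func (c *conn) exec(stmt string) {
	switch stmt {
	case "BEGIN":
		c.inTxn = true
	case "COMMIT", "ROLLBACK":
		if !c.inTxn {
			fmt.Printf("%s fails: no transaction is active\n", stmt)
			return
		}
		c.inTxn = false
	default:
		if c.inTxn {
			fmt.Printf("%s: part of the open transaction\n", stmt)
		} else {
			fmt.Printf("%s: autocommitted immediately\n", stmt)
		}
	}
}

func main() {
	c := &conn{}
	c.exec("BEGIN")    // client starts a transaction
	c.exec("INSERT 1") // inside the transaction
	c.exec("ROLLBACK") // server aborts on leadership loss
	c.exec("INSERT 2") // client is unaware: this autocommits
	c.exec("COMMIT")   // fails, there is no open transaction
}
```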
Okay, thanks for pointing this out. I actually also had another idea, which I felt was better but a bit more subtle. Your point makes me feel that it's perhaps the best approach.
Basically, we would not do anything in the `monitor_cb` when the raft state changes, because there's always the chance that the node regains leadership and everything can proceed just as it was (like in your example).
Instead we should turn the `assert(shm->shared[i] == 0)` in `vfsInvalidateWalIndexHeader()` into a check: "if there are pending read transactions, roll them back, assuming they were started when this node was the leader and we now have proof that there is another leader that has sent us new frames".
The only two things that I'm not totally sure about are:
- Do we have a way to turn that "assuming they were started when this node was the leader" part into something we can actually check instead of assuming? To avoid masking genuine cases where the assertion is valid and we should blow up because the read transaction was not started when the node was a leader.
- What does it mean to roll back here? Probably releasing the read lock using `vfsShmUnlock()` should be enough, but I didn't check.
Point 1 is kind of minor, but something that would be nice to have.
One possible way to solve point 1 would maybe be to use some sort of flag that gets set in the `monitor_cb` call when converting from leader to follower: if the node regains leadership the flag would be unset (by `monitor_cb`, when noticing that the new state is leader); if the node receives frames from a new leader, the VFS would see that flag and positively know that it has pending state to roll back.
I don't really feel too comfortable with that; it feels a bit brittle, as I don't fully understand it, and I'd rather have SQLite itself do the cleanups for us for whatever state it introduced. I think a case can be made that a connection to a leader node that at one point has lost leadership is "busted": it shouldn't be reused, and should be cleaned up.
I would propose to close the sqlite leader connection and add extra checks in functions that now assume that `gateway->leader` can never be NULL.
If you close the SQLite connection only (and I guess destroy the `struct leader` object too?), and not the network one, how do you communicate that to the client? Assuming that no information gets sent to the client within the context of the `monitor_cb` (but correct me if that's not what you think), then the client will eventually try to issue a `COMMIT` statement, and what happens in that case if the node has regained leadership in the meantime? (as per your example)
I would always reply "LEADERSHIP_LOST" in that case (or some other error message if you prefer); basically the connection will be unusable and should be interpreted that way by the client/driver.
So yes, it can happen that a leader node reports "LEADERSHIP_LOST", but it just means that the driver should try a new connection. That's a bit dirty.
Okay, so essentially you would set a flag on the `gateway` object (like the `leadership_lost` flag in your very first PR)? I would suggest then not to use `LEADERSHIP_LOST` (since it has different semantics, i.e. "retry with new leader"), but instead use something else, e.g. `SQLITE_NOTFOUND` or anything more appropriate (e.g. one of the `SQLITE_IOERR_XXX` codes).
Or maybe no flag is needed, since the `LOOKUP_DB(ID)` call in `gateway.c` already checks for `req->gateway->leader == NULL` and returns `SQLITE_NOTFOUND` in that case.
Ok, I'll open a PR for that.
I'll give this another go; if the LXD test suite is happy, I'll probably merge it.
Thanks! :)
Possibly related, possibly to be implemented in conjunction with this change.
https://pkg.go.dev/database/sql/driver
All Conn implementations should implement the following interfaces: Pinger, SessionResetter, and Validator.
```go
type SessionResetter interface {
	// ResetSession is called prior to executing a query on the connection
	// if the connection has been used before. If the driver returns ErrBadConn
	// the connection is discarded.
	ResetSession(ctx context.Context) error
}

type Validator interface {
	// IsValid is called prior to placing the connection into the
	// connection pool. The connection will be discarded if false is returned.
	IsValid() bool
}
```