pr-451 with additional tests and some documentation editing
Description
This is another attempt to get the changes in #451 merged. We have preserved the author of the original commit, made the suggested documentation changes, and added a test case. I will attempt to explain the issue this is trying to fix:
We have a MariaDB cluster with a proxy routing traffic to one node in the cluster. During rolling updates, the active node will send an ER_CONNECTION_CLOSED err packet and close the connection. This causes the driver to return ErrPktSync to the client. Given that the driver is not returning ErrBadConn, golang will not retry the query (or close the connection). This pushes the responsibility of retrying onto the client, which is totally acceptable, except that if there are a few hundred connections open (in our case 300), any retry logic with a reasonable number of retries (in our case 3) isn't sufficient. What will end up happening is that golang will pick another connection from the pool of closed connections and the client will get another ErrPktSync.
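For concreteness, here is a minimal sketch of that failure mode; the DSN, query, and retry budget are placeholders, not taken from our code:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN pointing at the proxy.
	db, err := sql.Open("mysql", "user:pass@tcp(proxy:3306)/app")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var n int
	// Bounded retry loop. Because the driver returns ErrPktSync instead
	// of driver.ErrBadConn, database/sql puts the dead connection back
	// into the pool; each attempt may check out another dead connection,
	// so 3 retries cannot drain a pool of 300.
	for attempt := 0; attempt < 3; attempt++ {
		if err = db.QueryRow("SELECT 1").Scan(&n); err == nil {
			break
		}
	}
	if err != nil {
		log.Fatal(err) // still "commands out of sync" (ErrPktSync)
	}
}
```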
I'm happy to talk more about different solutions to the problem we are having.
Checklist
- [x] Code compiles correctly
- [x] Created tests which fail without the change (if possible)
- [x] All tests passing
- [ ] Extended the README / documentation, if necessary
- [x] Added myself / the copyright holder to the AUTHORS file
I'm -1 to retry. It's not safe.
> We have a MariaDB cluster with a proxy routing traffic to one node in the cluster. During rolling updates,
DB.SetConnMaxLifetime() solves many problems, including this one. For example:

- DB.SetConnMaxLifetime(time.Second * 10)
- Stop proxying new connections to the retiring DB.
- Wait 20 seconds. (If you have very long transactions, you should wait 10 seconds + the maximum transaction duration.)
- Check that all connections to the retiring DB are closed.
- Shut down the retiring DB.

In this way, you can safely perform rolling updates of your DBs behind the proxy.
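A minimal sketch of that configuration, with a placeholder DSN (not from this thread):

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(proxy:3306)/app") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pooled connections are retired once they are 10 seconds old, so
	// after the proxy stops routing to the old node, every connection
	// to it disappears within one lifetime instead of failing a query.
	db.SetConnMaxLifetime(10 * time.Second)
}
```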
> I'm -1 to retry. It's not safe.
Can you elaborate on why you think it is not safe, and also define exactly which action you think is unsafe?
/cc @arnehormann for context on #451
See #302. Or does the MariaDB documentation guarantee that an already-sent query takes no effect when ER_CONNECTION_KILLED is returned?
There are many special cases (e.g. queries which aren't transactional, multi-statement queries, stored procedures, or anything else I don't know about yet).
One example I can conceive of is a query killed during a filesort (I think MariaDB supports killing at that point):
```sql
UPDATE t1 SET x=x+1 WHERE id=42;
SELECT * FROM t2 ORDER BY c3; /* can be killed during filesort */
```
Even if this case is safe to retry, I think we need a documented guarantee that it is really safe to retry implicitly.
@methane I had a chat with salle on the MariaDB IRC channel, and they had more examples of why retry may not be safe. Some of the examples are:
- session variables and/or user variables are lost on reconnect but can affect query results
- as you mentioned before, depending on the engine some DML may not be transactional
Does it make sense to make this an option in the connection string? In our use case the queries are idempotent and won't cause issues on retry.
> Does it make sense to make this an option in the connection string? In our use case the queries are idempotent and won't cause issues on retry.
I don't want to add options that only serve a specific use case.

If your query is idempotent, why not retry at the application layer? The application knows the query is idempotent, but the driver doesn't.

Additionally, I recommend DB.SetConnMaxLifetime(time.Second * 10) in general. Doesn't it solve your case?
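An application-layer retry along those lines might look like this sketch; the function name, retry budget, and sleep are illustrative, and it relies on a short ConnMaxLifetime retiring the stale connections between attempts:

```go
package dbretry

import (
	"database/sql"
	"time"
)

// QueryValue retries an idempotent query at the application layer.
// The application knows the query is safe to repeat; the driver cannot.
func QueryValue(db *sql.DB, query string) (int, error) {
	var n int
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if err = db.QueryRow(query).Scan(&n); err == nil {
			return n, nil
		}
		// Give a short ConnMaxLifetime a chance to retire stale pooled
		// connections before the next attempt.
		time.Sleep(time.Second)
	}
	return 0, err
}
```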
> If your query is idempotent, why not retry at the application layer? The application knows the query is idempotent, but the driver doesn't.
I explained previously that retrying the query on the client side is not sufficient. We have a pool of 300 connections. Retrying N times (where N < 300) will not work, since Go will pick another connection from the pool and the query will fail with ErrPktSync again. The reason ErrBadConn fixes the issue is that Go will only try twice to run the query, and if both attempts fail with ErrBadConn a new connection will be established. This has the effect of eventually closing all connections in the old pool while keeping the queries from failing.
> Additionally, I recommend DB.SetConnMaxLifetime(time.Second * 10) in general. Doesn't it solve your case?
Can you elaborate on how we can use this function to achieve smooth query execution after a server restart?
If you have a short enough MaxLifetime and a long enough sleep between retries, you don't need 300 retries.

Or you can call DB.SetMaxIdleConns(0) before retrying.

Otherwise, you end up needing 300 retries.
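A sketch of the SetMaxIdleConns(0) approach; the helper name and the restored limit are illustrative:

```go
package dbretry

import "database/sql"

// flushIdleAndRetry empties the idle pool so a retry cannot pick up
// another dead connection, then restores the idle limit (2 is the
// database/sql default).
func flushIdleAndRetry(db *sql.DB, query string) error {
	db.SetMaxIdleConns(0) // closes connections that are currently idle
	defer db.SetMaxIdleConns(2)
	_, err := db.Exec(query)
	return err
}
```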
> Can you elaborate on how we can use this function to achieve smooth query execution after a server restart?
It closes all idle connections that were created more than 10 seconds ago. So if you have 300 connections, none of them will be reused after 10 seconds; there is no need to send a query and receive an error.

If the application and the MySQL server are on the same LAN (~5 ms latency), even 1 second is efficient enough.
As was already illustrated, handling this on the driver side with ErrBadConn is not safe and will not be added.

The question is what alternatives there are. The best I can come up with right now is to return a distinct error type in the driver, which can then be used on the client side to react appropriately, e.g. with an attempt to flush the connection pool (setting db.SetMaxIdleConns(0) temporarily might work).
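As a rough illustration of that idea, client code could branch on such an error. mysql.ErrServerGone below is an invented name; no such distinct error exists in the driver today:

```go
package dbretry

import (
	"database/sql"

	"github.com/go-sql-driver/mysql"
)

// execFlushing is a hypothetical sketch: mysql.ErrServerGone stands in
// for whatever distinct error the driver might return when the server
// closes the connection.
func execFlushing(db *sql.DB, query string) error {
	_, err := db.Exec(query)
	if err == mysql.ErrServerGone { // invented sentinel, for illustration
		db.SetMaxIdleConns(0) // flush dead idle connections
		db.SetMaxIdleConns(2) // restore the database/sql default limit
		// The application then decides whether the query is safe to retry.
	}
	return err
}
```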