aws-mysql-jdbc
aws-mysql-jdbc copied to clipboard
After failover sometimes connections are readonly when should not.
Describe the issue I am getting sometimes error The MySQL server is running with the --read-only option so it cannot execute this statement after failover for connections that are using cluster endpoint
Driver Version? 1.0.0
Java Version? Jdk 17
OS Version?
MySQL Version? 5.7.mysql_aurora.2.10.2
To Reproduce I am using Hikari connection pool 3.4.5. Sometimes after failover connections using writable endpoint are getting error that connection is read only. And situation will not solve over time.
Expected behaviour Connection should be always writable.
Logs If possible MySQL logs surrounding the occurrence of the issue
Here is log file aws_jdbc.log
I am having the same problem. change Hikari connection pool 2.7.9 is ok. https://github.com/awslabs/aws-mysql-jdbc/issues/209
Problem also exists with Hikari version 5.0.1
Hi all, thank you for raising this issue. We're currently looking into it and will share more information once we know more about what is going on. Thank you for your patience.
Any update on this? We have a 3-minute failover duration and are unsure if this is a problem with Hikari or the driver. Does the fact the bug tag was added mean this has been found to be a driver bug?
Hi @Michael-S-Smith
Unfortunately there's no update at the moment. We're still looking into the issue. We're not yet sure if the root cause of the issue is a pure driver bug or an integration issue between the driver and Hikari CP.
We'll keep you informed. Thank you for patience!
This seems to be an issue in 4.0.3 as well. Please confirm if there is any new releases coming up which involves the fix.
Hello @tommisc
We have investigated the issue and found a single scenario that may lead to the issue that you reported. If your application performs steps as it follows, then driver failover process connects to a replica instead of a writer node.
// open connection to a writer node by using DriverManager or HikariCP
// application uses connection
connection.setReadOnly(true);
// application uses connection
Statement statement = connection.createStatement();
// failover occurs here or during executing the following query
statement.executeQuery("select sleep(1000)");
// failover is completed and connection gets connected to a reader node
Calling setReadOnly(true) gives the driver a hint that the current connection is read-only and it's safe to switch to replica node if needed. Thus, when failover occurs, the driver re-connects to a replica node.
It's hard to say whether the following scenario is actually the one that leads to the issues that you reported. However I see some lines in the provided log that supports the idea.
2022-05-25 15:07:29,855 TRACE [DBShard0Slave connection adder] (Slf4JLogger.java:99) - [ClusterAwareConnectionProxy] explicitlyReadOnly=true 2022-05-25 15:07:29,855 TRACE [DBShard0Slave connection adder] (Slf4JLogger.java:99) - [ClusterAwareConnectionProxy] explicitlyReadOnly=true
Are you able to confirm if setReadOnly(true) is in fact occurring your workflow?
@sergiyvamz the read-only flag isn't being set by the client, but by the upstream server.
Eg. If you have a two instance Aurora MySQL cluster, and connect with the cluster endpoint, your traffic is sent to the 'writer' instance. Connection open and persisted for performance etc. If you failover that 'writer' role to the other instance, the original 'writer' becomes a 'readonly' instance; but where persistent connections are still open to the original writer, requests on that existing connection will now fail, stating that it is a Read Only instance. At this point, we should close the existing connection, and aim to reconnect to endpoint again to be directed to the new writer instance; though some care may need to be taken to avoid DNS cache issues and endless disconnect/reconnect cycles if the connection remains readonly for a long duration...
@dalgibbard are you connecting to the cluster endpoint or the instance endpoints ?
@dalgibbard are you connecting to the cluster endpoint or the instance endpoints ?
Cluster endpoint
The driver should work fine as long as you connect to the cluster endpoint first and then let it fail over. It should be reconnecting to the correct writer end point. From what you are saying this is not the case. Can you provide any more information that would help us figure out what is happening ?
I'll re-run our steps to reproduce next week, and report back
@dalgibbard thanks! Have a great weekend
Hello @dalgibbard , following up on this. Do you have any updates regarding this?
For visibility, the 1.1.2 release also contains multiple fixes involving failover (#306). If you haven't tried yet, please use the latest version of the driver.
Thanks.
Hello @dalgibbard , following up on this. Do you have any updates regarding this?
For visibility, the 1.1.2 release also contains multiple fixes involving failover (#306). If you haven't tried yet, please use the latest version of the driver.
Thanks.
Apologies for the lack of update, other priorities came up :( I'll pass the info over to the team here for when they get a moment. Seems like this might solve it for us though.
Hello @tommisc and @dalgibbard
We will be closing this ticket now.
If this issue still persists, please re-open the issue.
Thank you