aws-mysql-jdbc icon indicating copy to clipboard operation
aws-mysql-jdbc copied to clipboard

After failover sometimes connections are readonly when should not.

Open tommisc opened this issue 3 years ago • 8 comments

Describe the issue I am getting sometimes error The MySQL server is running with the --read-only option so it cannot execute this statement after failover for connections that are using cluster endpoint

Driver Version? 1.0.0

Java Version? Jdk 17

OS Version?

MySQL Version? 5.7.mysql_aurora.2.10.2

To Reproduce I am using Hikari connection pool 3.4.5. Sometimes after failover connections using writable endpoint are getting error that connection is read only. And situation will not solve over time.

Expected behaviour Connection should be always writable.

Logs If possible MySQL logs surrounding the occurrence of the issue

tommisc avatar May 25 '22 11:05 tommisc

Here is log file aws_jdbc.log

tommisc avatar May 25 '22 12:05 tommisc

I am having the same problem. change Hikari connection pool 2.7.9 is ok. https://github.com/awslabs/aws-mysql-jdbc/issues/209

zixubingfeng avatar May 26 '22 06:05 zixubingfeng

Problem also exists with Hikari version 5.0.1

tommisc avatar May 27 '22 08:05 tommisc

Hi all, thank you for raising this issue. We're currently looking into it and will share more information once we know more about what is going on. Thank you for your patience.

karenc-bq avatar May 27 '22 21:05 karenc-bq

Any update on this? We have a 3-minute failover duration and are unsure if this is a problem with Hikari or the driver. Does the fact the bug tag was added mean this has been found to be a driver bug?

Michael-S-Smith avatar Jun 21 '22 17:06 Michael-S-Smith

Hi @Michael-S-Smith

Unfortunately there's no update at the moment. We're still looking into the issue. We're not yet sure if the root cause of the issue is a pure driver bug or an integration issue between the driver and Hikari CP.

We'll keep you informed. Thank you for patience!

sergiyvamz avatar Jun 21 '22 22:06 sergiyvamz

This seems to be an issue in 4.0.3 as well. Please confirm if there is any new releases coming up which involves the fix.

dipindev avatar Jun 30 '22 17:06 dipindev

Hello @tommisc

We have investigated the issue and found a single scenario that may lead to the issue that you reported. If your application performs steps as it follows, then driver failover process connects to a replica instead of a writer node.

// open connection to a writer node by using DriverManager or HikariCP

// application uses connection

connection.setReadOnly(true);

// application uses connection

Statement statement = connection.createStatement();

// failover occurs here or during executing the following query
statement.executeQuery("select sleep(1000)");

// failover is completed and connection gets connected to a reader node

Calling setReadOnly(true) gives the driver a hint that the current connection is read-only and it's safe to switch to replica node if needed. Thus, when failover occurs, the driver re-connects to a replica node.

It's hard to say whether the following scenario is actually the one that leads to the issues that you reported. However I see some lines in the provided log that supports the idea.

2022-05-25 15:07:29,855 TRACE [DBShard0Slave connection adder] (Slf4JLogger.java:99) - [ClusterAwareConnectionProxy] explicitlyReadOnly=true 2022-05-25 15:07:29,855 TRACE [DBShard0Slave connection adder] (Slf4JLogger.java:99) - [ClusterAwareConnectionProxy] explicitlyReadOnly=true

Are you able to confirm if setReadOnly(true) is in fact occurring your workflow?

sergiyvamz avatar Jul 07 '22 22:07 sergiyvamz

@sergiyvamz the read-only flag isn't being set by the client, but by the upstream server.

Eg. If you have a two instance Aurora MySQL cluster, and connect with the cluster endpoint, your traffic is sent to the 'writer' instance. Connection open and persisted for performance etc. If you failover that 'writer' role to the other instance, the original 'writer' becomes a 'readonly' instance; but where persistent connections are still open to the original writer, requests on that existing connection will now fail, stating that it is a Read Only instance. At this point, we should close the existing connection, and aim to reconnect to endpoint again to be directed to the new writer instance; though some care may need to be taken to avoid DNS cache issues and endless disconnect/reconnect cycles if the connection remains readonly for a long duration...

dalgibbard avatar Nov 17 '22 11:11 dalgibbard

@dalgibbard are you connecting to the cluster endpoint or the instance endpoints ?

davecramer avatar Nov 18 '22 16:11 davecramer

@dalgibbard are you connecting to the cluster endpoint or the instance endpoints ?

Cluster endpoint

dalgibbard avatar Nov 18 '22 17:11 dalgibbard

The driver should work fine as long as you connect to the cluster endpoint first and then let it fail over. It should be reconnecting to the correct writer end point. From what you are saying this is not the case. Can you provide any more information that would help us figure out what is happening ?

davecramer avatar Nov 18 '22 18:11 davecramer

I'll re-run our steps to reproduce next week, and report back

dalgibbard avatar Nov 18 '22 19:11 dalgibbard

@dalgibbard thanks! Have a great weekend

davecramer avatar Nov 18 '22 20:11 davecramer

Hello @dalgibbard , following up on this. Do you have any updates regarding this?

For visibility, the 1.1.2 release also contains multiple fixes involving failover (#306). If you haven't tried yet, please use the latest version of the driver.

Thanks.

hsuamz avatar Dec 13 '22 00:12 hsuamz

Hello @dalgibbard , following up on this. Do you have any updates regarding this?

For visibility, the 1.1.2 release also contains multiple fixes involving failover (#306). If you haven't tried yet, please use the latest version of the driver.

Thanks.

Apologies for the lack of update, other priorities came up :( I'll pass the info over to the team here for when they get a moment. Seems like this might solve it for us though.

dalgibbard avatar Dec 13 '22 08:12 dalgibbard

Hello @tommisc and @dalgibbard

We will be closing this ticket now.

If this issue still persists, please re-open the issue.

Thank you

aaronchung-bitquill avatar Mar 01 '23 22:03 aaronchung-bitquill