aws-mysql-jdbc icon indicating copy to clipboard operation
aws-mysql-jdbc copied to clipboard

Connection sometimes doesn't failover with clusterInstanceHostPattern parameter

Open arosso opened this issue 2 years ago • 9 comments

Describe the bug

When a manual failover is triggered on our 2 node RDS Aurora cluster, the connection sometimes doesn't failover to the new writer and stays on the reader. This causes any inserts/updates to return an Exception:

java.sql.SQLException: The MySQL server is running with the --read-only option so it cannot execute this statement

The situation doesn't resolve itself until the connection is broken and re-established.

This does not happen every time but probably 1 out of 2 or 3 failovers.

We use a custom Route53 hostname for our RDS cluster, and therefore add the clusterInstanceHostPattern to the JDBC parameters. From what we can tell, when this parameter is present the, the situation happens. If we use the cluster name directly without the clusterInstanceHostPattern parameter the failover consistently works. Even using the cluster endpoint but including the clusterInstanceHostPattern parameter triggers the issue.

Expected Behavior

We are expecting that during a failover all our connections migrate to the new writer.

Current Behavior

During a failover, some connections remain on the old writer which is now a reader and we then get exceptions like:

java.sql.SQLException: The MySQL server is running with the --read-only option so it cannot execute this statement

Reproduction Steps

Attached is a Main.java that reproduces the issue. It is modeled on the code sample provided in the JDBC readme. It's the simplest example we could provide. It uses the Hikari connection pool. We have tried Hikari 5.0.1, and the AWS driver 1.1.3 and 1.1.4 but the behavior is the same.

To run:

java -cp aws-mysql-jdbc-1.1.4.jar:HikariCP-4.0.3.jar:slf4j-api-1.7.21.jar:slf4j-jdk14-1.7.21.jar:. Main "jdbc:mysql:aws://clusterhostname:3306/db?profileSQL=true&clusterInstanceHostPattern=?.clusterid.us-east-1.rds.amazonaws.com&enableClusterAwareFailover=true&logger=Slf4JLogger" "user" 'pass'

This has all logging on so it's very verbose. The code simply runs some transactions continually until it is stopped. While it's running if you initiate a failover of your RDS cluster sometimes you'll start seeing the read-only exception mentioned above. It seems like the connection is not re-established to the new writer.

We've excluded DNS caching issues. DNS records are all set to 5s TTL and the Java sample sets caching to a minimum. jdbctest.tar.gz

Possible Solution

Without clusterInstanceHostPattern, using the cluster endpoint, everything works. There seems to be logic in the code that triggers different behavior based on clusterInstanceHostPattern. We assume it has to do with detecting if the driver is talking to an Aurora cluster or just a simple MySQL server. Maybe a new parameter could be added to force this behavior if the issue cannot be fixed.

Additional Information/Context

No response

The AWS JDBC Driver for MySQL version used

1.1.4

JDK version used

Corretto 11.0.17.8.1

Operating System and version

Amazon Linux 2 (4.14.296-222.539.amzn2.x86_64)

arosso avatar Mar 24 '23 00:03 arosso