aws-mysql-jdbc
After AWS Blue Green MySQL 5.7 to 8 Upgrade driver is not able to connect anymore
Describe the bug
After an AWS Blue/Green MySQL 5.7 to 8 upgrade, the driver is no longer able to connect. We were using AWS Aurora MySQL 5.7 with the latest driver version, 1.1.4, and we did a Blue/Green Deployment upgrading to Aurora RDS MySQL 8.0. After the switchover was done, we weren't able to connect to the cluster anymore using the AWS MySQL JDBC driver. The funny thing is that using MariaDB driver version 2.6.0 with Aurora failover balancing mode enabled, we are able to connect to the new cluster and everything works.
Below are the logs:
We also have the AWS Support ticket open: 12299991231
Expected Behavior
The driver should be able to (re)connect
Current Behavior
Running apps won't reconnect to the database. Newly deployed apps boot-loop because they can't connect to the DB.
Reproduction Steps
Run an Aurora RDS MySQL 5.7 cluster in the Sydney region, do a Blue/Green Deployment upgrading to 8.0, then do the switchover.
Possible Solution
No response
Additional Information/Context
No response
The AWS JDBC Driver for MySQL version used
1.1.4
JDK version used
Corretto 17.0.6
Operating System and version
Fargate - AmazonCorretto:17 docker image
The database address and database name were changed in order not to disclose them, since GitHub is public, but if you have access to the AWS Support ticket you will be able to get the real cluster address and the log with the real cluster address.
Hi @luneo7
Thanks for reaching out and raising this issue.
We will be investigating this and keep you posted with updates.
Thank you!
Hi @luneo7
Can you confirm that all nodes in the cluster are accessible from the environment where your application is deployed?
Thank you
Yup, it is. Using the MariaDB driver with Aurora mode (same URL configuration) we are fully able to connect to the cluster, and that has been our workaround.
Hi @luneo7
Would it be possible to check accessibility to the instances (not the cluster) themselves? Based on the logs, it looks like connecting to the cluster endpoint works, but once the driver tries to connect to a specific instance it starts to have trouble. I would like to rule out the possibility that this is a network or network-configuration issue.
Thank you!
Connecting directly to the instances using the instance endpoint works, the issue happens when the AWS JDBC driver fetches the topology with:
SELECT SERVER_ID, SESSION_ID, LAST_UPDATE_TIMESTAMP, REPLICA_LAG_IN_MILLISECONDS
FROM information_schema.replica_host_status
WHERE time_to_sec(timediff(now(), LAST_UPDATE_TIMESTAMP)) <= 300
ORDER BY LAST_UPDATE_TIMESTAMP DESC
It seems that the driver builds connection strings from the server ID. Because we did the Blue/Green Deployment as instructed by AWS, the switchover changed the DNS, but the instances are still reporting server IDs with a -green suffix, so the driver builds a connection to a DNS name that no longer exists.
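The failure mode described above can be sketched in a few lines. This is a simplified illustration, not the driver's actual implementation: it assumes the driver fills a host template (derived from the cluster endpoint) with each SERVER_ID returned by the topology query, and the template and IDs below are hypothetical.

```java
// Sketch of how an Aurora-aware driver might map a SERVER_ID from
// information_schema.replica_host_status onto an instance DNS name.
// The "?" placeholder template is an assumption for illustration.
public class TopologySketch {

    // e.g. template "?.cexample.ap-southeast-2.rds.amazonaws.com"
    static String instanceEndpoint(String hostTemplate, String serverId) {
        return hostTemplate.replace("?", serverId);
    }

    public static void main(String[] args) {
        String template = "?.cexample.ap-southeast-2.rds.amazonaws.com"; // hypothetical suffix
        // Normal case: the derived instance endpoint exists in DNS.
        System.out.println(instanceEndpoint(template, "db-1"));
        // After the switchover the instance still reports a "-green" id,
        // so the derived hostname points at DNS that was removed.
        System.out.println(instanceEndpoint(template, "db-1-green-abc"));
    }
}
```

Since the mapping is a pure string substitution, a stale SERVER_ID can only ever produce a stale hostname; nothing in this step validates that the name still resolves.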
It might be an issue with what AWS does under the hood in the Blue/Green switchover, or the topology might have to be discovered differently in those scenarios... we don't have visibility into what happens behind the scenes in AWS.
And a side note: we were successfully connecting with the same driver config, without changing anything, while the cluster was running 5.7. We ran the Blue/Green Deployment through the AWS console to upgrade the database to 8.0, and everything was automated by AWS itself. We didn't change any config, since we use the cluster endpoint to connect and cluster endpoints are not changed by the Blue/Green Deployment process.
Another note: the MariaDB driver seems to work because it falls back to the cluster endpoint, and connecting through the cluster endpoints (both writer and reader) used in the driver connection string works. Since the AWS JDBC driver doesn't do that, it fails.
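The fallback behavior described in this comment can be sketched as follows. This is a hypothetical application-level strategy under the assumptions above, not MariaDB's or aws-mysql-jdbc's actual code; the connect step is abstracted as a function so the strategy can be shown without a live database.

```java
import java.util.List;
import java.util.function.Function;

// Sketch: try each instance endpoint discovered from the topology, and
// if none succeed, fall back to the cluster endpoint the application
// was originally configured with (which is known-good, since the
// topology could only be fetched over a working connection to it).
// All names here are illustrative, not driver API.
public class FallbackSketch {

    static <C> C connectWithFallback(List<String> instanceHosts,
                                     String clusterEndpoint,
                                     Function<String, C> connect) {
        for (String host : instanceHosts) {
            try {
                return connect.apply(host);
            } catch (RuntimeException e) {
                // Instance DNS may no longer exist after a Blue/Green
                // switchover; try the next candidate instead of failing.
            }
        }
        // Last resort: the endpoint that worked at startup.
        return connect.apply(clusterEndpoint);
    }
}
```

The key design point is the last line: a failure in topology-based connection attempts degrades to the user-supplied endpoint instead of failing the whole driver.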
Thank you for your feedback @luneo7. The aws-mysql-jdbc driver does not currently support AWS Blue Green deployments. For visibility, we'll be updating our documentation to make this explicit and we'll be tracking this item in our backlog as a feature request. We'll share the feedback with the team. Thank you.
I don't think this is just a matter of adding support; the way I see it, this is a bug. The driver connects with the provided URL, but if the failover plugin fails, everything else fails. There should be something to handle this scenario: since the topology is built only after a connection was successfully made to the server, we can assume the server is there, and the driver should fall back to that connection rather than fail everything. The failover plugin should be resilient enough not to fail the whole driver.
track
Are there updates on this? Is the support to Blue/Green deployment planned to be implemented?
Are there updates on this?
For visibility, Blue/Green deployment is part of our backlog. We will provide a more concrete update to timelines once the information is made available.