aws-postgresql-jdbc
Splitting read/write
Is it possible to use this driver to split read and write queries between the primary and replica nodes of Aurora, as a way to balance the traffic?
Best Regards.
Hi jolivaSan!
Thank you for reaching out to us! Unfortunately, the driver can't split (I assume that mostly means "reroute") traffic depending on whether it's a read or a write query. As a quick solution, I'd recommend opening one connection to a writer node and a short-lived connection to a reader (replica) node. For the writer connection, you can use the cluster endpoint, which always points to the writer. For the reader connection, you can use the reader cluster endpoint, which resolves to a random instance endpoint and thereby load balances between readers. If your application opens a new reader connection for every read query, it will balance the load across the replicas. This is a simple solution; however, it's not an efficient one. You may want to consider using a connection pool for reader connections.
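For what it's worth, here is a minimal plain-JDBC sketch of that suggestion; the endpoint host names, credentials, and table are placeholders. It opens a long-lived connection through the cluster (writer) endpoint for writes and a short-lived connection through the reader endpoint per read, so DNS picks a replica each time.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadWriteEndpoints {
    // Placeholder endpoints: replace with your Aurora cluster's writer and reader endpoints.
    private static final String WRITER_URL =
            "jdbc:postgresql://my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com:5432/mydb";
    private static final String READER_URL =
            "jdbc:postgresql://my-cluster.cluster-ro-xxxx.us-east-1.rds.amazonaws.com:5432/mydb";

    public static void main(String[] args) throws Exception {
        // Writes go through the cluster endpoint, which always resolves to the writer.
        try (Connection writer = DriverManager.getConnection(WRITER_URL, "user", "password");
             Statement stmt = writer.createStatement()) {
            stmt.executeUpdate("INSERT INTO events(name) VALUES ('example')");
        }

        // A short-lived connection per read via the reader endpoint; DNS picks a replica.
        try (Connection reader = DriverManager.getConnection(READER_URL, "user", "password");
             Statement stmt = reader.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count(*) FROM events")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```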
If you can provide details about your workflow, we'd be interested in gathering feedback to determine where we can optimize the driver.
Thank you!
Hi @sergiyvamz, and thanks for your quick answer,
We are using Amazon Aurora as our data store. We chose a managed Postgres cluster for the availability (multi-AZ), scalability, failover, and so on, but we were also expecting this product to provide some way of read/write splitting to improve performance and avoid database locks caused by write operations affecting our read requests. After checking with AWS support, it seems this feature is not implemented by Aurora/RDS itself and they say it should be implemented on the client side. That's why we are looking for a driver that can be configured to route read operations to the reader nodes and write operations to the primary node. We know that in MongoDB, for example, this is implemented by the client, so that's why I'm asking here :-).
Best Regards.
@jolivaSan can you help me understand what you mean by "database locks caused by write operations"?
Hi @davecramer,
Just an example of things that can cause performance issues in a database; I'm referring to these locks: https://www.postgresql.org/docs/9.4/explicit-locking.html. But our goal in using read/write splitting in the DB cluster is to improve performance: by splitting reads and writes we balance the load between different nodes, so we expect to get more out of our instances.
Best Regards.
Hi @jolivaSan, thanks for the explanation. As for explicit locking, it's pretty unusual for locks to actually block reads in Postgres, but the performance gains from splitting reads are very real. I am the maintainer of the pgjdbc driver and really can't speak for this driver, but the challenge for drivers in general is that we really don't know what a read or write is, since you could be calling a function with "select insertsomedatafunction()". My understanding of doing this in the client usually means that the application knows what is actually a read and connects to a different node. It is interesting that MongoDB does this, though.
Aurora clusters have a DNS CNAME available known as the 'reader endpoint'. This endpoint uses DNS round-robin to resolve to readers in a given cluster. The problem with this is that most clients do some form of caching on initial name resolution, so you don't get an even distribution.
I have seen customers use a network load balancer (NLB) to do this in a more distributed fashion. This does work, but comes with the overhead of having to maintain the list of readers yourself. It's straightforward using the API or AWS console to update the NLB as your cluster changes shape, but it is extra operational burden.
RDS Proxy does offer some capabilities when it comes to using readers. RDS Proxy now provides a reader endpoint similar to the standard Aurora Cluster and can distribute requests for you. That said, you will need to actually have a separate connection to that endpoint, but it does provide distribution as opposed to pinned connections.
Automating the determination of whether a connection / session / transaction / query is RW or RO seems straightforward, but the complexity of things like function calls (like @davecramer mentioned) make this a bit harder to get right. I'm not sure where the right place to do this type of work is, but it certainly seems like the driver may be a good place to look.
The Aurora reader cluster endpoint does load balancing for read traffic. However, with read/write splitting the application is left to manage the datasource or connection against the primary or replica. As far as the driver is concerned, the MariaDB Connector/J for MySQL implemented the Aurora protocol to not only load balance read requests across read replicas but also split reads/writes based on connection/transaction attributes (the connection is set to read-only). Could we perhaps do something similar with the PostgreSQL driver?
I am currently working on a solution to split reads and writes, but this has to be done within the application itself. The other nice solution I have tried is Heimdall, but that is still a POC.
Our application is Spring/Java-based, and the split was easier to do by creating read and write data sources in JBoss and then enabling the app to use them, but the issue is the connection pool settings. Will update here in a few days, when I am done with this solution.
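For what it's worth, one common way to wire that kind of application-side split in Spring is a routing DataSource that picks the reader or writer pool based on the current transaction's read-only flag. This is only a sketch of that idea; the writerDataSource and readerDataSource arguments are hypothetical placeholders for pools pointing at the cluster (writer) and reader endpoints.

```java
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;
import org.springframework.transaction.support.TransactionSynchronizationManager;

// Routes to the reader pool when the current Spring transaction is read-only,
// otherwise to the writer pool.
public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {

    @Override
    protected Object determineCurrentLookupKey() {
        return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
                ? "reader" : "writer";
    }

    // Hypothetical wiring of the two pools; writerDataSource points at the
    // cluster (writer) endpoint, readerDataSource at the reader endpoint.
    public static DataSource create(DataSource writerDataSource, DataSource readerDataSource) {
        ReadWriteRoutingDataSource routing = new ReadWriteRoutingDataSource();
        Map<Object, Object> targets = new HashMap<>();
        targets.put("writer", writerDataSource);
        targets.put("reader", readerDataSource);
        routing.setTargetDataSources(targets);
        routing.setDefaultTargetDataSource(writerDataSource);
        routing.afterPropertiesSet();
        return routing;
    }
}
```

Note that the read-only flag is only visible if the connection is fetched after the transaction starts, so in practice the routing DataSource is usually wrapped in Spring's LazyConnectionDataSourceProxy.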
@davecramer AFAIK the MySQL driver can decide whether a connection should use the writer or a reader node based on the readOnly flag set on the JDBC connection. Sounds to me like the same approach could work for the Postgres driver..? Or are there some differences between MySQL and PostgreSQL related to this that I am not aware of? Or maybe the MySQL design is flawed?
@maciejwalkowiak Interesting. Yes, this would work; however, I think that requires the application to know whether it is requesting a read node or a write node. From what I can tell, this connection property is really just selecting an endpoint? I think notionally people want the driver to figure out from the queries whether it is a read query or a write query, no?
If a transaction in the application is marked readOnly, the application framework (such as Spring) should be setting the connection's readOnly property dynamically for the transaction. Ideally, the driver would honor that and be able to route to the appropriate cluster endpoint (either the read/write or the read-only cluster)? At least that's what MariaDB did with the jdbc:mysql:aurora option. Not sure if the AWS MySQL driver with jdbc:mysql:aws does the same.
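To make that signal concrete, here is a small illustrative sketch (class, queries, and table names are made up) of the JDBC-level flag a framework like Spring sets for read-only transactions, which is what a routing-aware driver could react to.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;
import org.springframework.transaction.annotation.Transactional;

public class ReportService {

    private final DataSource dataSource;

    public ReportService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // With a Spring transaction manager, readOnly = true typically results in
    // Connection.setReadOnly(true) being called on the underlying connection.
    @Transactional(readOnly = true)
    public void runReport() {
        // ... read-only queries executed through the transaction's connection ...
    }

    // The equivalent plain-JDBC signal a routing-aware driver could react to.
    public void runReportManually() throws SQLException {
        try (Connection conn = dataSource.getConnection();
             Statement stmt = conn.createStatement()) {
            conn.setReadOnly(true); // hint: this session will only read
            stmt.executeQuery("SELECT count(*) FROM events");
        }
    }
}
```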
Cool, that makes a lot of sense. Thanks for the clarification.
For visibility, we are recommending everyone to move to our new repository here: https://github.com/awslabs/aws-advanced-jdbc-wrapper.
We have redesigned the driver as a JDBC Wrapper which has been tested to work with the PGJDBC driver. Please give it a look and we will be triaging issues in the new repository.
Read / Write splitting is not currently supported, but we are actively working on the implementation.
Thanks.
@hsuamz is the read/write split supported by the new driver?
@ghilainm yes, see https://github.com/awslabs/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md
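For anyone finding this later, below is a hedged sketch of what using that plugin looks like, based on the linked documentation; the jdbc:aws-wrapper:postgresql URL prefix and the wrapperPlugins=readWriteSplitting property should be verified against that page, and the endpoint, credentials, and table are placeholders. The key idea is that switching between the writer and a reader is driven by Connection.setReadOnly.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class WrapperReadWriteSplitting {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster endpoint; verify the URL prefix and plugin name against the docs above.
        String url = "jdbc:aws-wrapper:postgresql://my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com:5432/mydb";

        Properties props = new Properties();
        props.setProperty("user", "user");
        props.setProperty("password", "password");
        props.setProperty("wrapperPlugins", "readWriteSplitting");

        try (Connection conn = DriverManager.getConnection(url, props)) {
            // The connection starts on the writer; writes go to the primary instance.
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("INSERT INTO events(name) VALUES ('example')");
            }

            // Switching to read-only asks the plugin to route this connection to a reader.
            conn.setReadOnly(true);
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT count(*) FROM events")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            }

            // Switch back to the writer for subsequent writes.
            conn.setReadOnly(false);
        }
    }
}
```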
Please use https://github.com/awslabs/aws-advanced-jdbc-wrapper