aws-postgresql-jdbc
Splitting read/write
Is it possible to use this driver to split read and write queries between the primary and replica nodes of Aurora, as a way to balance the traffic?
Best Regards.
Hi jolivaSan!
Thank you for reaching out to us! Unfortunately, the driver can't split (I assume that mostly means "reroute") traffic depending on whether it's a read or a write query. As a quick solution, I'd recommend opening one connection to a writer node and a short-lived connection to a reader (replica) node. For the writer connection, you can use the cluster endpoint, which always points to the writer. For the reader connection, you can use the reader cluster endpoint, which resolves to a random instance endpoint and thereby load balances between readers. If your application opens a new reader connection for every read query, it will balance the load across the replicas. This is a simple solution; however, it's not an efficient one. You may want to consider using a connection pool for reader connections.
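For what it's worth, here is a minimal plain-JDBC sketch of that suggestion; the endpoint host names, credentials, and table are placeholders. It opens a long-lived connection through the cluster (writer) endpoint for writes and a short-lived connection through the reader endpoint per read, so DNS picks a replica each time.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadWriteEndpoints {
    // Placeholder endpoints: replace with your Aurora cluster's writer and reader endpoints.
    private static final String WRITER_URL =
            "jdbc:postgresql://my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com:5432/mydb";
    private static final String READER_URL =
            "jdbc:postgresql://my-cluster.cluster-ro-xxxx.us-east-1.rds.amazonaws.com:5432/mydb";

    public static void main(String[] args) throws Exception {
        // Writes go through the cluster endpoint, which always resolves to the writer.
        try (Connection writer = DriverManager.getConnection(WRITER_URL, "user", "password");
             Statement stmt = writer.createStatement()) {
            stmt.executeUpdate("INSERT INTO events(name) VALUES ('example')");
        }

        // A short-lived connection per read via the reader endpoint; DNS picks a replica.
        try (Connection reader = DriverManager.getConnection(READER_URL, "user", "password");
             Statement stmt = reader.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count(*) FROM events")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```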
If you can provide details about your workflow, we'd be interested in gathering feedback to determine where we can optimize the driver.
Thank you!
Hi @sergiyvamz, and thanks for your quick answer,
We are using Amazon Aurora as our data store. We chose a managed Postgres cluster for the availability (multi-AZ), scalability, failover, and so on, but we were also expecting this product to provide some way of read/write splitting to improve performance and avoid database locks caused by write operations affecting our read requests. After checking with AWS support, it seems this feature is not implemented by Aurora/RDS itself and they say it should be implemented on the client side. That's why we are looking for a driver that can be configured to route read operations to the reader nodes and write operations to the primary node. We know that in MongoDB, for example, this is implemented by the client, so that's why I'm asking here :-).
Best Regards.
@jolivaSan can you help me understand what you mean by "database locks caused by write operations"?
Hi @davecramer,
Just an example of things that can cause performance issues in a database; I'm referring to these locks: https://www.postgresql.org/docs/9.4/explicit-locking.html. But our goal in using read/write splitting in the DB cluster is to improve performance: by splitting reads and writes we balance the load between different nodes, so we expect to get more out of our instances.
Best Regards.
Hi @jolivaSan, thanks for the explanation. As for explicit locking, it's pretty unusual for locks to actually block reads in Postgres, but the performance gains from splitting reads are very real. I am the maintainer of the pgjdbc driver and really can't speak for this driver, but the challenge for drivers in general is that we really don't know what a read or write is, since you could be calling a function with "select insertsomedatafunction()". My understanding of doing this in the client usually means that the application knows what is actually a read and connects to a different node. It is interesting that MongoDB does this, though.
Aurora clusters have a DNS CNAME available known as the 'reader endpoint'. This endpoint uses DNS round-robin to resolve to readers in a given cluster. The problem with this is that most clients do some form of caching on initial name resolution, so you don't get an even distribution.
I have seen customers use a network load balancer (NLB) to do this in a more distributed fashion. This does work, but comes with the overhead of having to maintain the list of readers yourself. It's straightforward using the API or AWS console to update the NLB as your cluster changes shape, but it is extra operational burden.
RDS Proxy does offer some capabilities when it comes to using readers. RDS Proxy now provides a reader endpoint similar to the standard Aurora Cluster and can distribute requests for you. That said, you will need to actually have a separate connection to that endpoint, but it does provide distribution as opposed to pinned connections.
Automating the determination of whether a connection / session / transaction / query is RW or RO seems straightforward, but the complexity of things like function calls (like @davecramer mentioned) make this a bit harder to get right. I'm not sure where the right place to do this type of work is, but it certainly seems like the driver may be a good place to look.
The Aurora reader cluster endpoint does load balancing for read traffic. However, with read/write splitting the application is left to manage the datasource or connection against the primary or replica. As far as the driver is concerned, the MariaDB Connector/J for MySQL implemented the Aurora protocol to not only load balance read requests across read replicas but also split reads/writes based on connection/transaction attributes (the connection is set to read-only). Could we perhaps do something similar with the PostgreSQL driver?
I am currently working on a solution to split reads and writes, but this has to be done within the application itself. The other nice solution I have tried is Heimdall, but that is still a POC.
Our application is Spring/Java-based, and the split was easier to do by creating read and write data sources in JBoss and then enabling the app to use them, but the issue is the connection pool settings. Will update here in a few days, when I am done with this solution.
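For what it's worth, one common way to wire that kind of application-side split in Spring is a routing DataSource that picks the reader or writer pool based on the current transaction's read-only flag. This is only a sketch of that idea; the writerDataSource and readerDataSource arguments are hypothetical placeholders for pools pointing at the cluster (writer) and reader endpoints.

```java
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;
import org.springframework.transaction.support.TransactionSynchronizationManager;

// Routes to the reader pool when the current Spring transaction is read-only,
// otherwise to the writer pool.
public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {

    @Override
    protected Object determineCurrentLookupKey() {
        return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
                ? "reader" : "writer";
    }

    // Hypothetical wiring of the two pools; writerDataSource points at the
    // cluster (writer) endpoint, readerDataSource at the reader endpoint.
    public static DataSource create(DataSource writerDataSource, DataSource readerDataSource) {
        ReadWriteRoutingDataSource routing = new ReadWriteRoutingDataSource();
        Map<Object, Object> targets = new HashMap<>();
        targets.put("writer", writerDataSource);
        targets.put("reader", readerDataSource);
        routing.setTargetDataSources(targets);
        routing.setDefaultTargetDataSource(writerDataSource);
        routing.afterPropertiesSet();
        return routing;
    }
}
```

Note that the read-only flag is only visible if the connection is fetched after the transaction starts, so in practice the routing DataSource is usually wrapped in Spring's LazyConnectionDataSourceProxy.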
@davecramer AFAIK the MySQL driver can decide whether a connection should use the writer or a reader node based on the readOnly flag set on the JDBC connection. Sounds to me like the same approach could work for the Postgres driver..? Or are there some differences between MySQL and PostgreSQL related to this that I am not aware of? Or maybe the MySQL design is flawed?
@maciejwalkowiak Interesting. Yes, this would work; however, I think that requires the application to know whether it is requesting a read node or a write node. From what I can tell, this connection property is really just selecting an endpoint? I think notionally people want the driver to figure out from the queries whether it is a read query or a write query, no?
If a transaction in the application is marked readOnly, the application framework (such as Spring) should be setting the connection's readOnly property dynamically for the transaction. Ideally, the driver would honor that and be able to route to the appropriate cluster endpoint (either the read/write or the read-only cluster)? At least that's what MariaDB did with the jdbc:mysql:aurora option. Not sure if the AWS MySQL driver with jdbc:mysql:aws does the same.
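To make that signal concrete, here is a small illustrative sketch (class, queries, and table names are made up) of the JDBC-level flag a framework like Spring sets for read-only transactions, which is what a routing-aware driver could react to.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;
import org.springframework.transaction.annotation.Transactional;

public class ReportService {

    private final DataSource dataSource;

    public ReportService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // With a Spring transaction manager, readOnly = true typically results in
    // Connection.setReadOnly(true) being called on the underlying connection.
    @Transactional(readOnly = true)
    public void runReport() {
        // ... read-only queries executed through the transaction's connection ...
    }

    // The equivalent plain-JDBC signal a routing-aware driver could react to.
    public void runReportManually() throws SQLException {
        try (Connection conn = dataSource.getConnection();
             Statement stmt = conn.createStatement()) {
            conn.setReadOnly(true); // hint: this session will only read
            stmt.executeQuery("SELECT count(*) FROM events");
        }
    }
}
```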
Cool, that makes a lot of sense. Thanks for the clarification.
For visibility, we are recommending everyone to move to our new repository here: https://github.com/awslabs/aws-advanced-jdbc-wrapper.
We have redesigned the driver as a JDBC Wrapper which has been tested to work with the PGJDBC driver. Please give it a look and we will be triaging issues in the new repository.
Read / Write splitting is not currently supported, but we are actively working on the implementation.
Thanks.
@hsuamz is the read/write split supported by the new driver?
@ghilainm yes, see https://github.com/awslabs/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md
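For anyone finding this later, below is a hedged sketch of what using that plugin looks like, based on the linked documentation; the jdbc:aws-wrapper:postgresql URL prefix and the wrapperPlugins=readWriteSplitting property should be verified against that page, and the endpoint, credentials, and table are placeholders. The key idea is that switching between the writer and a reader is driven by Connection.setReadOnly.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class WrapperReadWriteSplitting {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster endpoint; verify the URL prefix and plugin name against the docs above.
        String url = "jdbc:aws-wrapper:postgresql://my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com:5432/mydb";

        Properties props = new Properties();
        props.setProperty("user", "user");
        props.setProperty("password", "password");
        props.setProperty("wrapperPlugins", "readWriteSplitting");

        try (Connection conn = DriverManager.getConnection(url, props)) {
            // The connection starts on the writer; writes go to the primary instance.
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("INSERT INTO events(name) VALUES ('example')");
            }

            // Switching to read-only asks the plugin to route this connection to a reader.
            conn.setReadOnly(true);
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT count(*) FROM events")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            }

            // Switch back to the writer for subsequent writes.
            conn.setReadOnly(false);
        }
    }
}
```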
Please use https://github.com/awslabs/aws-advanced-jdbc-wrapper