postgresql_cluster
HAProxy + pgbouncer configuration and pgpool-II as an alternative
I recently followed the most common scenario for deploying a cluster, with HAProxy and pgbouncer. After maintaining this setup for a while, I have come to see the following downsides:
- pgbouncer is deployed to each cluster node, which increases the complexity of setup and maintenance;
- HAProxy has no awareness of database specifics and is therefore configured to handle write and read operations through different ports (see the sketch below). This means that load balancing has to be handled by the client application;
- The current confd template lists all database nodes for each frontend, relying on the Patroni health check. This means that some of the backends are always down (e.g. the primary node in the read-only frontend).
The last point can be mitigated with a more involved template which also checks the node role before including it in a particular frontend. @vitabaks, please let me know if you'd be open to a contribution regarding this.
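For illustration, this is roughly what the dual-port scheme looks like; a minimal sketch assuming the usual defaults (5000 for writes, 5001 for reads, pgbouncer on 6432, Patroni REST API on 8008), with placeholder node names and addresses:

```
# Writes: only the node whose Patroni /primary check returns 200 is marked UP
listen master
    bind *:5000
    mode tcp
    option httpchk OPTIONS /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pgnode01 10.0.0.1:6432 check port 8008
    server pgnode02 10.0.0.2:6432 check port 8008
    server pgnode03 10.0.0.3:6432 check port 8008

# Reads: only nodes whose /replica check returns 200 are marked UP,
# so the primary always shows as DOWN in this frontend's stats
listen replicas
    bind *:5001
    mode tcp
    balance roundrobin
    option httpchk OPTIONS /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pgnode01 10.0.0.1:6432 check port 8008
    server pgnode02 10.0.0.2:6432 check port 8008
    server pgnode03 10.0.0.3:6432 check port 8008
```

The application still has to decide which port to use for writes (5000) and which for reads (5001).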
While researching I mistakenly assumed that pgpool-II was a functional equivalent of pgbouncer and that the latter had been chosen for its greater popularity. Since then I have found that pgpool-II is considerably more capable and can act as a load balancer, among other things. For now I'm testing it as a replacement for both HAProxy and pgbouncer and keep exploring its other features, such as query caching.
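To give an idea of what I mean, a minimal pgpool.conf sketch (parameter names are from the pgpool-II documentation; hosts, ports and values are placeholders, not a configuration taken from this project):

```
# pgpool.conf (excerpt) -- streaming-replication mode with read load balancing
backend_clustering_mode = 'streaming_replication'   # pgpool-II 4.2+
listen_addresses = '*'
port = 9999

# Backend nodes (node 0 is the primary, the rest are replicas)
backend_hostname0 = '10.0.0.1'
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = '10.0.0.2'
backend_port1 = 5432
backend_weight1 = 1

# Send read-only statements to replicas, writes to the primary
load_balance_mode = on

# Built-in in-memory query result cache
memory_cache_enabled = on
memqcache_method = 'shmem'
```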
@rcknr Thanks for the comment.
I think we could implement an option to install pgbouncer on separate nodes, but by default we install it locally and use a Unix socket to connect the pool to the database, as this is the more efficient scheme.
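For reference, a minimal sketch of that local scheme (paths and settings here are illustrative, not necessarily the exact values the role deploys):

```
; pgbouncer.ini (excerpt) -- pooler on each node, Unix-socket connection to Postgres
[databases]
; a host value starting with "/" makes pgbouncer connect via the Unix socket directory
* = host=/var/run/postgresql port=5432

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
```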
P.S. We are not considering pgpool-II and it will most likely not be added to the main project, but you can experiment with your installation because postgresql_cluster provides flexibility in choosing a scheme.
For the confd template I have the following proposal:
{{- range $members := gets "/members/*" -}}
{{- $data := json $members.Value -}}
{{- $hostname := index (split $members.Key "/") 2 -}}
{{- $host := base (replace (index (split (index (split $data.conn_url ":") 1) "/") 2) "@" "/" -1) -}}
{{- $port := index (split (index (split $data.api_url ":") 2) "/") 0 -}}
{{- if eq $data.role "master" }}
server {{ $hostname }} {{ $host }}:6432 check port {{ $port }}
{{ end -}}
{{- end }}
This section refers to the write frontend. For the read frontend, "master" is replaced with "replica". This avoids listing replicas in the write frontend and the master in the read frontend, respectively.
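Concretely, the read frontend would use the same loop with only the role condition changed, e.g.:

```
{{- if eq $data.role "replica" }}
server {{ $hostname }} {{ $host }}:6432 check port {{ $port }}
{{ end -}}
```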
This is not necessary because HAProxy sends checks to the Patroni REST API and directs traffic only to servers with the required role.
Yes, I understand that. However, if you use any monitoring on HAProxy it will complain that some backends are down, and you will need to work around that. One could argue that confd is not really necessary, as you could just list all nodes in HAProxy with Ansible and rely entirely on Patroni health checks.
I monitor the availability of ports 5000 and 5001, rather than each host defined in the HAProxy configuration individually.
Following the same principle (using the Patroni REST API), I am implementing cloud load balancers (to be available in version 2.0).
HAProxy, though, checks each listed backend and accordingly marks some of them as down in its metrics. Again, in this scenario confd isn't doing much and could be taken out of the equation without compromising reliability. However, if you account for the role in the template, it suddenly adds another layer of accountability against the DCS.
If you're interested in this feature, please consider becoming a sponsor.
The development of this feature requires sponsorship to fund developer efforts. If you're already a sponsor and are interested in this feature, please leave a comment here so we can prioritize this issue accordingly.
We also welcome contributions from those willing to dedicate their time to implement this feature on a voluntary basis. Please see the Contributing Guide for more information.
> HAProxy has no awareness of database specifics and is therefore configured to handle write and read operations through different ports. This means that load balancing has to be handled by the client application;
You can easily make HAProxy aware of database specifics (e.g. which node to route master requests to): https://github.com/joe-at-startupmedia/happac/blob/743946594976bafc5cca752a71c8b6d4ef07da80/README.md?plain=1#L90