[bitnami/postgresql-ha] ERROR: cannot execute INSERT in a read-only transaction
Name and Version
bitnami/postgresql-ha 6.5.5
What architecture are you using?
None
What steps will reproduce the bug?
- Deploy PostgreSQL with the HA architecture (Replication Manager (repmgr) and Pgpool) using the Bitnami Helm chart, with PostgreSQL replicas=2 and Pgpool replicas=2.
- A C++ application is connected to the DB via Pgpool using the libpq library and executes SELECT, INSERT and UPDATE queries.
- Occasionally, when the master PostgreSQL node switches to slave and the slave node to master due to a restart or any other reason, Pgpool may continue to send all queries to the previous master, which is now acting as a slave. This results in the following error:
ERROR: cannot execute INSERT in a read-only transaction
This error occurs because Pgpool attempts to execute INSERT and UPDATE queries on a PostgreSQL instance that is in a read-only state.
Are you using any custom parameters or values?
----------------------------
pgPool-II configuration file
----------------------------
backend_clustering_mode = 'streaming_replication'
listen_addresses = '*'
port = '5432'
socket_dir = '/opt/build/pgpool/tmp'
reserved_connections = 0
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/opt/build/pgpool/tmp'
listen_backlog_multiplier = 2
serialize_accept = off
backend_hostname0 = 'host1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'server0'
enable_pool_hba = 'on'
authentication_timeout = '30'
allow_clear_text_frontend_auth = 'off'
ssl = 'on'
ssl_key = '/opt/build/pgpool/certs/tls.key'
ssl_cert = '/opt/build/pgpool/certs/tls.crt'
ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
ssl_prefer_server_ciphers = off
ssl_ecdh_curve = 'prime256v1'
ssl_dh_params_file = ''
num_init_children = '200'
max_pool = '15'
child_life_time = 5min
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
log_destination = 'stderr'
log_connections = 'off'
log_disconnections = off
log_hostname = 'on'
log_statement = off
log_per_node_statement = 'off'
log_client_messages = off
log_standby_delay = 'if_over_threshold'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'
pid_file_name = '/opt/build/pgpool/tmp/pgpool.pid'
logdir = '/opt/build/pgpool/logs'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replicate_select = off
insert_lock = off
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = 'off'
ignore_leading_white_space = on
read_only_function_list = ''
write_function_list = ''
primary_routing_query_pattern_list = ''
database_redirect_preference_list = ''
app_name_redirect_preference_list = ''
allow_sql_comments = off
disable_load_balance_on_write = 'transaction'
dml_adaptive_object_relationship_list= ''
statement_level_load_balance = 'off'
sr_check_period = '30'
sr_check_user = 'repmgr'
sr_check_database = 'postgres'
delay_threshold = 10000000
follow_primary_command = ''
health_check_period = '30'
health_check_timeout = '10'
health_check_user = 'repmgr'
health_check_database = ''
health_check_max_retries = '5'
health_check_retry_delay = '5'
connect_timeout = 10000
failover_command = 'echo ">>> Failover - that will initialize new primary node search!"'
failback_command = ''
failover_on_backend_error = 'off'
detach_false_primary = off
search_primary_node_timeout = '0'
recovery_user = 'nobody'
recovery_password = ''
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
auto_failback = off
auto_failback_interval = 1min
use_watchdog = off
trusted_servers = ''
ping_path = '/bin'
hostname0 = ''
wd_port0 = 9000
pgpool_port0 = 9999
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/tmp'
delegate_IP = ''
if_cmd_path = '/sbin'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_de_escalation_command = ''
failover_when_quorum_exists = on
failover_require_consensus = on
allow_multiple_failover_requests_from_node = off
enable_consensus_with_half_votes = off
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
heartbeat_hostname0 = ''
heartbeat_port0 = 9694
heartbeat_device0 = ''
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''
relcache_expire = 0
relcache_size = 256
check_temp_table = catalog
check_unlogged_table = on
enable_shared_relcache = on
memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 64MB
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 400kB
memqcache_cache_block_size = 1MB
memqcache_oiddir = '/var/log/pgpool/oiddir'
cache_safe_memqcache_table_list = ''
cache_unsafe_memqcache_table_list = ''
backend_hostname0 = 'postgresql-ha-postgresql-0.postgresql-ha-postgresql-headless.ns1.svc.cluster.local'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/opt/build/pgpool/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'postgresql-ha-postgresql-1.postgresql-ha-postgresql-headless.ns1.svc.cluster.local'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/opt/build/pgpool/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
What is the expected behavior?
Pgpool should consistently direct queries to the master node and promptly switch to the new master node whenever a change occurs.
What do you see instead?
Pgpool doesn't switch to the new master node until we restart the pgpool deployment.
-----------------------------
PostgreSQL configuration file
-----------------------------
listen_addresses = '*'
port = '5432'
max_connections = '512'
ssl = 'on'
ssl_cert_file = '/certs/tls.crt'
ssl_key_file = '/certs/tls.key'
wal_level = 'hot_standby'
fsync = 'on'
max_wal_size = '400MB'
archive_mode = 'on'
archive_command = '/bin/true'
max_wal_senders = '16'
wal_keep_size = '128MB'
max_replication_slots = '10'
primary_conninfo = 'host=postgresql-ha-postgresql-1.postgresql-ha-postgresql-headless.ns1.svc.cluster.local port=5432 user=repmgr'
promote_trigger_file = '/tmp/postgresql.trigger.5432'
hot_standby = 'on'
logging_collector = 'on'
log_directory = '/opt/build/postgresql/logs'
log_filename = 'postgresql.log'
log_connections = 'false'
log_disconnections = 'false'
log_hostname = 'true'
client_min_messages = 'error'
shared_preload_libraries = 'repmgr, pgaudit, repmgr'
include_dir = 'conf.d'
pgaudit.log_catalog = 'off'
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
ping
Hi @rohitkgupta,
I'm sorry for the late response.
I may need more information to help you with this issue. Does your issue persist over time when the master/slave switch happens? Or is this issue transient?
Please note that short downtimes may happen when the master is switched, because in case of failure the chart behaves as follows:
- Repmgr will notice the Master node is not available and select a new one.
- Pgpool-II health_check will detect the master node is not available and stop accepting new requests.
- Because new requests are being refused, the readiness probe will fail first and Pgpool-II will no longer receive traffic.
- Eventually, Pgpool-II will be restarted due to the liveness probe failure and will connect to the new master node.
If the error ERROR: cannot execute INSERT in a read-only transaction is indeed transient, I would recommend fine-tuning both the Pgpool-II health check settings and the chart liveness/readiness probe settings for quicker recovery from failure.
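As a rough sketch of that tuning (not an official recommendation): the health_check_* lines are standard pgpool-II parameters that already appear in the configuration dump above, while the pgpool.configuration override and the livenessProbe/readinessProbe keys are assumed to be exposed by the chart as shown later in this thread; the numbers are only placeholders to adapt to your environment.

pgpool:
  configuration: |
    # pgpool-II backend health check (same parameters as in the config dump above)
    health_check_period = 10
    health_check_timeout = 5
    health_check_max_retries = 3
    health_check_retry_delay = 2
  # assumed standard chart probe values; tighter timers restart a stale pgpool sooner
  livenessProbe:
    periodSeconds: 10
    failureThreshold: 3
  readinessProbe:
    periodSeconds: 5
    failureThreshold: 2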
@migruiz4 Thanks for your response. It is not a transient issue. Pgpool didn't restart and there were no probe failures for many hours; we had to restart Pgpool manually in this case.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Same issue for me with pgpool:4.5.1-debian-12-r0; downgrading to 4.5.0-debian-12-r10 fixed it.
Thank you very much for your feedback @BK-STAR77!
The container logic has not received any change from 4.5.0 to 4.5.1 that could cause this behavior. Checking pgpool release notes, it appears that it could be related to the change of the default value for failover_on_backend_shutdown.
Could you please confirm if setting failover_on_backend_shutdown=on fixes the issue? If that is the case, we should set it as the default value because that may be desired for Kubernetes.
I have the same issue here, and manually reverting to 4.5.0-debian-12-r10 in the deployment is working now... but this could be very dangerous for production...
Thank you very much for your feedback @BK-STAR77!
The container logic has not received any change from 4.5.0 to 4.5.1 that could cause this behavior. Checking pgpool release notes, it appears that it could be related to the change of the default value for failover_on_backend_shutdown.
Could you please confirm if setting failover_on_backend_shutdown=on fixes the issue? If that is the case, we should set it as the default value because that may be desired for Kubernetes.
Is it a variable in the values.yaml?
I don't see this variable in values.yaml :/
We are experiencing very bad failover issues also.
We updated to the newer charts from waaaaaay back and the latest charts appear to not failover correctly at all. We are using kubernetes.
We deploy then deploy our other pods to connect and all is fine.
Next we delete the primary node pod using kubectl delete pod
with a corresponding error in pgpool of the following
There have been other errors too that state it is in read-only mode
Caused by: org.postgresql.util.PSQLException: ERROR: cannot execute UPDATE in a read-only transaction
We have attempted to downgrade the pgpool image to 4.5.0 while still using the current charts, but pgpool will fail to start with the following error:
cp: cannot stat '/opt/bitnami/pgpool/etc/pgpool.conf.sample': No such file or directory
So currently, it appears the PostgreSQL 16 chart versions 14+ do not properly fail over, which makes HA useless and dangerous.
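For anyone else trying the downgrade route: pinning the tag through the chart values would presumably look like the sketch below (assuming the standard Bitnami pgpool.image.* keys; 4.5.0-debian-12-r10 is the tag others in this thread reverted to), although, as noted above, the 4.5.0 image does not start with the current chart.

pgpool:
  image:
    # assumed standard Bitnami image keys; tag taken from earlier comments in this thread
    repository: bitnami/pgpool
    tag: 4.5.0-debian-12-r10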
We were hoping to update since previous issues with pgpool and the chart caused pgpool to restart because of liveness probes; those updates to the probe script were added and we hoped all was fixed. There was also a previous issue where the DB would go into split-brain pretty often, which was super dangerous because it wouldn't fail: you would have one outdated DB and one current, and it would read from both... so we had to shut off load balancing.
Edit:
After restarting PGPool they all EVENTUALLY came back up. Will test again to see if it works eventually WITHOUT restart and see if it replicates again.
I am also poring through the other issues to see if any are related.
@Stevenpc3 Exactly the same thing is happening to me, but as I only started using postgresql-ha about a month ago I thought it was a problem with some bad configuration on my part. I had this problem for weeks, and it only started to work after I deleted the extra primary, because that recreated everything; it does not recover automatically. In my case I ended up having different random data and I noticed that I had two primaries.
@apoca the issue you are describing is split-brain: https://github.com/bitnami/charts/issues/20998 (there is more on the internet about it too). The issue I mentioned at the bottom of my comment is more that replication stops working with no errors or issues detected, which is super dangerous and makes failover pointless.
Both of those above issues are not related to the current issue in this thread though. This thread appears to be an issue with PGPool and/or the bitnami chart configs that use it.
You can add the flag
postgresql-ha:
  pgpool:
    configuration: |
      connection_cache = false
      failover_on_backend_error = on
      failover_on_backend_shutdown = on
You can add the flag
postgresql-ha:
  pgpool:
    configuration: |
      connection_cache = false
      failover_on_backend_error = on
      failover_on_backend_shutdown = on
Is the "postgresql-ha:" name correct? My config file has the name as "postgresql:" - also, the pgpool: section is a top-level item in the values.yaml... Sorry if I'm missing something here - I've just started using this.
Is it not meant to be:
postgresql:
  ...
pgpool:
  configuration: |-
    connection_cache = false
    failover_on_backend_error = on
    failover_on_backend_shutdown = on
?
My apologies, your way is correct for using the chart directly. I am using a wrapper chart. We have a chart that uses this chart as a dependency so we can template a few things.
Since you talk about this, I have a question here... I have software as a service and I create a schema/database and a user/password, as well as the necessary permissions, in real time... Sometimes I have a problem with Kind where it says that the database does not exist (but it exists); my question is whether this "false" cache setting solves this type of problem in pgpool?
Same issue here with a vanilla install: postgresql-ha 14.0.2 (app version 16.2.0).
Hi there!
We have released a new version of the bitnami/pgpool container setting the default value for failover_on_backend_shutdown=on (https://github.com/bitnami/containers/pull/65526).
This patched image is included in the postgresql-ha Helm chart starting with version 14.0.6. Could you please confirm if the issue persists after upgrading the chart to use the patched pgpool image?
@migruiz4 I am using the latest version, 14.0.10, and I had this error today; after rolling out pgpool again it is working. This is a very serious issue and I won't put this into production.
@apoca Could you please provide a detailed description of your issue? We need additional information in order to troubleshoot and fix this issue
We also tried setting these values manually and could not get proper failover to work. I will be doing more testing of the charts tomorrow and will try to write up a more detailed example of the setup, test, and results.
What failover testing has Bitnami done to validate this?
It would be nice for us to know and replicate the procedure used to validate the chart and operation to compare.
The basic test we tried and others seem to try is:
- Deploy the chart with 3 instances.
- Have a service connect through pgpool that does read and write operations.
- Delete the primary node pod. Expect a fail over and then the service should still be able to use the database with the new primary.
Instead the service can't because pgpool didn't fail over, or if it did, it thinks the new primary is read only.
Hi @rohitkgupta,
I'm sorry for the late response.
I may need more information to help you with this issue. Does your issue persist over time when the master/slave switch happens? Or is this issue transient?
Please note that short downtimes may happen when the master is switched, because in case of failure the chart behaves as follows:
- Repmgr will notice the Master node is not available and select a new one.
- Pgpool-II health_check will detect the master node is not available and stop accepting new requests.
- Because new requests are being refused, the readiness probe will fail first and Pgpool-II will no longer receive traffic.
- Eventually, Pgpool-II will be restarted due to the liveness probe failure and will connect to the new master node.
If the error ERROR: cannot execute INSERT in a read-only transaction is indeed transient, I would recommend fine-tuning both the Pgpool-II health check settings and the chart liveness/readiness probe settings for quicker recovery from failure.
From this description, would you state that you EXPECT pgpool to fail its liveness probe and subsequently restart as part of a normal and proper failover?
We have a case where we do not want pgpool to restart. I would hope it is possible to fail over without restarting pgpool.
Hi @Stevenpc3, the tests we execute for postgresql-ha can be found here: https://github.com/bitnami/charts/tree/main/.vib/postgresql-ha.
From this description, would you state that you EXPECT pgpool to fail its liveness probe and subsequently restart as part of a normal and proper failover?
Yes, before the pgpool-II upgrade from version 4.5.0 to 4.5.1, when using the bitnami/postgresql-ha chart with default values, short downtimes would be expected if the PostgreSQL primary node goes down.
Because requirements may vary from one user to another, we usually encourage users to fine-tune pgpool-II settings, repmgr settings and liveness/readiness timers to reduce those downtimes for their use case.
The bug we are addressing in this issue is the behavior OP reported, where on primary failure pgpool-II would never establish a connection to the new primary node and all client connections would fail until a manual restart is performed, which is a much more severe issue.
Yes, I agree it's a severe issue; we have that issue too, which is why I asked
"How did you test before publishing and not see this?" Was failover part of the normal test?
As for the pgpool liveness probe, you added a full restart of pgpool as normal operation. This seems off to me. Would you expect pgpool to be restarted in a bare-metal deployment? Why is pgpool expected to restart in Kubernetes?
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
bump
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
I have created an internal task to fix this issue and added the on-hold label to prevent the stale bot while this is still relevant.
I'm sorry, but we are a small team and have to deal with other priorities at the moment. If you would like to help us find a fix for this issue and contribute to the project, please send a pull request and we will be happy to review it.