
pg: cannot set transaction read write mode during recovery

Open · lkgGitHub opened this issue 2 years ago · 9 comments

I have two pods in Kubernetes, postgres-service-0 and postgres-service-1. Initially, postgres-service-1 is the master. After a PostgreSQL master-slave failover, the application client gets a PostgreSQL error: "pg: cannot set transaction read write mode during recovery". Both PostgreSQL pods are still running. When I restart the application, the connection is restored.
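
For reference, a client-side mitigation that avoids restarting the whole application could look like the sketch below. This is only a sketch under assumptions not confirmed in this thread: the client is a Go application using database/sql with the pgx stdlib driver, the operator's master service is reachable as postgres-service, and the credentials/database names are placeholders. The idea is to ask for a read-write node via target_session_attrs=read-write (so reconnects after failover land on the new primary) and to cap connection lifetime so sockets opened before the failover get recycled.

```go
// Minimal sketch, not the operator's or go-pg's API. Assumptions: Go client,
// database/sql with the pgx stdlib driver, master service "postgres-service",
// placeholder credentials ("app"/"secret") and database name ("app").
package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" driver
)

func main() {
	// target_session_attrs=read-write tells the driver to reject servers that
	// are in recovery, so connections opened after a failover only go to the
	// current primary. It does not fix connections that already exist.
	dsn := "postgres://app:secret@postgres-service:5432/app?target_session_attrs=read-write"

	db, err := sql.Open("pgx", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Recycle pooled connections periodically so sockets opened before the
	// failover (still pointing at the demoted primary) do not live forever.
	db.SetConnMaxLifetime(5 * time.Minute)
	db.SetConnMaxIdleTime(1 * time.Minute)

	// Sanity check: a healthy read-write connection reports false here.
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	var inRecovery bool
	if err := db.QueryRowContext(ctx, "SELECT pg_is_in_recovery()").Scan(&inRecovery); err != nil {
		log.Fatalf("connectivity check failed: %v", err)
	}
	log.Printf("connected; pg_is_in_recovery() = %v", inRecovery)
}
```
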

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.8.2 spilo-14:2.1-p7

  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Kubernetes

  • Are you running Postgres Operator in production? [yes | no] yes

postgres-service-1 log:

2023-06-13 07:31:47,890 INFO: Lock owner: postgres-service-1; I am postgres-service-1
2023-06-13 07:31:52,898 ERROR: Request to server https://10.96.0.1:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.96.0.1', port=443): Read timed out. (read timeout=4.9869120344519615)",)
2023-06-13 07:31:53,858 WARNING: Concurrent update of postgres-service
2023-06-13 07:31:54,172 INFO: starting after demotion in progress
2023-06-13 07:31:54,174 INFO: Lock owner: postgres-service-0; I am postgres-service-1
2023-06-13 07:31:54,174 INFO: establishing a new patroni connection to the postgres cluster
2023-06-13 07:31:54,181 INFO: Local timeline=4 lsn=11/EC60FAD8
2023-06-13 07:31:54,212 INFO: master_timeline=5
2023-06-13 07:31:54,213 INFO: master: history=1	0/570000A0	no recovery target specified
2	4/2248FB28	no recovery target specified
3	A/D3FE9300	no recovery target specified
4	11/EC60FAD8	no recovery target specified
server signaled
2023-06-13 07:31:54,323 INFO: no action. I am (postgres-service-1), a secondary, and following a leader (postgres-service-0)
2023-06-13 07:31:54,325 INFO: Lock owner: postgres-service-0; I am postgres-service-1
2023-06-13 07:31:54,330 INFO: Local timeline=4 lsn=11/EC60FAD8
2023-06-13 07:31:54,361 INFO: master_timeline=5
2023-06-13 07:31:54,362 INFO: master: history=1	0/570000A0	no recovery target specified
2	4/2248FB28	no recovery target specified
3	A/D3FE9300	no recovery target specified
4	11/EC60FAD8	no recovery target specified
2023-06-13 07:31:54,372 INFO: no action. I am (postgres-service-1), a secondary, and following a leader (postgres-service-0)

postgres-service-0 log:

Got response from postgres-service-1 http://10.244.1.77:8008/patroni: {"state": "running", "postmaster_start_time": "2023-06-13 07:31:47.021588+00:00", "role": "replica", "server_version": 140005, "xlog": {"received_location": 76980222680, "replayed_location": 76980222680, "replayed_timestamp": "2023-06-13 07:31:41.062583+00:00", "paused": false}, "timeline": 4, "replication": [{"usename": "standby", "application_name": "postgres-service-0", "client_addr": "10.244.2.24", "state": "streaming", "sync_state": "async", "sync_priority": 0}], "dcs_last_seen": 1686641507, "database_system_identifier": "7215965247549096005", "patroni": {"version": "2.1.4", "scope": "postgres-service"}}
2023-06-13 07:30:43,392 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2023-06-13 07:30:43,522 INFO: promoted self to leader by acquiring session lock
server promoting
2023-06-13 07:30:43,549 INFO: cleared rewind state after becoming the leader
2023-06-13 07:30:43,524 INFO: Lock owner: postgres-service-0; I am postgres-service-0
2023-06-13 07:30:43,893 INFO: updated leader lock during promote

Some general remarks when posting a bug report:

  • Please, check the operator, pod (Patroni) and postgresql logs first. When copy-pasting many log lines please do it in a separate GitHub gist together with your Postgres CRD and configuration manifest.
  • If you feel this issue might be more related to the Spilo docker image or Patroni, consider opening issues in the respective repos.

lkgGitHub avatar Jun 14 '23 10:06 lkgGitHub

same problem

noahge avatar Jul 08 '23 13:07 noahge

Observed the exact same issue more than once on an instance; it resolves if the read replica is restarted.

gtejasvi avatar Jul 22 '23 03:07 gtejasvi
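
Rather than restarting the replica or the application, resetting the client's connection pool when the error shows up might be enough. Below is a minimal sketch under assumptions: the client can use pgx's pgxpool (v5, which provides Pool.Reset), the operator's master service "postgres-service" points at the current primary after failover, and the credentials, database, and table names are placeholders.

```go
// Minimal sketch, not go-pg's or the operator's API. Assumptions: pgx v5
// pgxpool client, master service "postgres-service" repointed by the operator
// after failover, placeholder credentials/database ("app") and table ("items").
package main

import (
	"context"
	"log"
	"strings"

	"github.com/jackc/pgx/v5/pgxpool"
)

// execWithFailoverRetry runs one write statement and, if it fails with the
// "during recovery" error from this issue (i.e. the pooled connection still
// points at the demoted primary), drops all pooled connections and retries
// once, so the retry connects to whatever the master service resolves to now.
// The exact error wording may differ between the server and the client library.
func execWithFailoverRetry(ctx context.Context, pool *pgxpool.Pool, sql string, args ...any) error {
	_, err := pool.Exec(ctx, sql, args...)
	if err != nil && strings.Contains(err.Error(), "during recovery") {
		log.Printf("stale connection to demoted primary: %v; resetting pool", err)
		pool.Reset() // closes all pooled connections but keeps the pool usable
		_, err = pool.Exec(ctx, sql, args...)
	}
	return err
}

func main() {
	ctx := context.Background()
	pool, err := pgxpool.New(ctx, "postgres://app:secret@postgres-service:5432/app")
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	if err := execWithFailoverRetry(ctx, pool, "UPDATE items SET seen = true WHERE id = $1", 1); err != nil {
		log.Fatal(err)
	}
}
```
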

same problem

obitoquilt avatar Jan 18 '24 05:01 obitoquilt

same issue

meltingrock avatar Feb 24 '24 10:02 meltingrock

same here

danpe avatar Apr 22 '24 07:04 danpe

Still reproducible on the latest operator, v1.12.2. Any updates on this issue?

Danieloni1 avatar Jun 20 '24 14:06 Danieloni1

Same problem

Bohooslav avatar Aug 09 '24 09:08 Bohooslav

Same problem

ddh27 avatar Aug 13 '24 02:08 ddh27