pg_auto_failover icon indicating copy to clipboard operation
pg_auto_failover copied to clipboard

Dropping a database

Open AmodKakade12 opened this issue 1 year ago • 12 comments

Hi

In postgresql managed by pg_auto_failover when a database is attempted to drop, the statement never returns and the postgresql log starts getting flooded with messages reporting deadlocks as shown below.

postgres=# create database amod; CREATE DATABASE postgres=# drop database amod;

--> HANGS..

2025-01-24 07:52:24.845 GMT [954540]: [19-1]db=,user=,app=,client= LOG: 00000: started worker for pg_auto_failover health checks in "amod" 2025-01-24 07:52:24.845 GMT [954540]: [20-1]db=,user=,app=,client= LOCATION: HealthCheckWorkerLauncherMain, health_check_worker.c:347

2025-01-24 07:52:25.847 GMT [954591]: [1-1]db=,user=,app=,client= LOG: 00000: process 954591 still waiting for RowExclusiveLock on object 24610 of class 1262 of database 0 after 1000.140 ms 2025-01-24 07:52:25.847 GMT [954591]: [2-1]db=,user=,app=,client= DETAIL: Process holding the lock: 954584. Wait queue: 954591. 2025-01-24 07:52:25.847 GMT [954591]: [3-1]db=,user=,app=,client= LOCATION: ProcSleep, proc.c:1508 2025-01-24 07:52:29.862 GMT [954584]: [1-1]db=postgres,user=postgres,app=psql,client=[local] LOG: 00000: still waiting for backend with PID 954543 to accept ProcSignalBarrier 2025-01-24 07:52:29.862 GMT [954584]: [2-1]db=postgres,user=postgres,app=psql,client=[local] LOCATION: WaitForProcSignalBarrier, procsignal.c:421 2025-01-24 07:52:29.862 GMT [954584]: [3-1]db=postgres,user=postgres,app=psql,client=[local] STATEMENT: drop database amod (force);

Soon the sessions starts accumulating due to deadlock and it causes the database go inresponsive due to too many sessions error. Only way to resolve is to cancel the drop database statement by ctrl+c. This leads to database to be dropped go in invalid state and it is reported continuously in the postgresql log.

2025-01-28 07:47:34.975 GMT [1848700]: [1-1]db=amod,user=pmm,app=[unknown],client=10.8.99.136 FATAL: 55000: cannot connect to invalid database "amod" 2025-01-28 07:47:34.975 GMT [1848700]: [2-1]db=amod,user=pmm,app=[unknown],client=10.8.99.136 HINT: Use DROP DATABASE to drop invalid databases. 2025-01-28 07:47:34.975 GMT [1848700]: [3-1]db=amod,user=pmm,app=[unknown],client=10.8.99.136 LOCATION: InitPostgres, postinit.c:1117

Has anyone seen this behavior? How to circuvent this issue? Any ideas/known best methods sharing would help greatly.

Regards

Amod

AmodKakade12 avatar Jan 28 '25 09:01 AmodKakade12

Hi @AmodKakade12

I've often observed this behavior when logical replication is in use. Until I drop the subscriptions, creating or dropping a database tends to hang.

It's worth checking if logical replication is enabled or is not in use but still have inactive slots.

Another factor to consider is the checkpoint frequency.

haiderz07 avatar Mar 05 '25 12:03 haiderz07

thanks for reply. I dont have any logical replication setup. I could work around the problem by starting the database in single user mode and then drop the database

postgres@~]$ postgres --single postgresbackend> drop database XXX;[postgres@~]$ pg_ctl stop[postgres@***~]$ pg_ctl start

Regards

Amod

+91 98457 13545

On Wednesday, March 5, 2025 at 06:20:27 PM GMT+5:30, Haider Zahid ***@***.***> wrote:  

Hi @AmodKakade12

I've often observed this behavior when logical replication is in use. Until I drop the subscriptions, creating or dropping a database tends to hang.

It's worth checking if logical replication is enabled or is not in use but still have inactive slots.

Another factor to consider is the checkpoint frequency.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***> haiderzahid98 left a comment (hapostgres/pg_auto_failover#1063) Hi @AmodKakade12

I've often observed this behavior when logical replication is in use. Until I drop the subscriptions, creating or dropping a database tends to hang.

It's worth checking if logical replication is enabled or is not in use but still have inactive slots.

Another factor to consider is the checkpoint frequency.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

AmodKakade12 avatar Mar 06 '25 10:03 AmodKakade12

Hi,

@DimCitus @JelteF @mtuncer

Appreciate help from the experts on this limitation. Reopening for further discussion!

Regards

Amod

AmodKakade12 avatar Sep 01 '25 11:09 AmodKakade12

Hi,

@DimCitus @JelteF @mtuncer

Hi

Further update. I have found only following workaround to solve this

  1. Stop the systemd daemon for pg_auto_failover service on all replica & primary
  2. Start the primary in single user mode as follows $ postgres --single postgres
  3. drop the database using drop database ; command & exit
  4. Restart the systemd daemon for pg_auto_failover service on primary and then all replica.
  5. Database in question is dropped successfully.

Although this workaround is avaialble it would be good if there is a single pg_auto_failover command or defined process to carry out this operation.

regards

Amod

AmodKakade12 avatar Sep 01 '25 11:09 AmodKakade12

There must be something configured weirdly, because usually you can drop databases just fine with pg_auto_failover running.

s4ke avatar Sep 02 '25 13:09 s4ke

Can you share your setup here maybe, then someone could help possibly.

s4ke avatar Sep 02 '25 13:09 s4ke

Thanks for reply. I feel the PGAF healthcheck user keeps connected to the database which causes above issue. I can see following messages appearing on the DB log

Is there any way to halt the healthcheck user someway to complete such one time maintenance jobs?

DB log

       2025-09-03 10:39:10.227 BST [3038132]: [261-1]db=postgres,user=postgres,app=psql,client=[local] STATEMENT:  drop database amod;
       2025-09-03 10:39:15.232 BST [3038132]: [262-1]db=postgres,user=postgres,app=psql,client=[local] LOG:  00000: still waiting for backend with PID 3037451 to accept ProcSignalBarrier
       2025-09-03 10:39:15.232 BST [3038132]: [263-1]db=postgres,user=postgres,app=psql,client=[local] LOCATION:  WaitForProcSignalBarrier, procsignal.c:415


      [postgres@log]$ ps -ef|grep 3037451
      postgres 3037451 2184305  0 10:31 ?        00:00:00 postgres: pg_auto_failover monitor healthcheck worker postgres
      postgres 3039827 3034152  0 10:40 pts/4    00:00:00 grep --color=auto 3037451
      [postgres@log]$


      postgres=# SELECT * FROM pg_stat_activity WHERE pid =3037451;
      datid | datname  |   pid   | leader_pid | usesysid | usename  |           application_name           | client_addr | client_hostname                                              | client_port |         backend_start         | xact_start
      | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query_id | query |            backend_type
      -------+----------+---------+------------+----------+----------+--------------------------------------+-------------+-----------          ------+-------------+-------------------------------+------------
      +-------------+--------------+-----------------+------------+-------+-------------+--------------+----------+-------+----------------
      5 | postgres | 3037451 |            |       10 | postgres | pg_auto_failover health check worker |             |                 |             |           2025-09-03 10:31:02.925617+01 |
      |             |              | Client          | ClientRead |       |             |              |          |       | pgautofailover
      (1 row)


      postgres=#  select pid,
      usename,
      pg_blocking_pids(pid) as blocked_by,
      query as blocked_query
      from pg_stat_activity where  query like '%drop%';
         pid   | usename  | blocked_by |                   blocked_query
      ---------+----------+------------+---------------------------------------------------
      3038132 | postgres | {}         | drop database amod;
      3039906 | postgres | {}         | select pid,                                      +
      2 rows)
      postgres=#

AmodKakade12 avatar Sep 03 '25 09:09 AmodKakade12

Further I have used following commands to create the monitor and database

Monitor yum -y install pg_auto_failover_17 pg_autoctl create monitor --pgctl /usr/pgsql-17/bin/pg_ctl --pgdata <PGDATA home> --hostname --ssl-self-signed --ssl-mode require --pgport <PGPORT> --auth scram-sha-256 pg_autoctl run

Database yum -y install pg_auto_failover_17 pg_autoctl create postgres --pgdata <PGDATA home> --name n1 --auth scram-sha-256 --candidate-priority 80 --replication-quorum true --ssl-self-signed require --username postgres --dbname postgres --pgport <PGPORT> --pghost <PrimaryHostName> --hostname <PrimaryHostName> --pgctl /usr/pgsql-17/bin --maximum-backup-rate 100G pg_autoctl run

AmodKakade12 avatar Sep 03 '25 09:09 AmodKakade12

Does inserting data (and committing) work against a table that you create in the database before?

I had similar behaviours when communication between the worker nodes was not healthy.

s4ke avatar Sep 11 '25 07:09 s4ke

Does inserting data (and committing) work against a table that you create in the database before?

I had similar behaviours when communication between the worker nodes was not healthy.

I was able to perforrm all DML against tables in the database as usual. Problem comes only when the drop database command is issued.

AmodKakade12 avatar Sep 11 '25 08:09 AmodKakade12

@dimitri

Appreciate if users of this great tool can provide their experience on this. This looks like a limitation to the software and requires outage to fix. It would be great if the developers can include the fix/solution to this issue

AmodKakade12 avatar Sep 18 '25 10:09 AmodKakade12

It seems that dropping database is not an issue anymore as I tested in postgres 18.1 + pg_auto_failover 2.2 on rocky linux 10. But If you drop your database set in .local config and then shutdown your pgautofailover service, and you will find service can't be brought online again. You could solove it through "pg_ctl start" to initiate a local instance, and then create the database.

stevenchang1213 avatar Nov 26 '25 07:11 stevenchang1213