Init monitor failed

Open wxmeng04 opened this issue 1 year ago • 4 comments

Platform: Debian bookworm. Version: postgresql-17-auto-failover. Command line and output:

su - postgres -c "pg_autoctl create monitor --pgdata /var/lib/pgsql/17/data --pgport 5432 --hostname pgsql.local --auth trust --ssl-self-signed"
12:39:26 3320610 INFO  Using default --ssl-mode "require"
12:39:26 3320610 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
12:39:26 3320610 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
12:39:26 3320610 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
12:39:26 3320610 INFO  Initialising a PostgreSQL cluster at "/var/lib/pgsql/17/data"
12:39:26 3320610 INFO  /usr/lib/postgresql/17/bin/pg_ctl initdb -s -D /var/lib/pgsql/17/data --option '--auth=trust'
12:39:26 3320610 ERROR pg_ctl: too many command-line arguments (first is "--silent")
12:39:26 3320610 ERROR Try "pg_ctl --help" for more information.
12:39:26 3320610 FATAL Failed to initialize Postgres cluster at "/var/lib/pgsql/17/data", see above for details
12:39:26 3320610 FATAL Failed to initialize a PostgreSQL instance at "/var/lib/pgsql/17/data", see above for details

wxmeng04 avatar Nov 13 '24 04:11 wxmeng04

Hello @wxmeng04,

I am getting exactly the same error on Debian bookworm with postgresql-17-auto-failover.

I downgraded to postgresql-16-auto-failover and it's fine.

Did you find a workaround for this error?

cassioseffrin avatar Dec 19 '24 01:12 cassioseffrin

@wxmeng04, I wanted to provide some clarity regarding the compatibility of pg_autoctl with PostgreSQL 17.

Here’s the current setup and version details from my environment:

root@pg17ha:~# pg_autoctl --version
pg_autoctl version 2.1-3.pgdg120+1
pg_autoctl extension version 2.1
compiled with PostgreSQL 17rc1 (Debian 17~rc1-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
compatible with Postgres 11, 12, 13, 14, 15, and 16

root@pg17ha:~# dpkg -l | grep failover
ii  pg-auto-failover-cli             2.1-3.pgdg120+1                amd64        Command line interface and service to manage pg auto failover Clusters
ii  postgresql-17-auto-failover      2.1-3.pgdg120+1                amd64        Postgres high availability support

Based on this, it appears that the current version (pg_autoctl 2.1-3.pgdg120+1) is not yet compatible with PostgreSQL 17.
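The same check can be scripted. The snippet below is a minimal sketch that reads the "compatible with Postgres" line from the `pg_autoctl --version` output quoted above and tests whether a given major version is listed. The helper name `supported_majors` is made up for illustration, and the sketch assumes the version output keeps the format shown here.

```python
import re

def supported_majors(version_output: str) -> list[int]:
    """Return the major versions listed on the 'compatible with Postgres' line.

    Hypothetical helper: it assumes the line format shown in the
    `pg_autoctl --version` output quoted in this thread.
    """
    for line in version_output.splitlines():
        if line.startswith("compatible with Postgres"):
            return [int(n) for n in re.findall(r"\d+", line)]
    return []

# Output copied from the `pg_autoctl --version` run above.
VERSION_OUTPUT = """\
pg_autoctl version 2.1-3.pgdg120+1
pg_autoctl extension version 2.1
compiled with PostgreSQL 17rc1 (Debian 17~rc1-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
compatible with Postgres 11, 12, 13, 14, 15, and 16
"""

majors = supported_majors(VERSION_OUTPUT)
print("Postgres 17 listed as compatible:", 17 in majors)
```

On a live system the text could come from `subprocess.run(["pg_autoctl", "--version"], capture_output=True, text=True).stdout` instead of the pasted string.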

Could someone from the development team say whether there are plans to add PostgreSQL 17 compatibility in an upcoming release and, if so, what the estimated timeline is? This information is critical for planning upgrades. Thank you in advance for your support!

cassioseffrin avatar Dec 19 '24 11:12 cassioseffrin

I'm struggling with this error as well, but I really do not understand what the problem is. I ran an strace:

strace -f -s99999 -e trace=clone,execve pg_autoctl create monitor --auth md5 --ssl-self-signed

and got the following result:

execve("/usr/bin/pg_autoctl", ["pg_autoctl", "create", "monitor", "--auth", "md5", "--ssl-self-signed"], 0x7ffe4479cd08 /* 18 vars */) = 0
16:27:29 582 INFO  Using default --ssl-mode "require"
16:27:29 582 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
16:27:29 582 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
16:27:29 582 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x71851a840c50) = 583
strace: Process 583 attached
[pid   583] execve("/usr/lib/postgresql/17/bin/pg_ctl", ["/usr/lib/postgresql/17/bin/pg_ctl", "--version"], 0x5bc47ffe4190 /* 20 vars */) = 0
[pid   583] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=583, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
16:27:29 582 INFO  Initialising a PostgreSQL cluster at "/var/lib/postgresql/data"
16:27:29 582 INFO  /usr/lib/postgresql/17/bin/pg_ctl initdb -s -D /var/lib/postgresql/data --option '--auth=trust'
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x71851a840c50) = 584
strace: Process 584 attached
[pid   584] execve("/usr/lib/postgresql/17/bin/pg_ctl", ["/usr/lib/postgresql/17/bin/pg_ctl", "initdb", "--silent", "--pgdata", "/var/lib/postgresql/data", "--option", "'--auth=trust'"], 0x5bc47ffe4190 /* 20 vars */) = 0
[pid   584] +++ exited with 1 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=584, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
16:27:29 582 ERROR pg_ctl: too many command-line arguments (first is "--silent")
16:27:29 582 ERROR Try "pg_ctl --help" for more information.
16:27:29 582 FATAL Failed to initialize Postgres cluster at "/var/lib/postgresql/data", see above for details
16:27:29 582 FATAL Failed to initialize a PostgreSQL instance at "/var/lib/postgresql/data", see above for details
+++ exited with 3 +++

The key lines are:

[pid   584] execve("/usr/lib/postgresql/17/bin/pg_ctl", ["/usr/lib/postgresql/17/bin/pg_ctl", "initdb", "--silent", "--pgdata", "/var/lib/postgresql/data", "--option", "'--auth=trust'"], 0x5bc47ffe4190 /* 20 vars */) = 0
[pid   584] +++ exited with 1 +++

But oddly enough, when I run that command exactly, it works fine!

$ /usr/lib/postgresql/17/bin/pg_ctl.real initdb --silent --pgdata /var/lib/postgresql/data --option '--auth=trust'
$ echo $?
0

I even tried double-quoting the option, just in case that had something to do with it:

$ /usr/lib/postgresql/17/bin/pg_ctl.real initdb --silent --pgdata /var/lib/postgresql/data --option "'--auth=trust'" && echo good
good
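The two manual runs above are not quite the same invocation as the traced execve: both call `pg_ctl.real` rather than `pg_ctl`, and in the first one the shell strips the single quotes before the program sees them. The sketch below compares the argument vectors and offers a way to replay the traced argv without a shell in between. It is only an aid for reproducing the failure, not a diagnosis; the helper names `argv_differences` and `replay` are made up for illustration.

```python
import shlex
import subprocess
from itertools import zip_longest

# argv exactly as recorded in the execve line of the strace output.
STRACE_ARGV = [
    "/usr/lib/postgresql/17/bin/pg_ctl",
    "initdb",
    "--silent",
    "--pgdata",
    "/var/lib/postgresql/data",
    "--option",
    "'--auth=trust'",  # the single quotes are part of the argument itself
]

# The first manual command, as typed at the shell prompt.
MANUAL_COMMAND = (
    "/usr/lib/postgresql/17/bin/pg_ctl.real initdb --silent "
    "--pgdata /var/lib/postgresql/data --option '--auth=trust'"
)

def argv_differences(a, b):
    """Return (index, item_a, item_b) for every position where the lists differ."""
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip_longest(a, b))
        if x != y
    ]

def replay(argv):
    """Run argv directly (no shell), so no quoting is added or removed."""
    return subprocess.run(argv, capture_output=True, text=True)

# shlex.split applies POSIX shell quoting rules to the typed command.
for index, traced, manual in argv_differences(STRACE_ARGV, shlex.split(MANUAL_COMMAND)):
    print(f"argv[{index}]: traced={traced!r} manual={manual!r}")
```

Calling `replay(STRACE_ARGV)` on an affected host, against an empty data directory, would run the same argument list that pg_autoctl passed and expose `returncode` and `stderr` for inspection.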

I see the same compatibility line in the pg_autoctl --version output:

$ pg_autoctl --version
pg_autoctl version 2.1-3.pgdg120+1
pg_autoctl extension version 2.1
compiled with PostgreSQL 17rc1 (Debian 17~rc1-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
compatible with Postgres 11, 12, 13, 14, 15, and 16

I'm using the postgresql-17-auto-failover PGDG package:

Package: postgresql-17-auto-failover
Version: 2.1-3.pgdg120+1
Priority: optional
Section: database
Source: pg-auto-failover
Maintainer: Dimitri Fontaine <[email protected]>
Installed-Size: 1,199 kB
Depends: pg-auto-failover-cli (>= 2.1-3.pgdg120+1), postgresql-17, postgresql-17-jit-llvm (>= 16), libc6 (>= 2.4), libpq5 (>= 8.4~)
Homepage: https://github.com/citusdata/pg_auto_failover
Download-Size: 412 kB
APT-Manual-Installed: yes
APT-Sources: http://apt.postgresql.org/pub/repos/apt bookworm-pgdg/main amd64 Packages
Description: Postgres high availability support
 This extension implements a set of functions to provide High Availability to
 Postgres.

In Debian bookworm:

deb http://apt.postgresql.org/pub/repos/apt bookworm-pgdg main 17

lachesis avatar Feb 05 '25 16:02 lachesis

It looks like this commit from 4 days ago may be related? https://github.com/hapostgres/pg_auto_failover/commit/5b362980e7ca3a266d2d8922018aace717752df6

See also PR #1061 and issue #1048. The release discussion seems to be happening in #1069.

lachesis avatar Feb 05 '25 16:02 lachesis