docker-pgautoupgrade icon indicating copy to clipboard operation
docker-pgautoupgrade copied to clipboard

database user "root" is not the install user

Open Laptop765 opened this issue 8 months ago • 7 comments

Hi there, thanks for this project, I've always put off doing upgrades because they're so annoying 😅

I switched my server from the postgres:16.8-alpine image to pgautoupgrade/pgautoupgrade:16-alpine which was a seamless in-place swap, I verified I could connect and all of my data was there. I then switched to pgautoupgrade/pgautoupgrade:17-alpine@sha256:05d6c68c12183308f9780f2508ea7bd29a9316fc236453418ad8133be1b93fe3 to upgrade to 17.x.

Moving to the old directory seemed fine:

Moving existing data files into OLD temporary directory is complete

The initdb arguments seemed odd but correct:

------------------------------------
Determining our own initdb arguments
------------------------------------
2025-05-08 17:28:39.921 UTC [41] LOG:  database system was interrupted; last known up at 2025-05-08 17:24:19 UTC
2025-05-08 17:28:42.275 UTC [41] LOG:  database system was not properly shut down; automatic recovery in progress
2025-05-08 17:28:42.306 UTC [41] LOG:  redo starts at 0/D566BF88
2025-05-08 17:28:42.306 UTC [41] LOG:  invalid record length at 0/D566BFC0: expected at least 24, got 0
2025-05-08 17:28:42.306 UTC [41] LOG:  redo done at 0/D566BF88 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-05-08 17:28:42.347 UTC [41] LOG:  checkpoint starting: end-of-recovery immediate wait
2025-05-08 17:28:42.376 UTC [41] LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.010 s, total=0.031 s; sync files=3, longest=0.004 s, average=0.004 s; distance=0 kB, estimate=0 kB; lsn=0/D566BFC0, redo lsn=0/D566BFC0
2025-05-08 17:28:42.415 UTC [41] LOG:  checkpoint starting: shutdown immediate
2025-05-08 17:28:42.431 UTC [41] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.020 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB; lsn=0/D566C050, redo lsn=0/D566C050
---------------------------------------------------------------
The initdb arguments we determined are: --encoding=UTF8
---------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------
Old database using collation settings: '--encoding=UTF8'.  Initialising new database with those settings too
--------------------------------------------------------------------------------------------------------------------

The new directory created without issue:

Initialising PostgreSQL 17 data directory
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data/new ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... America/Los_Angeles
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok
Success. You can now start the database server using:
    /usr/local/bin/pg_ctl -D /var/lib/postgresql/data/new/ -l logfile start

But then the upgrade failed:

---------------------------------------
Running pg_upgrade command, from /var/lib/postgresql/data
---------------------------------------
Performing Consistency Checks
-----------------------------
Checking cluster versions                                     ok
Checking database user is the install user                    
database user "root" is not the install user
Failure, exiting
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

initdb: error: directory "/var/lib/postgresql/data" exists but is not empty
initdb: hint: If you want to create a new database system, either remove or empty the directory "/var/lib/postgresql/data" or run initdb with an argument other than "/var/lib/postgresql/data".

I'm pretty sure the database user "root" is not the install user is the actual cause but I'm not sure what it means or how to go about fixing it.

Laptop765 avatar May 08 '25 17:05 Laptop765

Hi @Laptop765

did you customize the username of your Postgres super user (POSTGRES_USER variable) or do you run the container with a different user than root? and maybe you could post your compose / docker run / Kubernetes spec prior to the switch to pgautoupgrade in redacted form so we have some more context.

andyundso avatar May 08 '25 20:05 andyundso

Ah I did customize the name of POSTGRES_USER to root:

  postgres:
    container_name: db_postgres
    environment:
      TZ: ${TZ}
      PGDATA: /var/lib/postgresql/data
      POSTGRES_PASSWORD: ${DEFAULT_PASSWORD}
      POSTGRES_USER: root
    image: postgres:16.8-alpine@sha256:3b057e1c2c6dfee60a30950096f3fab33be141dbb0fdd7af3d477083de94166c
    labels:
      traefik.enable: false
      wud.display.icon: sh:postgresql
      wud.tag.include: ^\d+\.\d+-alpine$$
      wud.link.template: https://www.postgresql.org/docs/release/$${major}.$${minor}
      wud.watch: false
    networks:
      - proxy
    restart: always
    volumes:
      - /mnt/user/appdata/databases/postgres:/var/lib/postgresql/data

Laptop765 avatar May 08 '25 20:05 Laptop765

and did you keep it in environment when switching to pgautoupgrade?

andyundso avatar May 08 '25 21:05 andyundso

and did you keep it in environment when switching to pgautoupgrade?

Yes, all I changed was the image name:

postgres:
    container_name: db_postgres
    environment:
      TZ: ${TZ}
      PGDATA: /var/lib/postgresql/data
      POSTGRES_PASSWORD: ${DEFAULT_PASSWORD}
      POSTGRES_USER: root
    image: pgautoupgrade/pgautoupgrade:17-alpine@sha256:05d6c68c12183308f9780f2508ea7bd29a9316fc236453418ad8133be1b93fe3
    labels:
      traefik.enable: false
      wud.display.icon: sh:postgresql
      wud.tag.include: ^\d+\.\d+-alpine$$
      wud.link.template: https://www.postgresql.org/docs/release/$${major}.$${minor}
      wud.watch: false
    networks:
      - proxy
    restart: always
    volumes:
      - /mnt/user/appdata/databases/postgres:/var/lib/postgresql/data

Laptop765 avatar May 08 '25 22:05 Laptop765

@Laptop765 what are the permissions set on /mnt/user/appdata/databases/postgres. I am trying to figure out if these are owned by root or the postgres user (UID usually 999).

andyundso avatar May 18 '25 12:05 andyundso

@Laptop765 what are the permissions set on /mnt/user/appdata/databases/postgres. I am trying to figure out if these are owned by root or the postgres user (UID usually 999).

This is an Unraid server, most things run as nobody:users which is 999:1000.

$ /mnt/user/appdata/databases/postgres# ls -la
total 2123
drwx------ 1   70 users      27 May  8 11:41 ./
drwxrwxr-x 1 root users      10 May  8 10:37 ../
-rw------- 1   70    70       3 Jan  3  2024 PG_VERSION
-rw-r--r-- 1   70    70 4127792 Jan  3  2024 backupfile.sql
drwx------ 1   70    70      10 Jan 19 15:25 base/
drwx------ 1   70    70      67 May 14 11:20 global/
drwx------ 1   70    70       2 Jan  3  2024 pg_commit_ts/
drwx------ 1   70    70       2 Jan  3  2024 pg_dynshmem/
-rw------- 1   70    70    5742 Jan  3  2024 pg_hba.conf
-rw------- 1   70    70    2640 Jan  3  2024 pg_ident.conf
drwx------ 1   70    70       5 May 25 18:07 pg_logical/
drwx------ 1   70    70       4 Jan  3  2024 pg_multixact/
drwx------ 1   70    70       2 Jan  3  2024 pg_notify/
drwx------ 1   70    70       2 Jan  3  2024 pg_replslot/
drwx------ 1   70    70       2 Jan  3  2024 pg_serial/
drwx------ 1   70    70       2 Jan  3  2024 pg_snapshots/
drwx------ 1   70    70       2 May  8 10:41 pg_stat/
drwx------ 1   70    70       2 Jan  3  2024 pg_stat_tmp/
drwx------ 1   70    70       3 Jan 24 07:00 pg_subtrans/
drwx------ 1   70    70       2 Jan  3  2024 pg_tblspc/
drwx------ 1   70    70       2 Jan  3  2024 pg_twophase/
drwx------ 1   70    70       5 May 24 04:22 pg_wal/
drwx------ 1   70    70       3 Jan  3  2024 pg_xact/
-rw------- 1   70    70      88 Jan  3  2024 postgresql.auto.conf
-rw------- 1   70    70   29621 Jan  3  2024 postgresql.conf
-rw------- 1   70    70      24 May  8 11:41 postmaster.opts
-rw------- 1   70    70      94 May  8 11:41 postmaster.pid

70:70 is a bit puzzling. I did an exec into the container and ran id and it seems to be running as root.

$ id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)

There does seem to be a postgres user in /etc/passwd: postgres:x:70:70::/var/lib/postgresql:/bin/sh. It seems that user is what's actually running the process:

$ ps
PID   USER     TIME  COMMAND
    1 root      0:00 bash /usr/local/bin/docker-entrypoint.sh postgres
    9 postgres  2:59 postgres
   31 postgres  0:13 postgres: checkpointer
   32 postgres  0:05 postgres: background writer
   34 postgres  0:11 postgres: walwriter
   35 postgres  0:15 postgres: autovacuum launcher
   36 postgres  0:00 postgres: logical replication launcher
 8309 postgres  0:00 postgres: paperless paperless 172.19.0.22(33302) idle
 8311 postgres  0:00 postgres: paperless paperless 172.19.0.22(33312) idle
 8312 postgres  0:00 postgres: paperless paperless 172.19.0.22(33320) idle
11739 root      0:00 bash
11855 postgres  0:00 postgres: root postgres 172.19.0.7(44948) idle
11866 postgres  0:00 postgres: root immich 172.19.0.7(44956) idle
11910 root      0:00 ps
12510 postgres  0:32 postgres: mealie mealie 172.19.0.28(58972) idle
31305 postgres  0:40 postgres: mealie mealie 172.19.0.28(38908) idle

To summarize: postgres is the user running the server binary and root is the login for the server.

Laptop765 avatar May 26 '25 01:05 Laptop765

Hi @Laptop765

sorry for the late reply.

So I tried a couple of variations and actually never got the error you have:

  • If I initialize a Postgres container with POSTGRES_USER=root from the beginning, the upgrade works.
  • If I initialize a Postgres container with the default, and then change the user for the upgrade, it fails at a different point:
postgres-1  | Determining our own initdb arguments
postgres-1  | ------------------------------------
postgres-1  | 2025-06-15 09:52:49.969 UTC [41] FATAL:  database "root" does not exist

btw even if I pass POSTGRES_USER=root, the files are owned by the postgres user (UID 70). POSTGRES_USER only influences the name of the superuser created in the PG cluster. 999 is the Bitnami user, and I mixed those two variants in my head.

I am wondering: does POSTGRES_USER maybe miss superuser permissions? You can check this with \du+:

# \du+
                                    List of roles
 Role name |                         Attributes                         | Description
-----------+------------------------------------------------------------+-------------
 root      | Superuser, Create role, Create DB, Replication, Bypass RLS |

andyundso avatar Jun 15 '25 10:06 andyundso