testcontainers-rs-modules-community icon indicating copy to clipboard operation
testcontainers-rs-modules-community copied to clipboard

Postgres container with named volume mount times out on readiness check

Open cbrevik opened this issue 2 months ago • 10 comments

Describe the bug

Ref. comment here: https://github.com/testcontainers/testcontainers-rs/issues/742#issuecomment-3401370787

If I use a named volume mount with the TestContainers Postgres-module it works fine the first time the volume is created. But if new containers try to use the same volume later, the readiness check eventually times out and prints the following:

Error: WaitContainer(StartupTimeout)

This is weird, since if I check the logs from the Docker-container itself, database system is ready to accept connections is being printed (which the module checks for). And interacting with the container itself, and the database, it seems like it is ready.

To Reproduce

Code example:

Postgres::default()
        .with_mount(Mount::volume_mount("named-psql-volume", "/var/lib/postgresql/data"))
        .start()
        .await?;

I have created a repro here has well: https://github.com/cbrevik/named_mount/blob/main/src/main.rs

Steps:

  1. Run application with cargo run
  2. Stop the application (or in the repro above, it just exits right afterwards)
  3. Run it again with cargo run
  4. Wait a minute or so, the console prints `Error: WaitContainer(StartupTimeout)``

If I delete the named volume, e.g. named-psql-volume above here, then the application will start again (once).

Expected behavior

The container readiness check should pass, since it is ready. And I'd like to be able to re-use the same volume mount across different runs.

cbrevik avatar Oct 16 '25 13:10 cbrevik

@cbrevik potentially dumb question but the check you linked actually has the container checking for database system is ready to accept connections twice. In containers that are reusing an existing data volume, is that log line printed once, or twice?

the-wondersmith avatar Oct 17 '25 13:10 the-wondersmith

@cbrevik Ah ha, I think I may have solved the mystery. Look at the difference in the logs between a container that was started "clean" (i.e. the container that initially populates the named volume) vs. a container that was started with the named volume already populated

Clean Start
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
sh: locale: not found
2025-10-17 14:07:49.286 UTC [30] WARNING:  no usable system locales were found
performing post-bootstrap initialization ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start

waiting for server to start....2025-10-17 14:07:50.099 UTC [36] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-10-17 14:07:50.103 UTC [37] LOG:  database system was shut down at 2025-10-17 14:07:49 UTC
2025-10-17 14:07:50.105 UTC [36] LOG:  database system is ready to accept connections
 done
server started

/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*

waiting for server to shut down....2025-10-17 14:07:50.201 UTC [36] LOG:  received fast shutdown request
2025-10-17 14:07:50.201 UTC [36] LOG:  aborting any active transactions
2025-10-17 14:07:50.202 UTC [36] LOG:  background worker "logical replication launcher" (PID 43) exited with exit code 1
2025-10-17 14:07:50.202 UTC [38] LOG:  shutting down
2025-10-17 14:07:50.204 UTC [36] LOG:  database system is shut down
 done
server stopped

PostgreSQL init process complete; ready for start up.

2025-10-17 14:07:50.310 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2025-10-17 14:07:50.310 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2025-10-17 14:07:50.310 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-10-17 14:07:50.313 UTC [49] LOG:  database system was shut down at 2025-10-17 14:07:50 UTC
2025-10-17 14:07:50.315 UTC [1] LOG:  database system is ready to accept connections
Reused Volume
PostgreSQL Database directory appears to contain a database; Skipping initialization

2025-10-17 14:08:55.354 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2025-10-17 14:08:55.354 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2025-10-17 14:08:55.354 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-10-17 14:08:55.357 UTC [22] LOG:  database system was interrupted; last known up at 2025-10-17 14:07:50 UTC
2025-10-17 14:08:55.357 UTC [22] LOG:  database system was not properly shut down; automatic recovery in progress
2025-10-17 14:08:55.358 UTC [22] LOG:  redo starts at 0/1668E38
2025-10-17 14:08:55.358 UTC [22] LOG:  invalid record length at 0/1668E70: wanted 24, got 0
2025-10-17 14:08:55.358 UTC [22] LOG:  redo done at 0/1668E38
2025-10-17 14:08:55.360 UTC [1] LOG:  database system is ready to accept connections

The "reused" container is failing from testcontainer's POV because it's expecting two database system is ready to accept connections logs, and the "reused" container only ever logs that out once.

the-wondersmith avatar Oct 17 '25 14:10 the-wondersmith

Yeah that makes sense, I noticed one was on_stderr and one was on_stdout, so without looking too much into it I thought it waited for a print on either err, or out:

WaitFor::message_on_stderr("database system is ready to accept connections"),
WaitFor::message_on_stdout("database system is ready to accept connections"),

But if it is a both, then I suppose that explains it. Then the question is, why both 🤔

cbrevik avatar Oct 17 '25 16:10 cbrevik

I guess this explains it: https://github.com/testcontainers/testcontainers-rs-modules-community/issues/158#issuecomment-2212510436

If either stderr or stdout is always the "second" (or in case of already mounted volume, the "first"), then maybe it would be sufficient to check for one of those. Not both?

I see the change was to go from checking just stderr to also stdout, so would it be sufficient to check just stdout? https://github.com/testcontainers/testcontainers-rs-modules-community/pull/162/files

Edit: I tested it out some locally, and it does not seem to be sufficient. stderr is probably always first then.

cbrevik avatar Oct 17 '25 16:10 cbrevik

Seems like listening on IPv4 address "0.0.0.0", port 5432 is being printed in both cases. So I got it to work locally for both cases if I changed the ready_conditions to:

    fn ready_conditions(&self) -> Vec<WaitFor> {
        vec![
            WaitFor::message_on_either_std("listening on IPv4 address \"0.0.0.0\", port 5432"),
            WaitFor::message_on_either_std("database system is ready to accept connections"),
        ]
    }

cbrevik avatar Oct 17 '25 17:10 cbrevik

@cbrevik are we sure that's actually a guaranteed log? It (probably) is as long as there's no way for users to override the listening address, but if there is then the log will change accordingly 🤔

the-wondersmith avatar Oct 17 '25 19:10 the-wondersmith

I'd suggest to override ready-conditions for your case with with_ready_conditions

As it's kinda depends on the use-case and current ready conditions were derived from other languages, any other ones were leading to flakiness. We also should avoid any hardcoded ports or hosts when possible in the conditions

We can document the recommendation about ready conditions if named volume is used, OR we can actually extend Postgres image to have something like: with_named_volume(...) (just an example) - that modifies ready conditions of it when user sets the named volume

DDtKey avatar Oct 17 '25 19:10 DDtKey

On the off chance we're voting on it, I'll register my vote for extending the Postgres image to have something like with_named_volume(...), FWIW

the-wondersmith avatar Oct 17 '25 20:10 the-wondersmith

Well, it seems like it also works with the truncated message for "listening". So this would work even if it was possible to override listening address:

    fn ready_conditions(&self) -> Vec<WaitFor> {
        vec![
            WaitFor::message_on_either_std("listening on IPv4 address"),
            WaitFor::message_on_either_std("database system is ready to accept connections"),
        ]
    }

Also just to note, I know that at least in testcontainers-node, the WaitStrategy for logs support regex as well. So that might be something one would want to look into at some point.

cbrevik avatar Oct 18 '25 06:10 cbrevik

The expectations for not-initialized postgres and initialized may vary (and technically it partially depends on version of image ofc) Current conditions are intentional to avoid flakiness - see https://github.com/testcontainers/testcontainers-rs-modules-community/issues/158

DDtKey avatar Oct 20 '25 22:10 DDtKey