testcontainers-go
testcontainers-go copied to clipboard
[Bug]: PostgreSQL container fails part of the times
Testcontainers version
0.22.0
Using the latest Testcontainers version?
No
Host OS
Mac
Host arch
ARM
Go version
1.20
Docker version
Client:
Cloud integration: v1.0.29
Version: 20.10.17
API version: 1.41
Go version: go1.17.11
Git commit: 100c701
Built: Mon Jun 6 23:04:45 2022
OS/Arch: darwin/arm64
Context: default
Experimental: true
Server: Docker Desktop 4.12.0 (85629)
Engine:
Version: 20.10.17
API version: 1.41 (minimum version 1.12)
Go version: go1.17.11
Git commit: a89b842
Built: Mon Jun 6 23:01:01 2022
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.6.8
GitCommit: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Docker info
Client:
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc., v0.9.1)
compose: Docker Compose (Docker Inc., v2.10.2)
extension: Manages Docker extensions (Docker Inc., v0.2.9)
sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
scan: Docker Scan (Docker Inc., v0.19.0)
Server:
Containers: 40
Running: 3
Paused: 0
Stopped: 37
Images: 34
Server Version: 20.10.17
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc version: v1.1.4-0-g5fd4c4d
init version: de40ad0
Security Options:
seccomp
Profile: default
cgroupns
Kernel Version: 5.10.124-linuxkit
Operating System: Docker Desktop
OSType: linux
Architecture: aarch64
CPUs: 5
Total Memory: 7.653GiB
Name: docker-desktop
ID: RKGO:4J6V:52OX:3ZQI:TARP:KB5E:TGP4:ORLD:DT72:FTO5:TVDK:G4HS
Docker Root Dir: /var/lib/docker
Debug Mode: false
HTTP Proxy: http.docker.internal:3128
HTTPS Proxy: http.docker.internal:3128
No Proxy: hubproxy.docker.internal
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
hubproxy.docker.internal:5000
127.0.0.0/8
Live Restore Enabled: false
What happened?
Some of the times the PostgreSQL container fails with the following logs / message. This is how I use the module:
c, err := postgres.RunContainer(
ctx,
postgres.WithDatabase(defaultPostgresDatabase),
postgres.WithUsername(defaultPostgresUsername),
postgres.WithPassword(defaultPostgresPassword),
testcontainers.WithImage("postgres:13.8"),
testcontainers.WithWaitStrategyAndDeadline(
time.Second*20,
wait.ForSQL(
nat.Port(defaultPostgresPort),
dialect.Postgres,
getPostgresURL,
).WithStartupTimeout(time.Second),
),
)
Relevant log output
2023/09/14 11:22:47 🚧 Waiting for container id 1d70df344982 image: postgres:13.8. Waiting for: &{timeout:<nil> deadline:0x14000bc5e50 Strategies:[0x140006e0be0]}
2023/09/14 11:22:48 container logs:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
waiting for server to start....2023-09-14 09:22:48.208 UTC [48] LOG: starting PostgreSQL 13.8 (Debian 13.8-1.pgdg110+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-09-14 09:22:48.208 UTC [48] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-09-14 09:22:48.210 UTC [49] LOG: database system was shut down at 2023-09-14 09:22:48 UTC
2023-09-14 09:22:48.215 UTC [48] LOG: database system is ready to accept connections
done
server started
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
waiting for server to shut down...2023-09-14 09:22:48.334 UTC [48] LOG: received fast shutdown request
.2023-09-14 09:22:48.334 UTC [48] LOG: aborting any active transactions
2023-09-14 09:22:48.335 UTC [48] LOG: background worker "logical replication launcher" (PID 55) exited with exit code 1
2023-09-14 09:22:48.336 UTC [50] LOG: shutting down
2023-09-14 09:22:48.341 UTC [48] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2023-09-14 09:22:48.447 UTC [1] LOG: starting PostgreSQL 13.8 (Debian 13.8-1.pgdg110+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-09-14 09:22:48.447 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-09-14 09:22:48.447 UTC [1] LOG: listening on IPv6 address "::", port 5432
2023-09-14 09:22:48.447 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-09-14 09:22:48.448 UTC [60] LOG: database system was shut down at 2023-09-14 09:22:48 UTC
2023-09-14 09:22:48.452 UTC [1] LOG: database system is ready to accept connections
panic: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.41/containers/1d70df34498270a3ddc437bea4e7d7702907bb1f17c70531985225c6c334c692/json": context deadline exceeded: failed to start container
Additional information
No response
Hey @doron-cohen I noticed you are overriding the default wait strategy that comes with the postgres module. Is there a reason for that? The default one is needed because postgres needs two occurrences of that log entry.
I say override because of https://github.com/testcontainers/testcontainers-go/blob/d6d5b0b379fa3da2484bbd4496c3601090cd5d36/generic.go#L92.
This probably is a bug though 🤔
Hey @mdelapenya I didn't see any existing wait strategies. I am looking here: https://github.com/testcontainers/testcontainers-go/blob/v0.22.0/modules/postgres/postgres.go#L113. Am I looking at the wrong place?
I also looked at the latest 0.23 version and didn't find any.
BTW, I added the SQL wait strategy since I found the PG container not to be completely ready after RunContainer
returned successfully. This was the best approach to make sure I can work with it right away.
Hey @mdelapenya I didn't see any existing wait strategies. I am looking here: https://github.com/testcontainers/testcontainers-go/blob/v0.22.0/modules/postgres/postgres.go#L113. Am I looking at the wrong place?
I also looked at the latest 0.23 version and didn't find any.
You are right 🤔 I thought it was in the default settings for the module (I was typing from phone at that moment, I'm now back at my desktop), and it's defined in the tests: https://github.com/testcontainers/testcontainers-go/blob/v0.23.0/modules/postgres/postgres_test.go#L35
I think we should probably do of the following things: a. update the docs for postgres, saying that it usually needs that wait strategy when there is no data in it (see https://github.com/testcontainers/testcontainers-java/pull/327/files#diff-57a8aa7d73d96a310849a91d3d5fa311288bc8b487f13672fc5e8bba3743983aR26) b. update the defaults
Thanks. This will clear up the confusion around waiting for it to be ready. Now the only thing remaining is failure to start from time to time. Do you need any logs or information to try and debug it?
Do you need any logs or information to try and debug it?
Yes please, the library should fail if any of the preStart hooks fails, and the wait strategies happens there. And the container logs should be outputted to the terminal, so you should see if there is anything in there
I added most of it in the issue. The container logs and all. I couldn't find anything unusual but let me try and compare to the logs on a successful startup. Anyway, this line is suspicious:
panic: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.41/containers/1d70df34498270a3ddc437bea4e7d7702907bb1f17c70531985225c6c334c692/json": context deadline exceeded: failed to start container
I raise the panic
but I don't know what causes the context deadline exceeded
while talking to the Docker daemon.
By the Docker endpoint, it seems that inspecting the container caused the deadline exceeded error.
Could it be that the ctx context has a timeout or something that causes it to be cancelled before expected?
I think this is related to postgres container behavior which
- Start the postgres service
- Setup... (client can connect but see next step)
- Shutdown the postgres service
- Start the postgres service
- Make the connection reliable
Testcontainers for Java uses the following log strategy to happen so mark the container as a ready to receive connections.
Another issue could be related to the log strategy provided by Testcontainers for Java, which would fail when postgres image (custom images) contains data already because in that case the number of occurrences for the regex is just 1. See logs attached in the issue description. Just for awareness, probably not a priority to fix it right away.
@doron-cohen I think https://github.com/testcontainers/testcontainers-go/pull/2208 could be related, as it improves the examples with an wait.ForSQL condition.
In any case, the example provided here https://golang.testcontainers.org/modules/postgres/ is showing how to configure Postgres in a generic use case, so I'd say that we can close this issue. Please let me know if it's fine to close it.
Thanks!
Closing as the example is illustrative enough. Please feel free to reopen it if you consider it.
Thanks in advance!