testcontainers-go
testcontainers-go copied to clipboard
Fix race condition when looking up reaper (ryuk) container
What does this PR do?
Updates lookUpReaperContainer
to consider the health of the ryuk container.
- If the ryuk container advertises a health status,
lookUpReaperContainer
will require a healthy container before returning. - If the ryuk container does not advertise a status, the
lookUpReaperContainer
will return the container.
Why is it important?
The ryuk container can be exposed by the Docker API before it is ready to accept connections, resulting in messages like:
2024/04/21 22:25:10 🔥 Reaper obtained from Docker for this test session 99998b629eea57023f49aefc9eda5406c84ce4c91d0c6005c1b8c00dcbbce0d7
2024/04/21 22:25:10 failed to start container: dial tcp 127.0.0.1:32917: connect: connection refused: Connecting to Ryuk on localhost:32917 failed: connecting to reaper failed: failed to create container
Tests will fail when they would otherwise pass.
This is especially the case in Go, where test binaries are run in parallel, one for each package. If multiple packages use testcontainers
, they can obtain a reaper instance from the Docker API before the container is ready to serve connections, leading to dialing errors like connect: connection refused
.
The current workaround is to run test binaries serially, e.g., go test -p 1 ./...
, which is not optimal, even for modest codebases.
Related issues
See complementary PR, which adds a HEALTHCHECK
to the ryuk container: https://github.com/testcontainers/moby-ryuk/pull/128
How to test this PR
To recreate the initial problem and run the workaround
Clone this test case repo and run go test -count=1 -v ./...
and observe the errors.
Run go test -count=1 -p 1 -v ./...
and tests should pass.
To test the fix
Clone moby-ryuk, apply pull/128, then build a new ryuk container, and tag it:
-
docker build -f linux/Dockerfile -t local/ryuk .
-
docker tag local/ryuk testcontainers/ryuk:0.7.0
Check out this branch/PR locally.
Clone the test case repo and update go.mod
to point to this branch via a replace, e.g.:
-
replace github.com/testcontainers/testcontainers-go => /Users/esm/workspaces/testcontainers-go
Running go test -count=1 -v ./...
should pass.
Deploy Preview for testcontainers-go ready!
Name | Link |
---|---|
Latest commit | d7f1b7184d62b8a6f63d4db7d963c5d1babd80ad |
Latest deploy log | https://app.netlify.com/sites/testcontainers-go/deploys/6666e1bfca618100088446f4 |
Deploy Preview | https://deploy-preview-2508--testcontainers-go.netlify.app |
Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
Hey team, just bumping this PR. Would love to hear feedback and if there's anything I can do to help move it forward 🙏
Thanks for all your efforts on Testcontainers! Great project.
Clone this test case repo and run go test -count=1 -v ./... and observe the errors.
Hi @emetsger I cannot reproduce this yet:
go test -count=1 -v ./...
=== RUN TestA
2024/04/26 17:27:38 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1)
API Version: 1.43
Operating System: Ubuntu 22.04.4 LTS
Total Memory: 15779 MB
Resolved Docker Host: tcp://127.0.0.1:54509
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084
Test ProcessID: b81dbac4-46a1-4e91-a577-a7e47d57518e
2024/04/26 17:27:38 🔥 Reaper obtained from Docker for this test session 494993ccefba558a0d64b9ddce7d8db30444e35a49bf12d9cbd8ef205c23e05f
2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine
2024/04/26 17:27:38 ✅ Container created: 0b7ae9cbdeb1
2024/04/26 17:27:38 🐳 Starting container: 0b7ae9cbdeb1
2024/04/26 17:27:39 ✅ Container started: 0b7ae9cbdeb1
2024/04/26 17:27:39 🚧 Waiting for container id 0b7ae9cbdeb1 image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x140003c6e88 Strategies:[0x140003e2c30]}
2024/04/26 17:27:41 🔔 Container is ready: 0b7ae9cbdeb1
2024/04/26 17:27:41 🐳 Terminating container: 0b7ae9cbdeb1
2024/04/26 17:27:41 🚫 Container terminated: 0b7ae9cbdeb1
--- PASS: TestA (3.65s)
PASS
ok tc-debug/a 4.012s
=== RUN TestB
2024/04/26 17:27:37 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1)
API Version: 1.43
Operating System: Ubuntu 22.04.4 LTS
Total Memory: 15779 MB
Resolved Docker Host: tcp://127.0.0.1:54509
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084
Test ProcessID: 2a7fd6eb-d055-4e31-aaa6-00cf7cd47542
2024/04/26 17:27:37 🐳 Creating container for image testcontainers/ryuk:0.7.0
2024/04/26 17:27:38 ✅ Container created: 494993ccefba
2024/04/26 17:27:38 🐳 Starting container: 494993ccefba
2024/04/26 17:27:38 ✅ Container started: 494993ccefba
2024/04/26 17:27:38 🚧 Waiting for container id 494993ccefba image: testcontainers/ryuk:0.7.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/04/26 17:27:38 🔔 Container is ready: 494993ccefba
2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine
2024/04/26 17:27:38 ✅ Container created: 8386f359812f
2024/04/26 17:27:38 🐳 Starting container: 8386f359812f
2024/04/26 17:27:39 ✅ Container started: 8386f359812f
2024/04/26 17:27:39 🚧 Waiting for container id 8386f359812f image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x1400037ef28 Strategies:[0x1400039ebd0]}
2024/04/26 17:27:41 🔔 Container is ready: 8386f359812f
2024/04/26 17:27:41 🐳 Terminating container: 8386f359812f
2024/04/26 17:27:41 🚫 Container terminated: 8386f359812f
--- PASS: TestB (3.92s)
PASS
ok tc-debug/b 4.104s
=== RUN TestC
2024/04/26 17:27:38 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1)
API Version: 1.43
Operating System: Ubuntu 22.04.4 LTS
Total Memory: 15779 MB
Resolved Docker Host: tcp://127.0.0.1:54509
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084
Test ProcessID: 0b1569fa-f57c-46af-afe3-b26dc818fbc8
2024/04/26 17:27:38 🔥 Reaper obtained from Docker for this test session 494993ccefba558a0d64b9ddce7d8db30444e35a49bf12d9cbd8ef205c23e05f
2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine
2024/04/26 17:27:38 ✅ Container created: b2b8530b534d
2024/04/26 17:27:38 🐳 Starting container: b2b8530b534d
2024/04/26 17:27:38 ✅ Container started: b2b8530b534d
2024/04/26 17:27:38 🚧 Waiting for container id b2b8530b534d image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x1400037ef28 Strategies:[0x1400039ebd0]}
2024/04/26 17:27:40 🔔 Container is ready: b2b8530b534d
2024/04/26 17:27:40 🐳 Terminating container: b2b8530b534d
2024/04/26 17:27:41 🚫 Container terminated: b2b8530b534d
--- PASS: TestC (3.28s)
PASS
ok tc-debug/c 3.756s
=== RUN TestD
2024/04/26 17:27:38 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1)
API Version: 1.43
Operating System: Ubuntu 22.04.4 LTS
Total Memory: 15779 MB
Resolved Docker Host: tcp://127.0.0.1:54509
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084
Test ProcessID: bb20c031-95bb-4791-af3b-e4b1ecf5d818
2024/04/26 17:27:38 🔥 Reaper obtained from Docker for this test session 494993ccefba558a0d64b9ddce7d8db30444e35a49bf12d9cbd8ef205c23e05f
2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine
2024/04/26 17:27:38 ✅ Container created: 2f51f8a61406
2024/04/26 17:27:38 🐳 Starting container: 2f51f8a61406
2024/04/26 17:27:39 ✅ Container started: 2f51f8a61406
2024/04/26 17:27:39 🚧 Waiting for container id 2f51f8a61406 image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x140003f8e58 Strategies:[0x14000458bd0]}
2024/04/26 17:27:41 🔔 Container is ready: 2f51f8a61406
2024/04/26 17:27:41 🐳 Terminating container: 2f51f8a61406
2024/04/26 17:27:41 🚫 Container terminated: 2f51f8a61406
--- PASS: TestD (3.32s)
PASS
ok tc-debug/d 3.933s
I've run run it even with count=10, with no errors. Is there anything you think I can do to reproduce it?
Thank you for trying to reproduce!
I’m using Rancher, so I wonder if that may be contributing to the discrepancy.
What container engine/version are you on? I’ll be back at my computer later this weekend and I can do some more testing on my side. Thank you!
On Fri, Apr 26, 2024 at 11:35 AM Manuel de la Peña @.***> wrote:
Clone this test case repo https://github.com/emetsger/testcontainers-ryuk-testcase and run go test -count=1 -v ./... and observe the errors.
Hi @emetsger https://github.com/emetsger I cannot reproduce this yet:
go test -count=1 -v ./... === RUN TestA 2024/04/26 17:27:38 github.com/testcontainers/testcontainers-go - Connected to docker: Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1) API Version: 1.43 Operating System: Ubuntu 22.04.4 LTS Total Memory: 15779 MB Resolved Docker Host: tcp://127.0.0.1:54509 Resolved Docker Socket Path: /var/run/docker.sock Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084 Test ProcessID: b81dbac4-46a1-4e91-a577-a7e47d57518e 2024/04/26 17:27:38 🔥 Reaper obtained from Docker for this test session 494993ccefba558a0d64b9ddce7d8db30444e35a49bf12d9cbd8ef205c23e05f 2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine 2024/04/26 http://docker.io/postgres:16-alpine2024/04/26 17:27:38 ✅ Container created: 0b7ae9cbdeb1 2024/04/26 17:27:38 🐳 Starting container: 0b7ae9cbdeb1 2024/04/26 17:27:39 ✅ Container started: 0b7ae9cbdeb1 2024/04/26 17:27:39 🚧 Waiting for container id 0b7ae9cbdeb1 image: docker.io/postgres:16-alpine. Waiting for: &{timeout:
deadline:0x140003c6e88 Strategies:[0x140003e2c30]} 2024/04/26 17:27:41 🔔 Container is ready: 0b7ae9cbdeb1 2024/04/26 17:27:41 🐳 Terminating container: 0b7ae9cbdeb1 2024/04/26 17:27:41 🚫 Container terminated: 0b7ae9cbdeb1 --- PASS: TestA (3.65s) PASS ok tc-debug/a 4.012s === RUN TestB 2024/04/26 17:27:37 github.com/testcontainers/testcontainers-go - Connected to docker: Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1) API Version: 1.43 Operating System: Ubuntu 22.04.4 LTS Total Memory: 15779 MB Resolved Docker Host: tcp://127.0.0.1:54509 Resolved Docker Socket Path: /var/run/docker.sock Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084 Test ProcessID: 2a7fd6eb-d055-4e31-aaa6-00cf7cd47542 2024/04/26 17:27:37 🐳 Creating container for image testcontainers/ryuk:0.7.0 2024/04/26 17:27:38 ✅ Container created: 494993ccefba 2024/04/26 17:27:38 🐳 Starting container: 494993ccefba 2024/04/26 17:27:38 ✅ Container started: 494993ccefba 2024/04/26 17:27:38 🚧 Waiting for container id 494993ccefba image: testcontainers/ryuk:0.7.0. Waiting for: &{Port:8080/tcp timeout: PollInterval:100ms} 2024/04/26 17:27:38 🔔 Container is ready: 494993ccefba 2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine 2024/04/26 http://docker.io/postgres:16-alpine2024/04/26 17:27:38 ✅ Container created: 8386f359812f 2024/04/26 17:27:38 🐳 Starting container: 8386f359812f 2024/04/26 17:27:39 ✅ Container started: 8386f359812f 2024/04/26 17:27:39 🚧 Waiting for container id 8386f359812f image: docker.io/postgres:16-alpine. Waiting for: &{timeout: deadline:0x1400037ef28 Strategies:[0x1400039ebd0]} 2024/04/26 17:27:41 🔔 Container is ready: 8386f359812f 2024/04/26 17:27:41 🐳 Terminating container: 8386f359812f 2024/04/26 17:27:41 🚫 Container terminated: 8386f359812f --- PASS: TestB (3.92s) PASS ok tc-debug/b 4.104s === RUN TestC 2024/04/26 17:27:38 github.com/testcontainers/testcontainers-go - Connected to docker: Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1) API Version: 1.43 Operating System: Ubuntu 22.04.4 LTS Total Memory: 15779 MB Resolved Docker Host: tcp://127.0.0.1:54509 Resolved Docker Socket Path: /var/run/docker.sock Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084 Test ProcessID: 0b1569fa-f57c-46af-afe3-b26dc818fbc8 2024/04/26 17:27:38 🔥 Reaper obtained from Docker for this test session 494993ccefba558a0d64b9ddce7d8db30444e35a49bf12d9cbd8ef205c23e05f 2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine 2024/04/26 http://docker.io/postgres:16-alpine2024/04/26 17:27:38 ✅ Container created: b2b8530b534d 2024/04/26 17:27:38 🐳 Starting container: b2b8530b534d 2024/04/26 17:27:38 ✅ Container started: b2b8530b534d 2024/04/26 17:27:38 🚧 Waiting for container id b2b8530b534d image: docker.io/postgres:16-alpine. Waiting for: &{timeout: deadline:0x1400037ef28 Strategies:[0x1400039ebd0]} 2024/04/26 17:27:40 🔔 Container is ready: b2b8530b534d 2024/04/26 17:27:40 🐳 Terminating container: b2b8530b534d 2024/04/26 17:27:41 🚫 Container terminated: b2b8530b534d --- PASS: TestC (3.28s) PASS ok tc-debug/c 3.756s === RUN TestD 2024/04/26 17:27:38 github.com/testcontainers/testcontainers-go - Connected to docker: Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1) API Version: 1.43 Operating System: Ubuntu 22.04.4 LTS Total Memory: 15779 MB Resolved Docker Host: tcp://127.0.0.1:54509 Resolved Docker Socket Path: /var/run/docker.sock Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084 Test ProcessID: bb20c031-95bb-4791-af3b-e4b1ecf5d818 2024/04/26 17:27:38 🔥 Reaper obtained from Docker for this test session 494993ccefba558a0d64b9ddce7d8db30444e35a49bf12d9cbd8ef205c23e05f 2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine 2024/04/26 http://docker.io/postgres:16-alpine2024/04/26 17:27:38 ✅ Container created: 2f51f8a61406 2024/04/26 17:27:38 🐳 Starting container: 2f51f8a61406 2024/04/26 17:27:39 ✅ Container started: 2f51f8a61406 2024/04/26 17:27:39 🚧 Waiting for container id 2f51f8a61406 image: docker.io/postgres:16-alpine. Waiting for: &{timeout: deadline:0x140003f8e58 Strategies:[0x14000458bd0]} 2024/04/26 17:27:41 🔔 Container is ready: 2f51f8a61406 2024/04/26 17:27:41 🐳 Terminating container: 2f51f8a61406 2024/04/26 17:27:41 🚫 Container terminated: 2f51f8a61406 --- PASS: TestD (3.32s) PASS ok tc-debug/d 3.933s I've run run it even with count=10, with no errors. Is there anything you think I can do to reproduce it?
— Reply to this email directly, view it on GitHub https://github.com/testcontainers/testcontainers-go/pull/2508#issuecomment-2079628918, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABD4GRM6OFYD2Q3WX5QWJDY7JX5DAVCNFSM6AAAAABGR5VJXSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZZGYZDQOJRHA . You are receiving this because you were mentioned.Message ID: @.***>
What container engine/version are you on?
I'm using both Docker for Mac and Testcontainers Cloud.
I am experiencing a similar issue in (Jenkins) CI, although it's flaky - only on some runs there is a connection refused error as follows.
[2024-05-02T09:39:21.954Z] 2024/05/02 09:39:12 github.com/testcontainers/testcontainers-go - Connected to docker:
[2024-05-02T09:39:21.954Z] Server Version: 20.10.23
[2024-05-02T09:39:21.954Z] API Version: 1.41
[2024-05-02T09:39:21.954Z] Operating System: Rocky Linux 9.2 (Blue Onyx)
[2024-05-02T09:39:21.954Z] Total Memory: 15982 MB
[2024-05-02T09:39:21.954Z] Resolved Docker Host: unix:///var/run/docker.sock
[2024-05-02T09:39:21.954Z] Resolved Docker Socket Path: /var/run/docker.sock
[2024-05-02T09:39:21.954Z] Test SessionID: 9644673c1b6ea45d4962460bb5e7a2a9ba32c1f293bf4166d0e43de70c021a86
[2024-05-02T09:39:21.954Z] Test ProcessID: 05a16b84-4e96-4b61-ac3d-29f63de4d258
[2024-05-02T09:39:21.954Z] 2024/05/02 09:39:12 🐳 Creating container for image testcontainers/ryuk:0.7.0
[2024-05-02T09:39:21.954Z] 2024/05/02 09:39:13 🔥 Reaper obtained from Docker for this test session 1179229b008b48672081fd3372999389793505d2b013d054c454a6838a6b5544
[2024-05-02T09:39:21.954Z] <redacted>: dial tcp <redacted>:35412: connect: connection refused: Connecting to Ryuk on <redacted>:35412 failed: connecting to reaper failed: failed to create container
Note that I am unable to reproduce this on my machine (with different container engine & version).
Do you know what docker engine/version is running in your CI?
On Thu, May 2, 2024 at 6:00 AM Simon Bos @.***> wrote:
I am experiencing a similar issue in (Jenkins) CI, although it's flaky - only on some runs there is a connection refused error as follows.
[2024-05-02T09:39:21.954Z] 2024/05/02 09:39:12 github.com/testcontainers/testcontainers-go - Connected to docker: [2024-05-02T09:39:21.954Z] Server Version: 20.10.23 [2024-05-02T09:39:21.954Z] API Version: 1.41 [2024-05-02T09:39:21.954Z] Operating System: Rocky Linux 9.2 (Blue Onyx) [2024-05-02T09:39:21.954Z] Total Memory: 15982 MB [2024-05-02T09:39:21.954Z] Resolved Docker Host: unix:///var/run/docker.sock [2024-05-02T09:39:21.954Z] Resolved Docker Socket Path: /var/run/docker.sock [2024-05-02T09:39:21.954Z] Test SessionID: 9644673c1b6ea45d4962460bb5e7a2a9ba32c1f293bf4166d0e43de70c021a86 [2024-05-02T09:39:21.954Z] Test ProcessID: 05a16b84-4e96-4b61-ac3d-29f63de4d258 [2024-05-02T09:39:21.954Z] 2024/05/02 09:39:12 🐳 Creating container for image testcontainers/ryuk:0.7.0 [2024-05-02T09:39:21.954Z] 2024/05/02 09:39:13 🔥 Reaper obtained from Docker for this test session 1179229b008b48672081fd3372999389793505d2b013d054c454a6838a6b5544 [2024-05-02T09:39:21.954Z]
: dial tcp :35412: connect: connection refused: Connecting to Ryuk on :35412 failed: connecting to reaper failed: failed to create container Note that I am unable to reproduce this on my machine (with different container engine & version).
— Reply to this email directly, view it on GitHub https://github.com/testcontainers/testcontainers-go/pull/2508#issuecomment-2090071281, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABD4GQZUZCQO7VRD4VO4U3ZAIFEHAVCNFSM6AAAAABGR5VJXSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJQGA3TCMRYGE . You are receiving this because you were mentioned.Message ID: @.***>
Do you know what docker engine/version is running in your CI?
It's running "Docker Engine - Community" version 20.10.23. There is some extra info in the logs above.
@simonbos what docker engine/version are you running on your local machine?
Edit: updated with Colima results (thanks @lachauj for bringing this to our attention!)
Test Case | Test Result | Testcontainers Version | Container Manager | Container Engine Version | Container Engine OS | API Version | containerd version | runc version | docker-init version |
---|---|---|---|---|---|---|---|---|---|
1 | 🚫 Fail | main (539284ce ), testcontainers/ryuk:0.7.0 |
Rancher 1.13.1 | dockerd/moby 24.0.7 per shell dockerd -v |
Alpine Linux v3.19.1 per shell, /etc/alpine-release |
1.43, per shell docker version |
v1.7.10 per shell, containerd -v |
1.1.10 per shell, docker version |
0.19.0 per shell, docker version |
2 | ✅ Succeeds | main (539284ce ), testcontainers/ryuk:0.7.0 |
Docker for Mac v4.29.0 | Docker Engine/moby 26.0.0 | Docker Desktop (per test container output) | 1.45 (per release notes), 1.44 per testcontainers output. | v1.7.13 (per release notes), 1.6.28 per shell | 1.1.12 (per release notes) | |
3 | ✅ Succeeds | main (539284ce ), testcontainers/ryuk:0.7.0 |
78+testcontainerscloud (via Testcontainers Desktop 1.11.0) (per testcontainers output) | 1.43 (per testcontainers output) | |||||
4 | Sporadically fails (reported, but could not reproduce; see test case #5) | main (539284ce ), testcontainers/ryuk:0.7.0 |
Docker Engine Community 20.10.23 (release notes) | dockerd/moby 20.10.23 (per testcontainers output) | 1.41 (per Engine/API version matrix) and testcontainers output | v1.6.15 (per release notes) | |||
5 | ✅ Succeeds | main (539284ce ), testcontainers/ryuk:0.7.0 |
Docker for Mac v4.17.0 | dockerd/moby 20.10.23 per shell, dockerd -v |
Docker Desktop (per test container output) | 1.41 (per Engine/API version matrix) and testcontainers output | v1.6.18 per shell, containerd -v |
1.1.4 per shell, runc -v |
|
6 | 🚫 Fail | main (539284ce ), testcontainers/ryuk:0.7.0 |
Colima 0.6.9 | dockerd/moby 26.1.1 per shell, dockerd -v |
Ubuntu 24.04 LTS (/etc/os-release ) |
1.45, per shell docker version |
1.16.31 per shell, containerd -v |
1.1.12 per shell runc -v |
0.19.0 per shell docker version |
(link to gist)
@mdelapenya I ran the test case against a variety of container managers/docker engines (see table above)
I was able to successfully run my test case against all the container managers except for Rancher.
- I attempted to reproduce the failure reported by @simonbos by using the same docker engine version used in their CI, but after running the test case multiple times, I never got it to fail.
@mdelapenya can you install Rancher 1.13.1 (latest at the time of this writing) and attempt to reproduce on your side?
Edit: test matrix table updated to include Colima, which also fails 🚫
Wow! @emetsger thank you so much for dedicating that amount of time to reproduce the potential bug in so many container environments. Very proud of you 🙇
I'll install rancher today and try to reproduce it. Will ping you back here
Wow! @emetsger thank you so much for dedicating that amount of time to reproduce the potential bug in so many container environments. Very proud of you
It's the least I can do: trying to pay it forward!
Thanks for the effort @emetsger !
what docker engine/version are you running on your local machine?
Docker Desktop for Mac v4.29.0.
Same issue with Colima on MacOS Sonoma, works randomly (95% of the time it triggers the issue) but works at 100% with Docker Desktop. Is not that simpler to just try to reconnect when a Reaper instance is found like testcontainers-node ?
@lachauj I could repeat your failure with my test case ✅
I updated the test matrix table above to include Colima.
Edit: This PR and the associated moby/ryuk patch does not fix Colima. I rebased https://github.com/testcontainers/moby-ryuk/pull/128 against the latest main
, and it still fails. 😢
Edit #2: The only way I can get the test case to pass with colima was to introduce an artificial delay, here, before returning the ryuk container (e.g. with time.Sleep(5*time.Second)
).
So there seems to be something else going on with Colima that isn't addressed by the healthcheck patches.
Colima Version | Test Case Success | moby/ryuk base | healthcheck patch version |
---|---|---|---|
0.6.9 | 🚫 Fail | 103d99f5 |
v1 |
0.6.9 | 🚫 Fail | c612bb69 |
v2 |
Summary Findings
Container Manager | Test Case Passes - Stock Testcontainers | Test Case Passes - with Healthcheck support | Notes |
---|---|---|---|
Docker | ✅ | ✅ | Unclear what Docker does differently to succeed |
TestContainers Cloud | ✅ | ✅ | Unclear what TestContainers Cloud does differently to succeed (does it use Docker under the hood?) |
Rancher | 🚫 (100% failure rate) | ⚠️ (~90% pass rate) | I suspect the delay introduced by the healthcheck increases the probability of success dramatically, but doesn't address the root cause. |
Colima | 🚫 (100% failure rate) | 🚫 (100% failure rate) | The test case will pass by introducing a 5s delay before returning the Ryuk container |
Discussion
After testing a number of different container managers, it's clear that their behavior differs.
- Docker: the test case repo passes for all versions of Docker, with and without healthcheck support.
- There is a reported failure with Docker 4.17, but the test case repo passes for me, so I can't confirm it.
- Rancher: the test case repo fails 100% of the time without healthcheck support, and passes >90% of the time with it.
- Run the test case repo with:
for i in 'jot 10 0 9 1'; do echo "Test run $i:" ; go test -count=1 -v ./... ; done
and it will succeed 9/10 or 10/10 times.
- Run the test case repo with:
- Colima: the test case repo fails 100% of the time with and without healthcheck support.
I think adding healthcheck support is a good idea, but it isn't going to solve the root issue, whatever that may be.
In the meantime, there are two different workarounds:
- Users of testcontainers-go isolate their use of testcontainers to a single package, or
- Run their test binaries serially with
go test -p 1 ./...
The fact that this bug is only raised when multiple test binaries are executing testcontainers-go code concurrently makes me think that there is some kind of state where the ryuk container can get returned "early", before the container manager is ready to accept connections to it.
Fix
Ok, I believe cd0a106 has the fix. It works with Rancher and Colima, 100% of the time ✅ 🎉
There is a race between newReaper
and lookUpReaperContainer
: lookUpReaperContainer
can retrieve the container before the ports are confirmed to be listening.
Description
The container request composed by newReaper
includes a WaitingFor: wait.ForListeningPort(listeningPort)
, which becomes a readiness hook that is waited on when starting the container.
It is possible for lookUpReaperContainer
- running in a separate process from newReaper
- to obtain a reference to the container while the readiness hook is still running in the newReaper
process.
cd0a106 updates lookUpReaperContainer
to execute wait.ForListeningPort(listeningPort)
before returning, ensuring that the ports will definitely be listening.
@mdelapenya @Bablzz: what do you think? With this fix do you think we can move this PR forward?
@simonbos @lachauj: can you try this PR out and see if it fixes your errors?
hello @emetsger ! I hope you are doing well:) for me - done is better than perfect :)
Hi @emetsger ! Thanks a lot, that's works perfectly!
Hey @emetsger, applying this fix seems to work for me - I haven't seen any connect: connection refused
since.
@emetsger I added a commit on top for simplification. If you agree, I'm fine merging this one. Thanks!
Folks, I'm prioritising this fix, so please take a look at the few commits I added on top of @emetsger's fantastic work. I'd appreciate your feedback so I can make this into the project asap.
Thanks!
BTW I had time to install Rancher Desktop and configure the tests for it, and this branch works perfectly. I verified the repro code with and without this branch being applied, and the fix is effective:
Applying this branch:
go test -count=1 -v ./...
=== RUN TestA
2024/06/10 16:47:39 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 25.0.5
API Version: 1.44
Operating System: Alpine Linux v3.19
Total Memory: 3916 MB
Resolved Docker Host: unix:///Users/mdelapenya/.rd/docker.sock
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: 02a8bae92f37b36dd39598918d2de301ad83fa08b80f0c145ac5f7e810de993d
Test ProcessID: 98a3a75a-4b04-48d6-a214-2f97d0b84f6e
2024/06/10 16:47:39 🔥 Reaper obtained from Docker for this test session 80c2f254a0a9ae8a79726aa6ffe8a2abf8246e4f84627322694168b8af1a264d
2024/06/10 16:47:39 ⏳ Waiting for Reaper port to be ready
2024/06/10 16:47:40 🐳 Creating container for image docker.io/postgres:16-alpine
2024/06/10 16:47:40 ✅ Container created: c77cadfaf2ee
2024/06/10 16:47:40 🐳 Starting container: c77cadfaf2ee
2024/06/10 16:47:40 ✅ Container started: c77cadfaf2ee
2024/06/10 16:47:40 ⏳ Waiting for container id c77cadfaf2ee image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x14000458040 Strategies:[0x14000440d50]}
2024/06/10 16:47:42 🔔 Container is ready: c77cadfaf2ee
2024/06/10 16:47:42 🐳 Terminating container: c77cadfaf2ee
2024/06/10 16:47:43 🚫 Container terminated: c77cadfaf2ee
--- PASS: TestA (3.57s)
PASS
ok tc-debug/a 5.000s
=== RUN TestB
2024/06/10 16:47:39 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 25.0.5
API Version: 1.44
Operating System: Alpine Linux v3.19
Total Memory: 3916 MB
Resolved Docker Host: unix:///Users/mdelapenya/.rd/docker.sock
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: 02a8bae92f37b36dd39598918d2de301ad83fa08b80f0c145ac5f7e810de993d
Test ProcessID: c4c5e8cd-3cf7-454d-baa9-2250f692b88e
2024/06/10 16:47:39 🐳 Creating container for image testcontainers/ryuk:0.7.0
2024/06/10 16:47:39 ✅ Container created: 80c2f254a0a9
2024/06/10 16:47:39 🐳 Starting container: 80c2f254a0a9
2024/06/10 16:47:39 ✅ Container started: 80c2f254a0a9
2024/06/10 16:47:39 ⏳ Waiting for container id 80c2f254a0a9 image: testcontainers/ryuk:0.7.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/06/10 16:47:40 🔔 Container is ready: 80c2f254a0a9
2024/06/10 16:47:40 🐳 Creating container for image docker.io/postgres:16-alpine
2024/06/10 16:47:40 ✅ Container created: 48fde13488d4
2024/06/10 16:47:40 🐳 Starting container: 48fde13488d4
2024/06/10 16:47:40 ✅ Container started: 48fde13488d4
2024/06/10 16:47:40 ⏳ Waiting for container id 48fde13488d4 image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x1400041a138 Strategies:[0x140003fcd80]}
2024/06/10 16:47:42 🔔 Container is ready: 48fde13488d4
2024/06/10 16:47:42 🐳 Terminating container: 48fde13488d4
2024/06/10 16:47:43 🚫 Container terminated: 48fde13488d4
--- PASS: TestB (3.75s)
PASS
ok tc-debug/b 4.759s
=== RUN TestC
2024/06/10 16:47:39 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 25.0.5
API Version: 1.44
Operating System: Alpine Linux v3.19
Total Memory: 3916 MB
Resolved Docker Host: unix:///Users/mdelapenya/.rd/docker.sock
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: 02a8bae92f37b36dd39598918d2de301ad83fa08b80f0c145ac5f7e810de993d
Test ProcessID: 8a45a0fb-a20e-4212-9a2d-1d0c5b941f4a
2024/06/10 16:47:39 🔥 Reaper obtained from Docker for this test session 80c2f254a0a9ae8a79726aa6ffe8a2abf8246e4f84627322694168b8af1a264d
2024/06/10 16:47:39 ⏳ Waiting for Reaper port to be ready
2024/06/10 16:47:40 🐳 Creating container for image docker.io/postgres:16-alpine
2024/06/10 16:47:40 ✅ Container created: afb27dc43c4f
2024/06/10 16:47:40 🐳 Starting container: afb27dc43c4f
2024/06/10 16:47:40 ✅ Container started: afb27dc43c4f
2024/06/10 16:47:40 ⏳ Waiting for container id afb27dc43c4f image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x14000458040 Strategies:[0x14000440d50]}
2024/06/10 16:47:42 🔔 Container is ready: afb27dc43c4f
2024/06/10 16:47:42 🐳 Terminating container: afb27dc43c4f
2024/06/10 16:47:43 🚫 Container terminated: afb27dc43c4f
--- PASS: TestC (3.61s)
PASS
ok tc-debug/c 4.934s
=== RUN TestD
2024/06/10 16:47:39 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 25.0.5
API Version: 1.44
Operating System: Alpine Linux v3.19
Total Memory: 3916 MB
Resolved Docker Host: unix:///Users/mdelapenya/.rd/docker.sock
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: 02a8bae92f37b36dd39598918d2de301ad83fa08b80f0c145ac5f7e810de993d
Test ProcessID: b3dce865-c459-4fb5-a29c-5bfeeecaec0c
2024/06/10 16:47:39 🔥 Reaper obtained from Docker for this test session 80c2f254a0a9ae8a79726aa6ffe8a2abf8246e4f84627322694168b8af1a264d
2024/06/10 16:47:39 ⏳ Waiting for Reaper port to be ready
2024/06/10 16:47:40 🐳 Creating container for image docker.io/postgres:16-alpine
2024/06/10 16:47:40 ✅ Container created: 50287fddb4f2
2024/06/10 16:47:40 🐳 Starting container: 50287fddb4f2
2024/06/10 16:47:40 ✅ Container started: 50287fddb4f2
2024/06/10 16:47:40 ⏳ Waiting for container id 50287fddb4f2 image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x14000458058 Strategies:[0x14000440d80]}
2024/06/10 16:47:42 🔔 Container is ready: 50287fddb4f2
2024/06/10 16:47:42 🐳 Terminating container: 50287fddb4f2
2024/06/10 16:47:42 🚫 Container terminated: 50287fddb4f2
--- PASS: TestD (3.08s)
PASS
ok tc-debug/d 4.161s
For that reason I'm merging it into main.
Thanks everybody here providing such valuable feedback 🙇
Great! Thank you for carrying this forward and merging - I’ve been out of pocket, appreciate the effort and community! 💪🙏
On Mon, Jun 10, 2024 at 11:56 AM Manuel de la Peña @.***> wrote:
Merged #2508 https://github.com/testcontainers/testcontainers-go/pull/2508 into main.
— Reply to this email directly, view it on GitHub https://github.com/testcontainers/testcontainers-go/pull/2508#event-13102973014, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABD4GT437CEQL4MYIY3TR3ZGW5DRAVCNFSM6AAAAABGR5VJXSVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJTGEYDEOJXGMYDCNA . You are receiving this because you were mentioned.Message ID: @.*** .com>
I agree that the ryuk health check is not needed!
On Tue, May 21, 2024 at 5:32 AM Manuel de la Peña @.***> wrote:
@.**** commented on this pull request.
@emetsger https://github.com/emetsger thanks a lot again for your detailed issue report and the work done for reproducing the bug. As a maintainer, it helped me a lot to identify the root cause, specially this: #2508 (comment) https://github.com/testcontainers/testcontainers-go/pull/2508#issuecomment-2119346695
I added a few comments, let's address and I think we are done here, not needing the healthcheck in the Ryuk container (please see @kiview https://github.com/kiview's comment https://github.com/testcontainers/moby-ryuk/pull/128#issuecomment-2121967711 )
In docker.go https://github.com/testcontainers/testcontainers-go/pull/2508#discussion_r1607874443 :
@@ -1590,6 +1592,11 @@ func containerFromDockerResponse(ctx context.Context, response types.Container) return nil, err }
- // the health status of the container, if any
- if health := container.raw.State.Health; health != nil {
Is there any case where State can be nil? If so, we can return the container with a default healthStatus 🤔
In reaper.go https://github.com/testcontainers/testcontainers-go/pull/2508#discussion_r1607878015 :
// if a health status is present on the container, and the container is not healthy, error
if r.healthStatus != "" {
if r.healthStatus != types.Healthy && r.healthStatus != types.NoHealthcheck {
return fmt.Errorf("container %s is not healthy, wanted status=%s, got status=%s", resp[0].ID[:8], types.Healthy, r.healthStatus)
}
}
Do you think we could add this check as a readiness hook? Then I think this fix would be more intentional. Thoughts?
In reaper.go https://github.com/testcontainers/testcontainers-go/pull/2508#discussion_r1607879869 :
Logger.Printf("⏳ Waiting for Reaper port to be ready")
var containerJson *types.ContainerJSON
if containerJson, err = reaperContainer.Inspect(ctx); err != nil {
return nil, fmt.Errorf("failed to inspect reaper container %s: %w", reaperContainer.ID[:8], err)
}
if containerJson != nil && containerJson.NetworkSettings != nil {
for port := range containerJson.NetworkSettings.Ports {
err := wait.ForListeningPort(port).
WithPollInterval(100*time.Millisecond).
WaitUntilReady(ctx, reaperContainer)
if err != nil {
return nil, fmt.Errorf("failed waiting for reaper container %s port %s/%s to be ready: %w",
reaperContainer.ID[:8], port.Proto(), port.Port(), err)
}
}
}
Do you think we can add this logic to the reuseReaperContainer function so that all invocations of that function benefit for this code?
— Reply to this email directly, view it on GitHub https://github.com/testcontainers/testcontainers-go/pull/2508#pullrequestreview-2067912834, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABD4GWBY27JFPCZY4W6PZDZDMBBHAVCNFSM6AAAAABGR5VJXSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDANRXHEYTEOBTGQ . You are receiving this because you were mentioned.Message ID: @.***>