testcontainers-go
[Bug]: Endpoint() returns "port not found" error, even though wait.ForListeningPort is used
Testcontainers version
0.15.0
Using the latest Testcontainers version?
Yes
Host OS
Linux
Host Arch
x86_64
Go Version
1.19
Docker version
mero@dibbler ~$ podman version
Version: 3.4.7
API Version: 3.4.7
Go Version: go1.16.15
Built: Thu Apr 21 19:38:09 2022
OS/Arch: linux/amd64
Docker info
mero@dibbler ~$ podman info
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.32-1.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.32, commit: '
  cpus: 12
  distribution:
    distribution: fedora
    variant: workstation
    version: "34"
  eventLogger: journald
  hostname: dibbler
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.17.12-100.fc34.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1298100224
  memTotal: 16543584256
  ociRuntime:
    name: crun
    package: crun-1.4.4-1.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.4
      commit: 6521fcc5806f20f6187eb933f9f45130c86da230
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc34.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 16906444800
  swapTotal: 16936591360
  uptime: 221h 45m 47.69s (Approximately 9.21 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/mero/.config/containers/storage.conf
  containerStore:
    number: 42
    paused: 0
    running: 0
    stopped: 42
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/mero/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 4
  runRoot: /run/user/1000/containers
  volumePath: /home/mero/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.7
  Built: 1650562689
  BuiltTime: Thu Apr 21 19:38:09 2022
  GitCommit: ""
  GoVersion: go1.16.15
  OsArch: linux/amd64
  Version: 3.4.7
What happened?
We are seeing test flakes with the error message mysql.Endpoint() = port not found. The code we use to create the container is attached below. In particular, we use both wait.ForListeningPort("3306/tcp") and wait.ForLog("port: 3306") to make sure we wait until the container is running and available. However, it seems that even with that, GenericContainer will occasionally return before the container is ready, as evidenced by the fact that Endpoint returns an error about the port not being found.
Relevant log output
No response
Additional Information
mysql, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
	ContainerRequest: testcontainers.ContainerRequest{
		Image:        "mysql:8",
		ExposedPorts: []string{"3306/tcp"},
		Env: map[string]string{
			"MYSQL_ROOT_PASSWORD": "test",
		},
		WaitingFor: wait.ForAll(wait.ForListeningPort("3306/tcp"), wait.ForLog("port: 3306")),
		SkipReaper: true, // https://github.com/testcontainers/testcontainers-go/issues/604
	},
	Started: true,
	Logger:  testcontainers.TestLogger(t),
})
if err != nil {
	t.Fatalf("testcontainers.GenericContainer() = %v", err)
}
t.Cleanup(func() {
	t.Logf("Stopping container")
	d := new(time.Duration)
	// TODO: What happens when the context expires? Can we pass a different
	// context here, or would that prevent Stop from working?
	mysql.Stop(context.Background(), d)
})
ep, err := mysql.Endpoint(ctx, "")
if err != nil {
	t.Fatalf("mysql.Endpoint() = %v", err)
}
I am seeing this as well on 0.17.0 with the mysql image; the test sometimes fails, more often on CI. My configuration is the same as presented, except I am using normal Docker and don't skip the reaper. time.Sleep(5 * time.Second) didn't add stability, but time.Sleep(30 * time.Second) does seem to have helped - presumably this is roughly the startup time of the relatively slow MySQL container itself, and the wait condition is getting missed for some reason.
For context, the build starts and terminates a database container about three times; perhaps having multiple start/stop cycles in the same binary run (in this case go test ./...) causes a race condition?
@anuraaga @awagner-iq thanks for opening this issue, and sorry for the radio silence; I went on paternity leave at the beginning of December and probably skipped this ticket.
I'm investigating why it fails for your use case although, with the above snippet, I'm not able to reproduce it yet. Debugging...
I'm noticing this as well while running on Docker 20.10 on Linux - it's resulting in a fair bit of flakiness, especially in our CI pipeline. It doesn't look like this is strictly related to Podman.
It seems to be a bug in the Endpoint function. I'm using Docker on Linux and I can reproduce this easily on my machine and in CI. Now I create the URL myself for a specific port (not the first one, as Endpoint does) and I have not seen this issue since.
Could you please add a repro snippet, including which version of the library you are using? 🙏 If there are so many users seeing it, we could be in front of a real bug that we should fix 🐞
Yes, it would be great if this one could go up in priority because it is still occurring, and it is breaking a lot of CI pipelines. This is the code snippet:
github.com/testcontainers/testcontainers-go v0.19.0
req := testcontainers.ContainerRequest{
	Image:        "nats:latest",
	Name:         natsContainerName,
	ExposedPorts: []string{"4222/tcp"},
	Networks:     networks,
	WaitingFor:   wait.ForAll(wait.ForLog("Server is ready")),
}
container, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
	ContainerRequest: req,
	Started:          true,
})
if err != nil {
	return nil, err
}
natsMappedPort, err := container.MappedPort(ctx, "4222/tcp")
if err != nil {
	log.Errorf("Failed to get nats port: %v", err)
	return initFailure
}
natsHost, err := container.Host(ctx)
if err != nil {
	log.Errorf("Failed to get nats host: %v", err)
	return initFailure
}
natsEndpoint, err := container.Endpoint(ctx, "")
if err != nil {
	log.Errorf("Failed to get nats endpoint: %v", err) // <-- the error occurs here
	return initFailure
}
> Could you please add a repro snippet, including which version of the library you are using? 🙏 If there are so many users seeing it, we could be in front of a real bug that we should fix 🐞
Hi, I have not had the time yet to create a repro snippet. The failing code I have is something I cannot share.
I switched all my Endpoint(...) calls to PortEndpoint(...) and explicitly specified the port. I have not had any issues since then. So I think the reason is that the first port that gets detected somehow does not work.
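For illustration, a minimal sketch of that switch, assuming a handle named container for a mongo container exposing 27017/tcp as in this setup; an empty proto argument yields a plain host:port string:
// Before: let Endpoint pick a port for you.
// ep, err := container.Endpoint(ctx, "")

// After: name the exposed port explicitly.
ep, err := container.PortEndpoint(ctx, "27017/tcp", "")
if err != nil {
	t.Fatalf("container.PortEndpoint() = %v", err)
}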
What I'd do given some time:
- Improve the "port not found" error to contain the port (first port that is calculated) and the list of available ports.
- Create a test that starts a lot of containers in go funcs (e.g. 10 mongo containers in my case) and then tries to get the Endpoint(...). In my case I get the endpoint immediately after creating the container (a sketch of such a test follows below).
I used a GenericContainer with mongo:4.0, and testcontainers 0.19.0, but also switched to the latest version for a quick test and that did not fix the issue.
Maybe this already helps and gives someone the opportunity to reproduce and analyze this bug.
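For concreteness, a minimal sketch of the concurrent-start test suggested above, assuming mongo:4.0 as mentioned in that comment; the container count, wait strategy, and test name are illustrative assumptions, not taken from the original:
import (
	"context"
	"sync"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestEndpointUnderConcurrentStarts(t *testing.T) {
	ctx := context.Background()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
				ContainerRequest: testcontainers.ContainerRequest{
					Image:        "mongo:4.0",
					ExposedPorts: []string{"27017/tcp"},
					WaitingFor:   wait.ForListeningPort("27017/tcp"),
				},
				Started: true,
			})
			if err != nil {
				t.Errorf("testcontainers.GenericContainer() = %v", err)
				return
			}
			defer c.Terminate(ctx)
			// Ask for the endpoint immediately after creation, mirroring the
			// usage described above; this is where "port not found" shows up.
			if _, err := c.Endpoint(ctx, ""); err != nil {
				t.Errorf("Endpoint() = %v", err)
			}
		}()
	}
	wg.Wait()
}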
I'm able to reproduce this exact error in a VM using Podman. I will take a look after my parental leave.
In the meantime, checking the state of the Docker types, I saw this difference when inspecting a container with Podman vs. Docker:
Steps to reproduce
- Edit docker.go, Ports method, to print out the docker types for the inspected container:
func (c *DockerContainer) Ports(ctx context.Context) (nat.PortMap, error) {
	inspect, err := c.inspectContainer(ctx)
	if err != nil {
		return nil, err
	}
+	fmt.Printf(">>> network.settings.ports: %+v\n", inspect.NetworkSettings.Ports)
+	fmt.Printf(">>> config.exposed.ports: %+v\n", inspect.Config.ExposedPorts)
	return inspect.NetworkSettings.Ports, nil
}
- Run the following test:
go run gotest.tools/gotestsum --format short-verbose --packages="./..." -- -run "^TestContainerWithHostNetworkOptions_UseExposePortsFromImageConfigs" -timeout 600s -count=1 -v
With Podman (Ubuntu VM)
>>> network.settings.ports: map[80/tcp:[]]
>>> config.exposed.ports: map[80/tcp:{}]
With Docker (on Mac)
>>> network.settings.ports: map[80/tcp:[{HostIP:0.0.0.0 HostPort:57087}]]
>>> config.exposed.ports: map[80/tcp:{}]
So for some reason, the Docker types returned by Podman are inconsistent with what Docker returns 🤷. Pinging @kiview @cristianrgreco @eddumelendez @HofmeisterAn for awareness while I'm out.
Could be related to https://github.com/containers/podman/issues/17780 🤔
I'm seeing an intermittent port not found error using Docker for Mac. I tried upgrading and am currently on Docker version 26.0.0, build 2ae903e. I'm exposing the port via a CustomizeRequest option, e.g.:
func WithPort(port int) testcontainers.CustomizeRequestOption {
	return func(req *testcontainers.GenericContainerRequest) {
		...
		req.ExposedPorts = append(req.ExposedPorts, fmt.Sprintf("%d/tcp", port))
	}
}
and a wait strategy that passes:
wait.ForHTTP(endpoint).WithPort(containerPortWithProtocol).WithStartupTimeout(timeout),
If folks have any workarounds or tips, that would be helpful.
EDIT: After investigating this issue further, it looks like docker inspect returns results that exclude the exposed port from the network settings; it might be related to https://github.com/moby/moby/issues/42860. A possible workaround would be to retry MappedPort in docker.go a few times internally 🤷‍♂️
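For illustration only, here is a sketch of that retry idea done from the calling code rather than inside docker.go; the helper name, attempt count, and delay are arbitrary assumptions:
import (
	"context"
	"time"

	"github.com/docker/go-connections/nat"
	"github.com/testcontainers/testcontainers-go"
)

// mappedPortWithRetry polls MappedPort a few times before giving up; this is
// only a sketch of the workaround idea, not part of testcontainers-go.
func mappedPortWithRetry(ctx context.Context, c testcontainers.Container, port nat.Port) (nat.Port, error) {
	var lastErr error
	for attempt := 0; attempt < 10; attempt++ {
		p, err := c.MappedPort(ctx, port)
		if err == nil {
			return p, nil
		}
		lastErr = err
		time.Sleep(500 * time.Millisecond)
	}
	return "", lastErr
}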
Retrying MappedPort didn't work for me, so I ended up calling docker inspect from the code. It's an ugly workaround, but it works:
import (
	"encoding/json"
	"fmt"
	"os/exec"

	"github.com/docker/go-connections/nat"
)

// getMappedPorts shells out to `docker inspect` and parses the port bindings
// from the container's network settings.
func getMappedPorts(containerID string) (nat.PortMap, error) {
	cmd := exec.Command("docker", "inspect", "--format", "{{json .NetworkSettings.Ports}}", containerID)
	output, err := cmd.CombinedOutput()
	if err != nil {
		return nil, fmt.Errorf("failed to execute docker inspect: %s, %v", output, err)
	}
	var ports nat.PortMap
	if err = json.Unmarshal(output, &ports); err != nil {
		return nil, fmt.Errorf("failed to parse output: %v", err)
	}
	return ports, nil
}
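A possible way to wire this into a test, assuming a testcontainers container handle (GetContainerID is part of the Container interface); using 3306/tcp from the MySQL example at the top of the thread is purely illustrative:
ports, err := getMappedPorts(container.GetContainerID())
if err != nil {
	t.Fatalf("getMappedPorts() = %v", err)
}
bindings, ok := ports["3306/tcp"]
if !ok || len(bindings) == 0 {
	t.Fatalf("no host binding found for 3306/tcp")
}
endpoint := fmt.Sprintf("%s:%s", bindings[0].HostIP, bindings[0].HostPort)
t.Logf("mysql reachable at %s", endpoint)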