testcontainers-go
testcontainers-go copied to clipboard
[Bug]: Fail to start Kafka container (KRaft) with basic setup
Testcontainers version
0.31.0
Using the latest Testcontainers version?
Yes
Host OS
Mac
Host arch
ARM
Go version
1.22.3
Docker version
Client: Docker Engine - Community
Version: 24.0.6
API version: 1.43
Go version: go1.21.0
Git commit: ed223bc820
Built: Thu Aug 31 17:24:32 2023
OS/Arch: darwin/arm64
Context: desktop-linux
Server: Docker Desktop 4.30.0 (149282)
Engine:
Version: 26.1.1
API version: 1.45 (minimum version 1.24)
Go version: go1.21.9
Git commit: ac2de55
Built: Tue Apr 30 11:48:04 2024
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.6.31
GitCommit: e377cd56a71523140ca6ae87e30244719194a521
runc:
Version: 1.1.12
GitCommit: v1.1.12-0-g51d5e94
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Docker info
Client: Docker Engine - Community
Version: 24.0.6
Context: desktop-linux
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.14.0-desktop.1
Path: /Users/ronshemer/.docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.27.0-desktop.2
Path: /Users/ronshemer/.docker/cli-plugins/docker-compose
debug: Get a shell into any image or container (Docker Inc.)
Version: 0.0.29
Path: /Users/ronshemer/.docker/cli-plugins/docker-debug
dev: Docker Dev Environments (Docker Inc.)
Version: v0.1.2
Path: /Users/ronshemer/.docker/cli-plugins/docker-dev
extension: Manages Docker extensions (Docker Inc.)
Version: v0.2.23
Path: /Users/ronshemer/.docker/cli-plugins/docker-extension
feedback: Provide feedback, right in your terminal! (Docker Inc.)
Version: v1.0.4
Path: /Users/ronshemer/.docker/cli-plugins/docker-feedback
init: Creates Docker-related starter files for your project (Docker Inc.)
Version: v1.1.0
Path: /Users/ronshemer/.docker/cli-plugins/docker-init
sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
Version: 0.6.0
Path: /Users/ronshemer/.docker/cli-plugins/docker-sbom
scout: Docker Scout (Docker Inc.)
Version: v1.8.0
Path: /Users/ronshemer/.docker/cli-plugins/docker-scout
Server:
Containers: 6
Running: 1
Paused: 0
Stopped: 5
Images: 38
Server Version: 26.1.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e377cd56a71523140ca6ae87e30244719194a521
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: unconfined
cgroupns
Kernel Version: 6.6.26-linuxkit
Operating System: Docker Desktop
OSType: linux
Architecture: aarch64
CPUs: 10
Total Memory: 15.6GiB
Name: docker-desktop
ID: 4c6eebbc-c190-4257-96b7-65293c5d4add
Docker Root Dir: /var/lib/docker
Debug Mode: false
HTTP Proxy: http.docker.internal:3128
HTTPS Proxy: http.docker.internal:3128
No Proxy: hubproxy.docker.internal
Labels:
com.docker.desktop.address=unix:///Users/ronshemer/Library/Containers/com.docker.docker/Data/docker-cli.sock
Experimental: false
Insecure Registries:
hubproxy.docker.internal:5555
127.0.0.0/8
Live Restore Enabled: false
WARNING: daemon is not using the default seccomp profile
What happened?
I followed the documentation on how to use kafka testcontainers and at first it worked randomly (got the container running), and now it does not work constantly ending up with the "port not found" error after waiting the 1m timeout.
Relevant log output
β‘ go run .
2024/05/19 16:40:21 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 26.1.1
API Version: 1.44
Operating System: Docker Desktop
Total Memory: 15976 MB
Resolved Docker Host: unix:///var/run/docker.sock
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: 46c175eaf04deba9d5b7bdc51177a827b389a4d7cec93f329a3bef4e700b2154
Test ProcessID: c8a51e52-8aa0-4514-aa6d-d487aff9d1be
2024/05/19 16:40:21 π³ Creating container for image testcontainers/ryuk:0.7.0
2024/05/19 16:40:21 β
Container created: cad4a0e8592a
2024/05/19 16:40:21 π³ Starting container: cad4a0e8592a
2024/05/19 16:40:21 β
Container started: cad4a0e8592a
2024/05/19 16:40:21 π§ Waiting for container id cad4a0e8592a image: testcontainers/ryuk:0.7.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/05/19 16:40:22 π Container is ready: cad4a0e8592a
2024/05/19 16:40:22 π³ Creating container for image confluentinc/confluent-local:7.5.0
2024/05/19 16:40:22 β
Container created: e1bc25805a7b
2024/05/19 16:40:22 π³ Starting container: e1bc25805a7b
2024/05/19 16:41:22 β
Container started: e1bc25805a7b
2024/05/19 16:41:22 container logs (port not found
context deadline exceeded):
2024/05/19 16:41:22 failed to start container: failed to start container: port not found
context deadline exceeded
exit status 1
Additional information
This is the complete program I am running to reproduce (started an empty project):
package main
import (
"context"
"log"
"github.com/testcontainers/testcontainers-go/modules/kafka"
)
func main() {
ctx := context.Background()
kafkaContainer, err := kafka.RunContainer(ctx,
kafka.WithClusterID("test-cluster"),
)
if err != nil {
log.Fatalf("failed to start container: %s", err)
}
// Clean up the container after
defer func() {
if err := kafkaContainer.Terminate(ctx); err != nil {
log.Fatalf("failed to terminate container: %s", err)
}
}()
}
Here is the mod file
module kafka-test
go 1.22.3
require (
dario.cat/mergo v1.0.0 // indirect
github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 // indirect
github.com/Microsoft/go-winio v0.6.1 // indirect
github.com/Microsoft/hcsshim v0.11.4 // indirect
github.com/cenkalti/backoff/v4 v4.2.1 // indirect
github.com/containerd/containerd v1.7.15 // indirect
github.com/containerd/log v0.1.0 // indirect
github.com/cpuguy83/dockercfg v0.3.1 // indirect
github.com/distribution/reference v0.5.0 // indirect
github.com/docker/docker v25.0.5+incompatible // indirect
github.com/docker/go-connections v0.5.0 // indirect
github.com/docker/go-units v0.5.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/go-logr/logr v1.4.1 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-ole/go-ole v1.2.6 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/klauspost/compress v1.16.7 // indirect
github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
github.com/magiconair/properties v1.8.7 // indirect
github.com/moby/patternmatcher v0.6.0 // indirect
github.com/moby/sys/sequential v0.5.0 // indirect
github.com/moby/sys/user v0.1.0 // indirect
github.com/moby/term v0.5.0 // indirect
github.com/morikuni/aec v1.0.0 // indirect
github.com/opencontainers/go-digest v1.0.0 // indirect
github.com/opencontainers/image-spec v1.1.0 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
github.com/shirou/gopsutil/v3 v3.23.12 // indirect
github.com/shoenig/go-m1cpu v0.1.6 // indirect
github.com/sirupsen/logrus v1.9.3 // indirect
github.com/testcontainers/testcontainers-go v0.31.0 // indirect
github.com/testcontainers/testcontainers-go/modules/kafka v0.31.0 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/yusufpapurcu/wmi v1.2.3 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.49.0 // indirect
go.opentelemetry.io/otel v1.24.0 // indirect
go.opentelemetry.io/otel/metric v1.24.0 // indirect
go.opentelemetry.io/otel/trace v1.24.0 // indirect
golang.org/x/crypto v0.22.0 // indirect
golang.org/x/mod v0.16.0 // indirect
golang.org/x/sys v0.19.0 // indirect
golang.org/x/tools v0.13.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20230731190214-cbb8c96f2d6d // indirect
google.golang.org/grpc v1.58.3 // indirect
google.golang.org/protobuf v1.33.0 // indirect
)
I am also experiencing this issue and have been able to reproduce it. It appears to be related to the PostStartLifecycleHook. The Kafka module uses the PostStart LifecycleHook to get the mapped port, which is used to create the Kafka startup script. However, when PostStart is called, the Docker container has not yet resolved the port mapping, causing the port not found error.
While I am not certain if there is a straightforward solution, I have found a less elegant solution to resolve this issue. Simply adding a sleep at the beginning of the Kafka PostStart hook works fine in my tests, and we can make it configurable as needed.
I will suggest a PR for review.
@mdelapenya, do you know of a way to wait until Docker resolves the port mapping before the PostStart hook?
@wilsouza , with adding a delay, did the problem go away completely for you?
My colleague @grevend is also running into this issue, on a Mac (for Linux I cannot reproduce it). And we both tried adding delays, but also retries, and neither worked consistently.
So we are now wondering if its really the same issue, or slightly different.
If we look into the response from the MappedPort request it's sometimes correct, but other times not having any data in the NetworkSettings.Ports section of the Docker inspect command.
Note we're still debugging further, seeing if we can find something to make it work more consistently, but so far it either works, or not (delays etc having no positive effect).
I experience the same behaviour with the Redpanda module. Also tried older versions of testcontainers with the same result. Recently updated to the latest Docker Desktop.
Thank you for sharing the details of the problems you're encountering.
@wilsouza , with adding a delay, did the problem go away completely for you?
Yes, adding a delay has completely resolved the issue for me. I'm running an internal version with this solution and haven't experienced any problems. Iβve also tested it on my Mac with an M1 processor without issues.
My colleague @grevend is also running into this issue, on a Mac (for Linux I cannot reproduce it). And we both tried adding delays, but also retries, and neither worked consistently.
So we are now wondering if its really the same issue, or slightly different. If we look into the response from the
MappedPortrequest it's sometimes correct, but other times not having any data in theNetworkSettings.Portssection of the Dockerinspectcommand.Note we're still debugging further, seeing if we can find something to make it work more consistently, but so far it either works, or not (delays etc having no positive effect).
Have you tried running version #2552 with a custom wait time? Could you try running it with a custom timeout to see if it resolves the issue on your end?
@wilsouza we tried with waits, unfortunately though, the wait needs to be pretty long to be stable (3-5 seconds). What makes that unacceptable from my perspective, is that the wait isn't needed on Linux based systems. Even if for a second, it would add a second to every test run, for every time the Kafka container needs to be started.
We also further looked into why some retry mechanism doesn't work and found the cause. MappedPort calls the Inspect function, which always caches the first result it got:
https://github.com/testcontainers/testcontainers-go/blob/d4a21ea92ee84c058c3ad189aa328b9f5229807e/docker.go#L182
https://github.com/testcontainers/testcontainers-go/blob/d4a21ea92ee84c058c3ad189aa328b9f5229807e/docker.go#L169
Which is then the reason why a retry doesn't work (if it once gets a wrong response, it always gets the wrong response), and a initial delay somehow works as it gives Docker a chance to set up the port mapping.
I'm happy to provide assistance to any solution, but would really like to hear @mdelapenya 's perspective. Because I'm wondering if caching that Inspect response is the right thing to do. As I guess most fields would be stable, but next to the network there might be some other fields that change such as the State ones? Always caching would give a high potential of returning wrong info.
And of course the question remains why the port mapping has some delay in the first place for MacOS, but maybe that's due to the nature of how Docker works there.
And of course the question remains why the port mapping has some delay in the first place for MacOS, but maybe that's due to the nature of how Docker works there.
Maybe it relates to the fact MacOS uses a Virtual Machine to host docker containers π€
Maybe the port is mapped in the VM, but not in the MacOS host machine.
For the record, I have this behaviour also on a linux machine running ubuntu:
Server Version: 26.1.4
API Version: 1.44
Operating System: Ubuntu 22.04.4 LTS
Total Memory: 7771 MB
Resolved Docker Host: unix:///var/run/docker.sock
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: 109ffb6f0068199f8347d3954fd3f1fb31f80a5ffc016cd4ea231e7a3193510d
Test ProcessID: 5dcbcb80-8058-4b5a-a12b-37d4d47fd291
[...]
kafka_test.go:13:
Error Trace: [...]/kafka_test.go:13
Error: Received unexpected error:
failed to start container: context deadline exceeded
any update on this?
Got hit with this bug today on my Mac. Happens randomly. I'm using the latest testcontainers version.
I think that this issue can be closed now that it has been addressed in @mdelapenya PR, #2606. What do you think @mdelapenya? Iβm unable to validate if itβs still occurring at this time.
This issue seem to be resolved for me when upgraded to v0.32.0. I will close this now, thanks to everyone that was involved, this is much appreciated!
thanks folks for your support! I'm currently at Gophercon, and I will go back to regular work next Monday π