testcontainers-go
[Bug]: Container logging exits prematurely with error
Testcontainers version
0.29.1
Using the latest Testcontainers version?
Yes
Host OS
Linux
Host arch
x86
Go version
1.22
Docker version
(NOTE: using podman, not docker)
Version: 3.4.4
API Version: 3.4.4
Go Version: go1.18.1
Built: Wed Dec 31 19:00:00 1969
OS/Arch: linux/amd64
Docker info
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: unknown'
  cpus: 32
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  hostname: noah-workstation
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 2109978069
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 2109978069
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
  kernel: 6.1.0-1033-oem
  linkmode: dynamic
  logDriver: journald
  memFree: 101372432384
  memTotal: 134160068608
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/2109978069/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.6.1
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 363h 0m 53.36s (Approximately 15.12 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/noah/.config/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 0
    stopped: 4
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/noah/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 25
  runRoot: /run/user/2109978069/containers
  volumePath: /home/noah/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 0
  BuiltTime: Wed Dec 31 19:00:00 1969
  GitCommit: ""
  GoVersion: go1.18.1
  OsArch: linux/amd64
  Version: 3.4.4
What happened?
I've found that LogConsumers sometimes stop receiving logs prematurely and dump an error.
I've built a minimal reproducible example of the specific issue. It consists of a very small container image based on distroless/base. The container executes the following program:
package main

import (
	"log"
	"time"
)

func main() {
	i := 0
	for {
		log.Printf("tick %d", i)
		i += 1
		time.Sleep(1 * time.Second)
	}
}
I've attached a prebuilt container to the issue as a tarball.
In order to reproduce the issue, first load the container tarball like so:
gunzip container.tar.gz
podman load -i container.tar
And then compile and run the following test:
package container_test

import (
	"context"
	"fmt"
	"testing"
	"time"

	tc "github.com/testcontainers/testcontainers-go"
)

type StdoutLogConsumer struct{}

func (lc *StdoutLogConsumer) Accept(l tc.Log) {
	fmt.Printf("%s", l.Content)
}

func TestLogIssue(t *testing.T) {
	req := tc.ContainerRequest{
		Image: "localhost/log_repro:latest",
		LogConsumerCfg: &tc.LogConsumerConfig{
			Consumers: []tc.LogConsumer{&StdoutLogConsumer{}},
		},
	}

	ctx := context.Background()
	container, err := tc.GenericContainer(ctx, tc.GenericContainerRequest{
		ContainerRequest: req,
		Started:          true,
	})
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() {
		if err := container.Terminate(ctx); err != nil {
			t.Fatal(err)
		}
	})

	time.Sleep(10 * time.Second)
	fmt.Println("test over")
}
I'd expect this test to print container logs showing 9-10 ticks and exit cleanly. Instead, it consistently stops displaying output after 6 ticks, and always outputs an error message:
2024/03/06 19:21:03 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 3.4.4
API Version: 1.40
Operating System: ubuntu
Total Memory: 127945 MB
Resolved Docker Host: unix:///run/user/2109978069/podman/podman.sock
Resolved Docker Socket Path: /run/user/2109978069/podman/podman.sock
Test SessionID: c7ce1db5c519f0188cc6943c86f8faccf62034e6b4d4c1ab5bf6f8ff896ecf3b
Test ProcessID: 36d03253-4e13-441e-8149-88e46fb4fa15
2024/03/06 19:21:03 🐳 Creating container for image testcontainers/ryuk:0.6.0
2024/03/06 19:21:03 ✅ Container created: 15f55739196d
2024/03/06 19:21:03 🐳 Starting container: 15f55739196d
2024/03/06 19:21:03 ✅ Container started: 15f55739196d
2024/03/06 19:21:03 🚧 Waiting for container id 15f55739196d image: testcontainers/ryuk:0.6.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/03/06 19:21:03 🔔 Container is ready: 15f55739196d
2024/03/06 19:21:03 🐳 Creating container for image localhost/log_repro:latest
2024/03/06 19:21:03 ✅ Container created: f8583fd38a61
2024/03/06 19:21:03 🐳 Starting container: f8583fd38a61
2024/03/06 19:21:03 ✅ Container started: f8583fd38a61
2024/03/06 19:21:03 🔔 Container is ready: f8583fd38a61
2024/03/06 19:21:03 tick 0
2024/03/06 19:21:04 tick 1
2024/03/06 19:21:05 tick 2
2024/03/06 19:21:06 tick 3
2024/03/06 19:21:07 tick 4
2024/03/06 19:21:08 tick 5
container log error: EOF. Stopping log consumer: Headers out of sync2024/03/06 19:21:13 🐳 Terminating container: f8583fd38a61
test over
2024/03/06 19:21:14 🚫 Container terminated: f8583fd38a61
PASS
I did notice some interesting things while trying to debug, though I'm not sure what to make of them. In the main loop of docker.startLogProduction(), the "use of closed network connection" error block consistently triggers before the EOF error does. In addition, attempting to restart the connection when EOF is received lets the log consumer pick up a few more logs, e.g.:
// In the EOF branch of docker.startLogProduction(), commenting out the
// early return and instead restarting the log stream from "now":
//
// _, _ = fmt.Fprintf(os.Stderr, "container log error: %+v. %s", err, logStoppedForOutOfSyncMessage)
// // if we would continue here, the next header-read will result into random data...
// return
now := time.Now()
since = fmt.Sprintf("%d.%09d", now.Unix(), int64(now.Nanosecond()))
goto BEGIN
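For context on the "headers" in that error message: when a container runs without a TTY, Docker multiplexes stdout and stderr onto one stream in 8-byte frames (1 byte stream type, 3 bytes padding, 4 bytes big-endian payload length, then the payload). A minimal reader for that framing, independent of testcontainers' internals:

package logframes

import (
	"encoding/binary"
	"io"
)

// readFrame reads one frame of Docker's multiplexed log stream:
// [stream type (1)][padding (3)][payload length (4, big endian)][payload].
// If a read stops mid-frame, the next call treats payload bytes as a
// header, which is exactly the "out of sync" failure mode above.
func readFrame(r io.Reader) (streamType byte, payload []byte, err error) {
	var header [8]byte
	if _, err = io.ReadFull(r, header[:]); err != nil {
		return 0, nil, err
	}
	size := binary.BigEndian.Uint32(header[4:])
	payload = make([]byte, size)
	if _, err = io.ReadFull(r, payload); err != nil {
		return 0, nil, err
	}
	return header[0], payload, nil
}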
Relevant log output
No response
Additional information
As an update, I've noticed that the length of time the log consumer keeps working matches the "log production timeout" setting (e.g., with the default of five seconds, the log consumer stops working after five seconds).
Is this what the log production timeout setting is intended to control? It's not super clear to me from the docs.
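For reference, this is how I'd expect that knob to be raised, if it is the setting in play (a sketch; I'm assuming WithLogProductionTimeout is the right option and that values get clamped to a 5s-60s range):

req := tc.ContainerRequest{
	Image: "localhost/log_repro:latest",
	LogConsumerCfg: &tc.LogConsumerConfig{
		// Assumed: raises the log production timeout from its 5s default.
		Opts:      []tc.LogProductionOption{tc.WithLogProductionTimeout(30 * time.Second)},
		Consumers: []tc.LogConsumer{&StdoutLogConsumer{}},
	},
}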
Your repro tar is no longer present, but #2346 might fix this issue, as it refactors how logging start / stop is handled, eliminating a race and a subsequent deadlock. If you could test, @nmoroze, that would be great.
Hi @stevenh, thanks for pointing me to your PR. Unfortunately, I don't think it solves my particular problem; the issue persists even when using your branch.
I've re-attached the container tarball for repro to this comment in case you or anyone else is interested.
I tried to repro locally but can't; here's my archive: log_repo.tgz
# build
(cd container; docker build -t localhost/log_repro:latest .)
# run test
go test -v -race
Output:
=== RUN TestLogIssue
2024/03/13 18:28:51 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 25.0.3
API Version: 1.44
Operating System: Docker Desktop
Total Memory: 15836 MB
Resolved Docker Host: unix:///var/run/docker.sock
Resolved Docker Socket Path: /var/run/docker.sock
Test SessionID: ce94f6e5446cf4dac898cbe3506dbc726ab8896b36336393c1055fccee365497
Test ProcessID: a04b6431-4468-45ed-8c50-ab58d412f0cf
2024/03/13 18:28:51 🐳 Creating container for image testcontainers/ryuk:0.6.0
2024/03/13 18:28:51 ✅ Container created: b7c0742de4e8
2024/03/13 18:28:51 🐳 Starting container: b7c0742de4e8
2024/03/13 18:28:52 ✅ Container started: b7c0742de4e8
2024/03/13 18:28:52 🚧 Waiting for container id b7c0742de4e8 image: testcontainers/ryuk:0.6.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/03/13 18:28:52 🔔 Container is ready: b7c0742de4e8
2024/03/13 18:28:52 🐳 Creating container for image localhost/log_repro:latest
2024/03/13 18:28:52 ✅ Container created: 839d3fd30990
2024/03/13 18:28:52 🐳 Starting container: 839d3fd30990
2024/03/13 18:28:53 ✅ Container started: 839d3fd30990
2024/03/13 18:28:53 🔔 Container is ready: 839d3fd30990
2024/03/13 18:28:53 tick 0
2024/03/13 18:28:54 tick 1
2024/03/13 18:28:55 tick 2
2024/03/13 18:28:56 tick 3
2024/03/13 18:28:57 tick 4
2024/03/13 18:28:58 tick 5
2024/03/13 18:28:59 tick 6
2024/03/13 18:29:00 tick 7
2024/03/13 18:29:01 tick 8
2024/03/13 18:29:02 tick 9
2024/03/13 18:29:03 tick 10
test over
2024/03/13 18:29:03 🐳 Terminating container: 839d3fd30990
2024/03/13 18:29:03 🚫 Container terminated: 839d3fd30990
--- PASS: TestLogIssue (11.80s)
PASS
ok log-test 12.844s
I tried adding ProviderType: tc.ProviderPodman, to the request, but it still seems to want to use docker under WSL.
I tried getting podman working, but it's just way too flaky; the VM totally hangs and then just won't work for me on WSL:
API forwarding for Docker API clients is not available due to the following startup failures.
could not start api proxy since expected pipe is not available: podman-machine-default
Finally got podman working after a kernel upgrade but still can't reproduce:
=== RUN TestLogIssue
2024/03/13 20:22:37 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 4.9.3
API Version: 1.41
Operating System: fedora
Total Memory: 15836 MB
Resolved Docker Host: unix:///run/user/1000/podman/podman.sock
Resolved Docker Socket Path: /run/user/1000/podman/podman.sock
Test SessionID: c63aa4e8af35d970a47b07194bf27bbbaaac5dfb929678b458c1e515fc310db4
Test ProcessID: ab2fd2cd-f65f-4e99-9e3a-958ccd19d80b
2024/03/13 20:22:37 🐳 Creating container for image testcontainers/ryuk:0.6.0
2024/03/13 20:22:37 ✅ Container created: 1db0a21708f0
2024/03/13 20:22:37 🐳 Starting container: 1db0a21708f0
2024/03/13 20:22:38 ✅ Container started: 1db0a21708f0
2024/03/13 20:22:38 🚧 Waiting for container id 1db0a21708f0 image: testcontainers/ryuk:0.6.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/03/13 20:22:38 🔔 Container is ready: 1db0a21708f0
2024/03/13 20:22:38 🐳 Creating container for image localhost/log_repro:latest
2024/03/13 20:22:38 ✅ Container created: 4e83a135fa90
2024/03/13 20:22:38 🐳 Starting container: 4e83a135fa90
2024/03/13 20:22:38 ✅ Container started: 4e83a135fa90
2024/03/13 20:22:38 🔔 Container is ready: 4e83a135fa90
2024/03/13 20:22:38 tick 0
2024/03/13 20:22:39 tick 1
2024/03/13 20:22:40 tick 2
2024/03/13 20:22:41 tick 3
2024/03/13 20:22:42 tick 4
2024/03/13 20:22:44 tick 6
2024/03/13 20:22:45 tick 7
2024/03/13 20:22:46 tick 8
2024/03/13 20:22:47 tick 9
test over
2024/03/13 20:22:48 🐳 Terminating container: 4e83a135fa90
2024/03/13 20:22:50 🚫 Container terminated: 4e83a135fa90
--- PASS: TestLogIssue (12.54s)
PASS
ok log-test 12.546s
Thanks for the effort to reproduce this. I noticed you're using a newer version of Podman based on your latest logs; I'll try that out on my end and see if it helps (the reason I'm on the old version is that it's the default available via apt on Ubuntu 22.04).
@stevenh Looks like my reproducible example works fine with Podman v4.9.3 both with and without your change.
Unfortunately, after updating, the codebase where I originally found this bug exhibits different buggy behavior. Using testcontainers 0.29.1, it appears to hang indefinitely trying to terminate a container. Using your branch, the Terminate() calls to several containers seem to time out with this error:
Get "http://%2Frun%2Fuser%2F2109978069%2Fpodman%2Fpodman.sock/v1.41/containers/cc50050f7d744f2d2ae77d0df598f087f1b74b06e2a66dfd537dd12b289b745d/logs?follow=1&since=1710436901.565258953&stderr=1&stdout=1&tail=": context deadline exceeded
I can see if I can come up with a reproducible example for this new issue.
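As a stopgap, bounding Terminate with its own deadline at least turns the hang into a visible error instead of blocking the suite indefinitely; a small sketch:

t.Cleanup(func() {
	// Give termination its own deadline so a wedged logs request
	// surfaces as an error rather than hanging the test run forever.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := container.Terminate(ctx); err != nil {
		t.Log(err)
	}
})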
Alright, I was able to reproduce this with a pretty simple example. I've attached a new image tar; this container runs a binary that does nothing and simply busy-loops forever, so it never produces a single log line.
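In other words, the entrypoint is essentially this (a reconstruction from that description; the attached tar is the authoritative repro):

package main

// Busy-loop that never writes to stdout or stderr, so the container
// produces no log entries at all.
func main() {
	for {
	}
}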
The test code itself is also fairly minimal:
package container_test

import (
	"context"
	"fmt"
	"testing"

	tc "github.com/testcontainers/testcontainers-go"
)

type StdoutLogConsumer struct{}

func (lc *StdoutLogConsumer) Accept(l tc.Log) {
	fmt.Printf("%s", l.Content)
}

func TestLogIssue(t *testing.T) {
	req := tc.ContainerRequest{
		Image: "localhost/log_repro:latest",
		LogConsumerCfg: &tc.LogConsumerConfig{
			Consumers: []tc.LogConsumer{&StdoutLogConsumer{}},
		},
	}

	ctx := context.Background()
	container, err := tc.GenericContainer(ctx, tc.GenericContainerRequest{
		ContainerRequest: req,
		Started:          true,
	})
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() {
		if err := container.Terminate(ctx); err != nil {
			t.Fatal(err)
		}
	})

	fmt.Println("test over")
}
The output I get:
2024/03/15 20:47:46 github.com/testcontainers/testcontainers-go - Connected to docker:
Server Version: 4.9.3
API Version: 1.41
Operating System: ubuntu
Total Memory: 127946 MB
Resolved Docker Host: unix:///run/user/2109978069/podman/podman.sock
Resolved Docker Socket Path: /run/user/2109978069/podman/podman.sock
Test SessionID: 61f8be8f5fb4a6f37c09cd82a69ea05022e3a674641b2add09d5c5593b17f0c2
Test ProcessID: 9ffc22d9-7015-4555-be43-c7ebd115a846
2024/03/15 20:47:46 🐳 Creating container for image testcontainers/ryuk:0.6.0
2024/03/15 20:47:46 ✅ Container created: 1c9f5ce3ad48
2024/03/15 20:47:46 🐳 Starting container: 1c9f5ce3ad48
2024/03/15 20:47:46 ✅ Container started: 1c9f5ce3ad48
2024/03/15 20:47:46 🚧 Waiting for container id 1c9f5ce3ad48 image: testcontainers/ryuk:0.6.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/03/15 20:47:46 🔔 Container is ready: 1c9f5ce3ad48
2024/03/15 20:47:46 🐳 Creating container for image localhost/log_repro:latest
2024/03/15 20:47:46 ✅ Container created: b3b72d3709fe
2024/03/15 20:47:46 🐳 Starting container: b3b72d3709fe
2024/03/15 20:47:46 ✅ Container started: b3b72d3709fe
2024/03/15 20:47:46 🔔 Container is ready: b3b72d3709fe
test over
2024/03/15 20:47:46 🐳 Terminating container: b3b72d3709fe
--- FAIL: TestLogIssue (6.78s)
container_test.go:36: Get "http://%2Frun%2Fuser%2F2109978069%2Fpodman%2Fpodman.sock/v1.41/containers/b3b72d3709fed603ed1ddc15159307171130ac77a9981ee86610cbde2e05970f/logs?follow=1&stderr=1&stdout=1&tail=": context deadline exceeded
FAIL
Can you post the build for the container so we don't have to rely on images?
@stevenh Sure - I just wrote up a build file and a simple Makefile to run everything. I've attached the sources as a tarball; untar and run make to reproduce.
Thanks @nmoroze. This looks like a bug in podman to me: if the container doesn't generate any log entries, the request for logs just hangs with podman, whereas docker returns an empty log.
I would suggest raising an issue with the podman team.
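A standalone probe could help when filing that issue; something like this against a silent running container (a sketch using the Docker Go client, v25+, where LogsOptions lives in api/types/container; the container ID is a placeholder):

package main

import (
	"context"
	"fmt"
	"io"
	"time"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func main() {
	// Assumes DOCKER_HOST points at the docker or podman socket.
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Placeholder: ID of a running container that has never logged.
	const containerID = "<silent-container-id>"

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// docker answers a non-follow logs request for a silent container
	// immediately with an empty body; per the behavior described above,
	// podman leaves the request hanging until the context deadline fires.
	rc, err := cli.ContainerLogs(ctx, containerID, container.LogsOptions{
		ShowStdout: true,
		ShowStderr: true,
	})
	if err != nil {
		panic(err)
	}
	defer rc.Close()

	n, err := io.Copy(io.Discard, rc)
	fmt.Printf("read %d bytes, err=%v\n", n, err)
}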
Thanks for taking a look! That makes sense, I'll close this issue for now and follow up on that side.