[LOGS]: logs are unreliable and randomly miss data
Description
Repeatedly running tests will eventually bring failures.
In this case copied below, a simple run echo does successfully print on stdout, but there is nothing in the log file.
--- FAIL: TestLogsAfterRestartingContainer (6.44s)
container_logs_test.go:365:
+============================================================================================================+
| ๐งฝ | "TestLogsAfterRestartingContainer": initial cleanup |
+============================================================================================================+
container_logs_test.go:338:
+------------------------------------------------------------------------------------------------------------+
| โก๏ธ | โ๏ธ /usr/local/bin/nerdctl rm -f testlogsafterrestartingcontainer-a8f1fb79 |
+------------------------------------------------------------------------------------------------------------+
| | ๐ time="2025-05-12T22:04:21-07:00" level=error msg="1 errors:\nno such container: testlogsafter |
| | restartingcontainer-a8f1fb79" |
+------------------------------------------------------------------------------------------------------------+
| ๐ฑ | XDG_SESSION_TYPE=tty |
| | HOME=/home/dmp.linux |
| | XDG_SESSION_CLASS=user |
| | XDG_SESSION_ID=2 |
| | XDG_RUNTIME_DIR=/run/user/501 |
| | PATH=/usr/local/go/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/sbin:/sbin: |
| | /usr/sbin:/sbin:/usr/local/go/bin:/home/dmp.linux/go/bin |
| | NERDCTL_TOML=/tmp/TestLogsAfterRestartingContainer4032231042/003/nerdctl.toml |
| | DOCKER_CONFIG=/tmp/TestLogsAfterRestartingContainer4032231042/003 |
+------------------------------------------------------------------------------------------------------------+
| โฐ | <1s (limit: 3m0s) |
+------------------------------------------------------------------------------------------------------------+
| ๐ | /tmp/TestLogsAfterRestartingContainer4032231042/002 |
+------------------------------------------------------------------------------------------------------------+
container_logs_test.go:365:
+============================================================================================================+
| ๐ | "TestLogsAfterRestartingContainer": setup |
+============================================================================================================+
container_logs_test.go:332:
+------------------------------------------------------------------------------------------------------------+
| โก๏ธ | โ๏ธ /usr/local/bin/nerdctl run --name testlogsafterrestartingcontainer-a8f1fb79 ghcr.io/stargz-co |
| | ntainers/alpine:3.13-org printf 'Hello World! |
| | There is no newline' |
+------------------------------------------------------------------------------------------------------------+
| | ๐ข 'Hello World! |
| | There is no newline' |
+------------------------------------------------------------------------------------------------------------+
| ๐ฑ | XDG_SESSION_TYPE=tty |
| | HOME=/home/dmp.linux |
| | XDG_SESSION_CLASS=user |
| | XDG_SESSION_ID=2 |
| | XDG_RUNTIME_DIR=/run/user/501 |
| | PATH=/usr/local/go/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/sbin:/sbin: |
| | /usr/sbin:/sbin:/usr/local/go/bin:/home/dmp.linux/go/bin |
| | DOCKER_CONFIG=/tmp/TestLogsAfterRestartingContainer4032231042/003 |
| | NERDCTL_TOML=/tmp/TestLogsAfterRestartingContainer4032231042/003/nerdctl.toml |
+------------------------------------------------------------------------------------------------------------+
| โฐ | <1s (limit: 3m0s) |
+------------------------------------------------------------------------------------------------------------+
| ๐ | /tmp/TestLogsAfterRestartingContainer4032231042/002 |
+------------------------------------------------------------------------------------------------------------+
container_logs_test.go:365:
โคต๏ธ๏ธ "TestLogsAfterRestartingContainer": into subtests prep
--- FAIL: TestLogsAfterRestartingContainer/logs_-f_works (0.09s)
container_logs_test.go:365:
+============================================================================================================+
| ๐ | "TestLogsAfterRestartingContainer/logs_-f_works": starting test! |
+============================================================================================================+
| โณ | /tmp/TestLogsAfterRestartingContainerlogs_-f_works3636253693/001/471579cc |
+============================================================================================================+
| | |
+============================================================================================================+
| config | { |
| | "Namespace": "nerdctl-test", |
| | "PrivateMode": "" |
| | } |
+============================================================================================================+
| labels | {} |
+============================================================================================================+
container_logs_test.go:365:
+------------------------------------------------------------------------------------------------------------+
| โก๏ธ | โ๏ธ /usr/local/bin/nerdctl logs -f testlogsafterrestartingcontainer-a8f1fb79 |
+------------------------------------------------------------------------------------------------------------+
| ๐ฑ | XDG_SESSION_TYPE=tty |
| | HOME=/home/dmp.linux |
| | XDG_SESSION_CLASS=user |
| | XDG_SESSION_ID=2 |
| | XDG_RUNTIME_DIR=/run/user/501 |
| | PATH=/usr/local/go/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/sbin:/sbin: |
| | /usr/sbin:/sbin:/usr/local/go/bin:/home/dmp.linux/go/bin |
| | DOCKER_CONFIG=/tmp/TestLogsAfterRestartingContainerlogs_-f_works3636253693/003 |
| | NERDCTL_TOML=/tmp/TestLogsAfterRestartingContainerlogs_-f_works3636253693/003/nerdctl.toml |
+------------------------------------------------------------------------------------------------------------+
| โฐ | <1s (limit: 3m0s) |
+------------------------------------------------------------------------------------------------------------+
| ๐ | /tmp/TestLogsAfterRestartingContainerlogs_-f_works3636253693/002 |
+------------------------------------------------------------------------------------------------------------+
container_logs_test.go:365:
<<<<<<<<<<<<<<<<<<<<
๐๏ธ Inspecting output (contains)
๐ testing: ``
โ FAILED! ~= `Hello World!`
>>>>>>>>>>>>>>>>>>>>
container_logs_test.go:365:
<<<<<<<<<<<<<<<<<<<<
๐๏ธ Inspecting output (contains)
๐ testing: ``
โ FAILED! ~= `There is no newline`
>>>>>>>>>>>>>>>>>>>>
Steps to reproduce the issue
Hammer tests.
Describe the results you received and expected
Not loose data.
What version of nerdctl are you using?
main
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
None
Host information
Logs codebase is intricate, and even with some moderate nerdctl experience, it is not trivial to even grok what is going on.
Are we even responsible for writing logs at any point? Or is it containerd only?
Is there a reason to believe that containerd could delay writing logs after a container has exited?
Are we even responsible for writing logs at any point? Or is it containerd only?
Three main components in log management:
containerd-shimredirects logs fromruncto the containerIO plugin (file, binary, stdio, etc.).nerdctlreads logs from the plugin and processes them.go-logrotatehandles JSON file logs, writing and rotating them on disk.
With all these components, it's hard to ensure the reliability of the entire chain. I think this ticket is not urgent as long as we don't receive feedback from users.
Well, am I not a user? ๐ (a high maintenance one, for sure...)
Well, am I not a user? ๐ (a high maintenance one, for sure...)
I was pointing to the CI, the error was detected after multiple CI retries ^^
Well, am I not a user? ๐ (a high maintenance one, for sure...)
I was pointing to the CI, the error was detected after multiple CI retries ^^
just kidding @fahedouch :-)
Just for the record though: I did hit the error (of missing logs) in normal usage. The repeat-test details are just here for repro.
Back to serious:
I think we should treat tests errors seriously. Especially the flaky ones. Just because we did not hear from users should not mean it is low-priority. We only hear from a small fraction of the issues in the field, as most people cannot be arsed to report. Even more so for elusive / flaky problems, as even more people will refrain from reporting issues that cannot be (easily) reproduced.