nerdctl icon indicating copy to clipboard operation
nerdctl copied to clipboard

[LOGS]: logs are unreliable and randomly miss data

Open apostasie opened this issue 7 months ago โ€ข 5 comments

Description

Repeatedly running tests will eventually bring failures.

In this case copied below, a simple run echo does successfully print on stdout, but there is nothing in the log file.

--- FAIL: TestLogsAfterRestartingContainer (6.44s)
    container_logs_test.go:365:

        +============================================================================================================+
        | ๐Ÿงฝ      | "TestLogsAfterRestartingContainer": initial cleanup                                              |
        +============================================================================================================+

    container_logs_test.go:338:

        +------------------------------------------------------------------------------------------------------------+
        | โžก๏ธ      | โš™๏ธ /usr/local/bin/nerdctl rm -f testlogsafterrestartingcontainer-a8f1fb79                        |
        +------------------------------------------------------------------------------------------------------------+
        |         | ๐ŸŸ  time="2025-05-12T22:04:21-07:00" level=error msg="1 errors:\nno such container: testlogsafter |
        |         | restartingcontainer-a8f1fb79"                                                                    |
        +------------------------------------------------------------------------------------------------------------+
        | ๐ŸŒฑ      | XDG_SESSION_TYPE=tty                                                                             |
        |         | HOME=/home/dmp.linux                                                                             |
        |         | XDG_SESSION_CLASS=user                                                                           |
        |         | XDG_SESSION_ID=2                                                                                 |
        |         | XDG_RUNTIME_DIR=/run/user/501                                                                    |
        |         | PATH=/usr/local/go/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/sbin:/sbin: |
        |         | /usr/sbin:/sbin:/usr/local/go/bin:/home/dmp.linux/go/bin                                         |
        |         | NERDCTL_TOML=/tmp/TestLogsAfterRestartingContainer4032231042/003/nerdctl.toml                    |
        |         | DOCKER_CONFIG=/tmp/TestLogsAfterRestartingContainer4032231042/003                                |
        +------------------------------------------------------------------------------------------------------------+
        | โฐ      | <1s (limit: 3m0s)                                                                                |
        +------------------------------------------------------------------------------------------------------------+
        | ๐Ÿ“      | /tmp/TestLogsAfterRestartingContainer4032231042/002                                              |
        +------------------------------------------------------------------------------------------------------------+

    container_logs_test.go:365:

        +============================================================================================================+
        | ๐Ÿ—       | "TestLogsAfterRestartingContainer": setup                                                        |
        +============================================================================================================+

    container_logs_test.go:332:

        +------------------------------------------------------------------------------------------------------------+
        | โžก๏ธ      | โš™๏ธ /usr/local/bin/nerdctl run --name testlogsafterrestartingcontainer-a8f1fb79 ghcr.io/stargz-co |
        |         | ntainers/alpine:3.13-org printf 'Hello World!                                                    |
        |         | There is no newline'                                                                             |
        +------------------------------------------------------------------------------------------------------------+
        |         | ๐ŸŸข 'Hello World!                                                                                 |
        |         | There is no newline'                                                                             |
        +------------------------------------------------------------------------------------------------------------+
        | ๐ŸŒฑ      | XDG_SESSION_TYPE=tty                                                                             |
        |         | HOME=/home/dmp.linux                                                                             |
        |         | XDG_SESSION_CLASS=user                                                                           |
        |         | XDG_SESSION_ID=2                                                                                 |
        |         | XDG_RUNTIME_DIR=/run/user/501                                                                    |
        |         | PATH=/usr/local/go/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/sbin:/sbin: |
        |         | /usr/sbin:/sbin:/usr/local/go/bin:/home/dmp.linux/go/bin                                         |
        |         | DOCKER_CONFIG=/tmp/TestLogsAfterRestartingContainer4032231042/003                                |
        |         | NERDCTL_TOML=/tmp/TestLogsAfterRestartingContainer4032231042/003/nerdctl.toml                    |
        +------------------------------------------------------------------------------------------------------------+
        | โฐ      | <1s (limit: 3m0s)                                                                                |
        +------------------------------------------------------------------------------------------------------------+
        | ๐Ÿ“      | /tmp/TestLogsAfterRestartingContainer4032231042/002                                              |
        +------------------------------------------------------------------------------------------------------------+

    container_logs_test.go:365:
        โคต๏ธ๏ธ "TestLogsAfterRestartingContainer": into subtests prep
    --- FAIL: TestLogsAfterRestartingContainer/logs_-f_works (0.09s)
        container_logs_test.go:365:

            +============================================================================================================+
            | ๐Ÿš€      | "TestLogsAfterRestartingContainer/logs_-f_works": starting test!                                 |
            +============================================================================================================+
            | โณ      | /tmp/TestLogsAfterRestartingContainerlogs_-f_works3636253693/001/471579cc                        |
            +============================================================================================================+
            |         |                                                                                                  |
            +============================================================================================================+
            | config  | {                                                                                                |
            |         |   "Namespace": "nerdctl-test",                                                                   |
            |         |   "PrivateMode": ""                                                                              |
            |         | }                                                                                                |
            +============================================================================================================+
            | labels  | {}                                                                                               |
            +============================================================================================================+

        container_logs_test.go:365:

            +------------------------------------------------------------------------------------------------------------+
            | โžก๏ธ      | โš™๏ธ /usr/local/bin/nerdctl logs -f testlogsafterrestartingcontainer-a8f1fb79                      |
            +------------------------------------------------------------------------------------------------------------+
            | ๐ŸŒฑ      | XDG_SESSION_TYPE=tty                                                                             |
            |         | HOME=/home/dmp.linux                                                                             |
            |         | XDG_SESSION_CLASS=user                                                                           |
            |         | XDG_SESSION_ID=2                                                                                 |
            |         | XDG_RUNTIME_DIR=/run/user/501                                                                    |
            |         | PATH=/usr/local/go/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/sbin:/sbin: |
            |         | /usr/sbin:/sbin:/usr/local/go/bin:/home/dmp.linux/go/bin                                         |
            |         | DOCKER_CONFIG=/tmp/TestLogsAfterRestartingContainerlogs_-f_works3636253693/003                   |
            |         | NERDCTL_TOML=/tmp/TestLogsAfterRestartingContainerlogs_-f_works3636253693/003/nerdctl.toml       |
            +------------------------------------------------------------------------------------------------------------+
            | โฐ      | <1s (limit: 3m0s)                                                                                |
            +------------------------------------------------------------------------------------------------------------+
            | ๐Ÿ“      | /tmp/TestLogsAfterRestartingContainerlogs_-f_works3636253693/002                                 |
            +------------------------------------------------------------------------------------------------------------+

        container_logs_test.go:365:

            <<<<<<<<<<<<<<<<<<<<
            	๐Ÿ–Š๏ธ Inspecting output (contains)
            	๐Ÿ‘€ testing:		``
            	โŒ FAILED!		~= `Hello World!`
            >>>>>>>>>>>>>>>>>>>>

        container_logs_test.go:365:

            <<<<<<<<<<<<<<<<<<<<
            	๐Ÿ–Š๏ธ Inspecting output (contains)
            	๐Ÿ‘€ testing:		``
            	โŒ FAILED!		~= `There is no newline`
            >>>>>>>>>>>>>>>>>>>>

Steps to reproduce the issue

Hammer tests.

Describe the results you received and expected

Not loose data.

What version of nerdctl are you using?

main

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

Logs codebase is intricate, and even with some moderate nerdctl experience, it is not trivial to even grok what is going on.

Are we even responsible for writing logs at any point? Or is it containerd only?

Is there a reason to believe that containerd could delay writing logs after a container has exited?

apostasie avatar May 13 '25 21:05 apostasie

Are we even responsible for writing logs at any point? Or is it containerd only?

Three main components in log management:

  • containerd-shim redirects logs from runc to the containerIO plugin (file, binary, stdio, etc.).
  • nerdctl reads logs from the plugin and processes them.
  • go-logrotate handles JSON file logs, writing and rotating them on disk.

With all these components, it's hard to ensure the reliability of the entire chain. I think this ticket is not urgent as long as we don't receive feedback from users.

fahedouch avatar May 14 '25 08:05 fahedouch

Well, am I not a user? ๐Ÿ˜ (a high maintenance one, for sure...)

apostasie avatar May 14 '25 15:05 apostasie

Well, am I not a user? ๐Ÿ˜ (a high maintenance one, for sure...)

I was pointing to the CI, the error was detected after multiple CI retries ^^

fahedouch avatar May 14 '25 17:05 fahedouch

Well, am I not a user? ๐Ÿ˜ (a high maintenance one, for sure...)

I was pointing to the CI, the error was detected after multiple CI retries ^^

just kidding @fahedouch :-)

Just for the record though: I did hit the error (of missing logs) in normal usage. The repeat-test details are just here for repro.

apostasie avatar May 14 '25 17:05 apostasie

Back to serious:

I think we should treat tests errors seriously. Especially the flaky ones. Just because we did not hear from users should not mean it is low-priority. We only hear from a small fraction of the issues in the field, as most people cannot be arsed to report. Even more so for elusive / flaky problems, as even more people will refrain from reporting issues that cannot be (easily) reproduced.

apostasie avatar May 14 '25 17:05 apostasie