podman
Long k8s-file logs are truncated on 'podman logs --follow' output of stopped container
Issue Description
When a container using the k8s-file log driver outputs a lot of logs, retrieving them with podman logs --follow is unreliable. If you're lucky, the entire log file is output successfully. If you're not lucky, the log is truncated at some unspecified place depending on the speed of your hardware.
Retrieving logs without --follow is unaffected. Retrieving logs from a running container is unaffected.
Steps to reproduce the issue
- Run a container that outputs a lot of logs with the k8s-file log driver. Allow it to stop completely.
- Run podman logs --follow on the container
I've provided a reproducer at https://gist.github.com/UniversalSuperBox/031f34ecb51c5bd04cb559542d9ba519. You can run test.sh and have it run podman logs until it fails, or just run podman logs --follow --latest | tail -n 1:
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
scelerisque semper tortor. Sed commodo libero id posuere facilisis. Cras
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
torquent per conubia nostra, per inceptos himenaeos. Nullam elementum
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
Morbi id elit ultrices, pulvinar odio sit amet, elementum lectus.
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
leo ac rutrum ullamcorper. Phasellus ut ultricies ligula. Phasellus
Describe the results you received
The length of the log differs on every run of podman logs --follow.
Describe the results you expected
The entire log is output.
podman info output
host:
arch: arm64
buildahVersion: 1.33.5
cgroupControllers:
- cpu
- io
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.1.10-1.fc39.aarch64
path: /usr/bin/conmon
version: 'conmon version 2.1.10, commit: '
cpuUtilization:
idlePercent: 98.99
systemPercent: 0.45
userPercent: 0.56
cpus: 8
databaseBackend: boltdb
distribution:
distribution: fedora
variant: coreos
version: "39"
eventLogger: journald
freeLocks: 2017
hostname: localhost.localdomain
idMappings:
gidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 1000000
uidmap:
- container_id: 0
host_id: 502
size: 1
- container_id: 1
host_id: 100000
size: 1000000
kernel: 6.7.5-200.fc39.aarch64
linkmode: dynamic
logDriver: journald
memFree: 2676797440
memTotal: 4084744192
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: aardvark-dns-1.10.0-1.fc39.aarch64
path: /usr/libexec/podman/aardvark-dns
version: aardvark-dns 1.10.0
package: netavark-1.10.3-1.fc39.aarch64
path: /usr/libexec/podman/netavark
version: netavark 1.10.3
ociRuntime:
name: crun
package: crun-1.14.3-1.fc39.aarch64
path: /usr/bin/crun
version: |-
crun version 1.14.3
commit: 1961d211ba98f532ea52d2e80f4c20359f241a98
rundir: /run/user/502/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
os: linux
pasta:
executable: /usr/bin/pasta
package: passt-0^20240220.g1e6f92b-1.fc39.aarch64
version: |
pasta 0^20240220.g1e6f92b-1.fc39.aarch64-pasta
Copyright Red Hat
GNU General Public License, version 2 or later
<https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
remoteSocket:
exists: true
path: /run/user/502/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: true
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.2.2-1.fc39.aarch64
version: |-
slirp4netns version 1.2.2
commit: 0ee2d87523e906518d34a6b423271e4826f71faf
libslirp: 4.7.0
SLIRP_CONFIG_VERSION_MAX: 4
libseccomp: 2.5.3
swapFree: 0
swapTotal: 0
uptime: 0h 50m 30.00s
variant: v8
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
search:
- docker.io
store:
configFile: /var/home/core/.config/containers/storage.conf
containerStore:
number: 5
paused: 0
running: 0
stopped: 5
graphDriverName: overlay
graphOptions: {}
graphRoot: /var/home/core/.local/share/containers/storage
graphRootAllocated: 106769133568
graphRootUsed: 13598543872
graphStatus:
Backing Filesystem: xfs
Native Overlay Diff: "true"
Supports d_type: "true"
Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 62
runRoot: /run/user/502/containers
transientStore: false
volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
APIVersion: 4.9.3
Built: 1708357248
BuiltTime: Mon Feb 19 09:40:48 2024
GitCommit: ""
GoVersion: go1.21.7
Os: linux
OsArch: linux/arm64
Version: 4.9.3
Podman in a container
No
Privileged Or Rootless
None
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
The issue occurs regardless of rootfulness.
This issue occurs whenever the log size exceeds a certain point, and that point seems to depend on hardware. It appears to be a race condition: where the output stops depends on time spent in the log reader, not on the amount of data output. The issue also occurs over the API, so you can test it with curl and the --limit-rate option. When the rate is smaller, the amount of text downloaded is also smaller.
For flavor, a command that overwrites binary-output2.bin in the current directory:
curl \
--output 'binary-output2.bin' \
--no-buffer \
-XGET \
--limit-rate=1000K \
--unix-socket "/run/user/${uid}/podman/podman.sock" \
"http://localhost/v5.0.0/libpod/containers/${container}/logs?follow=True&stderr=True&stdout=True"
I confirm this observation. Observed on podman 5.1.1 running on Arch Linux x86_64.
@UniversalSuperBox Thanks for the script, I managed to hit it after 1000 iterations.
Looking at our code I cannot see anything in particular wrong, but we use github.com/nxadm/tail for the tailing functionality, and https://github.com/nxadm/tail/issues/67 is reported there, so I assume this is a bug on their side.
Actually, running more tests, it seems to be related to https://github.com/nxadm/tail/issues/37, I think.
This took a lot longer than it should have, but https://github.com/nxadm/tail/pull/71 should fix the issue. Locally I am at 40000 iterations now without seeing the bug; without the patch I was hitting it after 100-1000 iterations.
Hopefully nxadm/tail is still active, last commit was 8 months ago.
Is it time yet to fork nxadm/tail? Unreliable tail breaks automation scripts left and right.