VLAgent: k8sCollector errors out if it cannot read logs
Describe the bug
I am running VL in Minikube.
I enabled k8sCollector in VLAgent so it can read the cluster logs and forward them to VL.
This however results in a fatal error preventing VLAgent from running.
The entries in /var/log/containers are symlinks to entries in /var/lib/docker/containers, but the VLAgent pod mounts both /var/log and /var/lib as hostPath, so /var/lib/docker/containers should be available.
Either way: VLAgent should not crash if it cannot read particular logfiles, it should log this error and continue.
To Reproduce
- Run VLAgent in Minikube, deployed via the Operator
- See VLAgent crashes because it cannot read the logs
Version
victoria-logs-v1.41.0
Logs
2025-12-15T15:28:03.283851594Z 2025-12-15T15:28:03.283Z info VictoriaLogs/app/vlagent/kubernetescollector/kubernetes.go:54 started Kubernetes log collector for node "victoria"
2025-12-15T15:28:03.283855344Z 2025-12-15T15:28:03.283Z info VictoriaLogs/app/vlagent/main.go:58 started vlagent in 0.006 seconds
2025-12-15T15:28:03.284461524Z 2025-12-15T15:28:03.284Z info [email protected]/lib/httpserver/httpserver.go:145 started server at http://0.0.0.0:9429/
2025-12-15T15:28:03.284475452Z 2025-12-15T15:28:03.284Z info [email protected]/lib/httpserver/httpserver.go:147 pprof handlers are exposed at http://0.0.0.0:9429/debug/pprof/
2025-12-15T15:28:03.294485902Z 2025-12-15T15:28:03.294Z panic VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:293 FATAL: cannot open file "/var/log/containers/vmsingle-victoriametrics-75f6848954-zkg9h_vm_vmsingle-8d668e79ac7bcc5161834257c1955162d2b5241993c5df6be990f054a387114f.log": open /var/log/containers/vmsingle-victoriametrics-75f6848954-zkg9h_vm_vmsingle-8d668e79ac7bcc5161834257c1955162d2b5241993c5df6be990f054a387114f.log: permission denied
2025-12-15T15:28:03.296616997Z panic: FATAL: cannot open file "/var/log/containers/vmsingle-victoriametrics-75f6848954-zkg9h_vm_vmsingle-8d668e79ac7bcc5161834257c1955162d2b5241993c5df6be990f054a387114f.log": open /var/log/containers/vmsingle-victoriametrics-75f6848954-zkg9h_vm_vmsingle-8d668e79ac7bcc5161834257c1955162d2b5241993c5df6be990f054a387114f.log: permission denied
2025-12-15T15:28:03.296629328Z
2025-12-15T15:28:03.296632001Z goroutine 78 [running]:
2025-12-15T15:28:03.296633795Z github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logMessage({0xa8beca, 0x5}, {0xc0001fc420, 0x150}, 0x4)
2025-12-15T15:28:03.296635463Z github.com/VictoriaMetrics/[email protected]/lib/logger/logger.go:315 +0xa71
2025-12-15T15:28:03.296638493Z github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logLevelSkipframes(0x1, {0xa8beca, 0x5}, {0xa9f0b2?, 0xc00008cc68?}, {0xc00008ccb8?, 0x68?, 0xa6a2c0?})
2025-12-15T15:28:03.296641210Z github.com/VictoriaMetrics/[email protected]/lib/logger/logger.go:155 +0x1a5
2025-12-15T15:28:03.296653386Z github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logLevel(...)
2025-12-15T15:28:03.296661291Z github.com/VictoriaMetrics/[email protected]/lib/logger/logger.go:147
2025-12-15T15:28:03.296664286Z github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.Panicf(...)
2025-12-15T15:28:03.296678367Z github.com/VictoriaMetrics/[email protected]/lib/logger/logger.go:143
2025-12-15T15:28:03.296681263Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.openFileWithInode({0xc0002aa510, 0x8e})
2025-12-15T15:28:03.296684168Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:293 +0x111
2025-12-15T15:28:03.296686886Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*logFile).tryReopen(0xc00008cf58)
2025-12-15T15:28:03.296689680Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/logfile.go:284 +0x26
2025-12-15T15:28:03.296697383Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*logFile).readLines(0xc00008cf58, 0xc0000b78f0, {0xbbcd50, 0xc0001cff10})
2025-12-15T15:28:03.296700382Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/logfile.go:82 +0x5a
2025-12-15T15:28:03.296708391Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*fileCollector).process(0xc0001b8fa0, 0xc00008cf58)
2025-12-15T15:28:03.296720522Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:114 +0x176
2025-12-15T15:28:03.296729108Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*fileCollector).startRead.func1()
2025-12-15T15:28:03.296732306Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:96 +0x165
2025-12-15T15:28:03.296739246Z created by github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*fileCollector).startRead in goroutine 35
2025-12-15T15:28:03.296742088Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:81 +0x137
Screenshots
No response
Used command-line flags
No response
Additional information
Deployed via operator (built CRD + image locally) as there is no Operator version released that supports k8sCollector in VLAgent.
Hey @syphernl, could you please check if the victoria-logs-collector helm chart works for you?
To run the chart with vlagent, you should set the undocumented native flag to true. Example of values.yaml:
remoteWrite:
  - url: http://vlogs-host:9428
native: true
image:
  tag: v1.41.0
If it works for you, it means there is an issue with your mounts.
Hi @vadimalekseev, thanks for your quick reply.
I have tested the VLC chart and can confirm that it can access the logs of the Minikube node just fine and forwards them to VL.
This is pretty odd, because this uses the very same volumes as a deployed VLAgent instance via the operator.
This is my VLAgent config:
apiVersion: operator.victoriametrics.com/v1
kind: VLAgent
metadata:
  name: primary
  namespace: monitoring
spec:
  useStrictSecurity: true
  image:
    repository: victoriametrics/vlagent
    tag: v1.41.0
    pullPolicy: Always
  remoteWriteSettings:
    maxDiskUsagePerURL: "1GiB"
  remoteWrite:
    - url: "http://vlsingle-victorialogs.vm.svc:9428"
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 5Gi
  # Not yet released in operator
  k8sCollector:
    enabled: true
    extraFields: '{"env":"dev","cluster":"victoria"}'
    msgFields:
      - msg
      - message
      - log.msg
    timeFields:
      - time
      - ts
      - timestamp
These are the used values for the VLC chart:
remoteWrite:
  - url: "http://vlsingle-victorialogs.vm.svc:9428"
native: true
image:
  tag: v1.41.0
@AndrewChubatiuk should we transfer this issue to https://github.com/VictoriaMetrics/operator ?
"VLAgent should not crash if it cannot read particular logfiles, it should log this error and continue."
I agree with that - vlagent crashing is not a reasonable way to respond to possibly intermittent read errors.
@syphernl agree it should not fail, but with useStrictSecurity: true vlagent has no access to logs on the host, so none of them are expected to be collected.
@AndrewChubatiuk That explains why it doesn't work via the Operator but works fine via the Helm Chart as the chart doesn't have that setting.
The docs state the following:
"UseStrictSecurity enables strict security mode for component: it restricts disk writes access, uses non-root user out of the box, drops not needed security permissions"
In this case the cause is probably that VLAgent drops to a non-root user while the log files are root-owned (and/or not readable by that user). Is there any recipe for using this flag while still being able to read the host logs?
"VLAgent should not crash if it cannot read particular logfiles, it should log this error and continue."
vlagent does not shut down if an error is expected. For example, if a file behind a symlink does not exist, vlagent handles this case correctly. But when the error relates to permissions, the same error will occur when accessing other files, because the same application running under the same user is responsible for the creation of those files. So I don't think we need to avoid failing in this case.