VLAgent: k8sCollector errors out if it cannot read logs

Open syphernl opened this issue 2 weeks ago • 7 comments

Describe the bug

I am running VL in Minikube. I enabled k8sCollector in VLAgent so it can read the cluster logs and forward them to VL. However, this results in a fatal error that prevents VLAgent from running.

The entries in /var/log/containers are symlinks to entries in /var/lib/docker/containers, but the VLAgent pod mounts both /var/log and /var/lib as hostPath, so /var/lib/docker/containers should be available.

Either way: VLAgent should not crash if it cannot read particular logfiles, it should log this error and continue.

To Reproduce

  • Run VLAgent in Minikube, deployed via the Operator
  • See VLAgent crash because it cannot read the logs

Version

victoria-logs-v1.41.0

Logs

2025-12-15T15:28:03.283851594Z 2025-12-15T15:28:03.283Z info    VictoriaLogs/app/vlagent/kubernetescollector/kubernetes.go:54   started Kubernetes log collector for node "victoria"
2025-12-15T15:28:03.283855344Z 2025-12-15T15:28:03.283Z info    VictoriaLogs/app/vlagent/main.go:58     started vlagent in 0.006 seconds
2025-12-15T15:28:03.284461524Z 2025-12-15T15:28:03.284Z info    [email protected]/lib/httpserver/httpserver.go:145       started server at http://0.0.0.0:9429/
2025-12-15T15:28:03.284475452Z 2025-12-15T15:28:03.284Z info    [email protected]/lib/httpserver/httpserver.go:147       pprof handlers are exposed at http://0.0.0.0:9429/debug/pprof/
2025-12-15T15:28:03.294485902Z 2025-12-15T15:28:03.294Z panic   VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:293      FATAL: cannot open file "/var/log/containers/vmsingle-victoriametrics-75f6848954-zkg9h_vm_vmsingle-8d668e79ac7bcc5161834257c1955162d2b5241993c5df6be990f054a387114f.log": open /var/log/containers/vmsingle-victoriametrics-75f6848954-zkg9h_vm_vmsingle-8d668e79ac7bcc5161834257c1955162d2b5241993c5df6be990f054a387114f.log: permission denied
2025-12-15T15:28:03.296616997Z panic: FATAL: cannot open file "/var/log/containers/vmsingle-victoriametrics-75f6848954-zkg9h_vm_vmsingle-8d668e79ac7bcc5161834257c1955162d2b5241993c5df6be990f054a387114f.log": open /var/log/containers/vmsingle-victoriametrics-75f6848954-zkg9h_vm_vmsingle-8d668e79ac7bcc5161834257c1955162d2b5241993c5df6be990f054a387114f.log: permission denied
2025-12-15T15:28:03.296629328Z 
2025-12-15T15:28:03.296632001Z goroutine 78 [running]:
2025-12-15T15:28:03.296633795Z github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logMessage({0xa8beca, 0x5}, {0xc0001fc420, 0x150}, 0x4)
2025-12-15T15:28:03.296635463Z  github.com/VictoriaMetrics/[email protected]/lib/logger/logger.go:315 +0xa71
2025-12-15T15:28:03.296638493Z github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logLevelSkipframes(0x1, {0xa8beca, 0x5}, {0xa9f0b2?, 0xc00008cc68?}, {0xc00008ccb8?, 0x68?, 0xa6a2c0?})
2025-12-15T15:28:03.296641210Z  github.com/VictoriaMetrics/[email protected]/lib/logger/logger.go:155 +0x1a5
2025-12-15T15:28:03.296653386Z github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logLevel(...)
2025-12-15T15:28:03.296661291Z  github.com/VictoriaMetrics/[email protected]/lib/logger/logger.go:147
2025-12-15T15:28:03.296664286Z github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.Panicf(...)
2025-12-15T15:28:03.296678367Z  github.com/VictoriaMetrics/[email protected]/lib/logger/logger.go:143
2025-12-15T15:28:03.296681263Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.openFileWithInode({0xc0002aa510, 0x8e})
2025-12-15T15:28:03.296684168Z  github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:293 +0x111
2025-12-15T15:28:03.296686886Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*logFile).tryReopen(0xc00008cf58)
2025-12-15T15:28:03.296689680Z  github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/logfile.go:284 +0x26
2025-12-15T15:28:03.296697383Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*logFile).readLines(0xc00008cf58, 0xc0000b78f0, {0xbbcd50, 0xc0001cff10})
2025-12-15T15:28:03.296700382Z  github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/logfile.go:82 +0x5a
2025-12-15T15:28:03.296708391Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*fileCollector).process(0xc0001b8fa0, 0xc00008cf58)
2025-12-15T15:28:03.296720522Z  github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:114 +0x176
2025-12-15T15:28:03.296729108Z github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*fileCollector).startRead.func1()
2025-12-15T15:28:03.296732306Z  github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:96 +0x165
2025-12-15T15:28:03.296739246Z created by github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector.(*fileCollector).startRead in goroutine 35
2025-12-15T15:28:03.296742088Z  github.com/VictoriaMetrics/VictoriaLogs/app/vlagent/kubernetescollector/file_collector.go:81 +0x137

Screenshots

No response

Used command-line flags

No response

Additional information

Deployed via operator (built CRD + image locally) as there is no Operator version released that supports k8sCollector in VLAgent.

syphernl avatar Dec 15 '25 15:12 syphernl

Hey @syphernl, could you please check if the victoria-logs-collector helm chart works for you?

To run the chart with vlagent, you should set the undocumented native flag to true. Example of values.yaml:

remoteWrite:
  - url: http://vlogs-host:9428

native: true
image:
  tag: v1.41.0

If it works for you, it means there is an issue with your mounts.

vadimalekseev avatar Dec 15 '25 15:12 vadimalekseev

Hi @vadimalekseev, thanks for your quick reply.

I have tested the VLC chart and can confirm that it can access the logs of the Minikube node just fine and forwards them to VL.

This is pretty odd, because the chart uses the very same volumes as a VLAgent instance deployed via the operator.

This is my VLAgent config:

apiVersion: operator.victoriametrics.com/v1
kind: VLAgent
metadata:
  name: primary
  namespace: monitoring
spec:
  useStrictSecurity: true
  image:
    repository: victoriametrics/vlagent
    tag: v1.41.0
    pullPolicy: Always

  remoteWriteSettings:
    maxDiskUsagePerURL: "1GiB"

  remoteWrite:
    - url: "http://vlsingle-victorialogs.vm.svc:9428"

  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 5Gi

  # Not yet released in operator
  k8sCollector:
    enabled: true
    extraFields: '{"env":"dev","cluster":"victoria"}'
    msgFields:
      - msg
      - message
      - log.msg
    timeFields:
      - time
      - ts
      - timestamp

These are the values I used for the VLC chart:

remoteWrite:
  - url: "http://vlsingle-victorialogs.vm.svc:9428"

native: true
image:
  tag: v1.41.0

syphernl avatar Dec 16 '25 07:12 syphernl

@AndrewChubatiuk should we transfer this issue to https://github.com/VictoriaMetrics/operator ?

vadimalekseev avatar Dec 16 '25 08:12 vadimalekseev

> VLAgent should not crash if it cannot read particular logfiles, it should log this error and continue.

I agree with that - vlagent crashing is not a reasonable way to respond to possibly intermittent read errors.

vrutkovs avatar Dec 16 '25 12:12 vrutkovs

@syphernl agree it should not fail, but with useStrictSecurity: true vlagent has no access to logs on host, so none of them are expected to be collected

AndrewChubatiuk avatar Dec 16 '25 12:12 AndrewChubatiuk

> @syphernl agree it should not fail, but with useStrictSecurity: true vlagent has no access to logs on host, so none of them are expected to be collected

@AndrewChubatiuk That explains why it doesn't work via the Operator but works fine via the Helm Chart as the chart doesn't have that setting.

The docs state the following:

> UseStrictSecurity enables strict security mode for component: it restricts disk writes access, uses non-root user out of the box, drops not needed security permissions

In this case, the cause is probably that it drops to a non-root user while the files are root-owned (and/or carry no permissions that would let VLAgent read them). Is there a recipe for using this flag while still being able to read the host logs?
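
The ownership guess above can be verified by comparing each file's owner and mode bits with the user the process runs as. The Go sketch below is an assumption-laden diagnostic, not anything from vlagent or the operator: `canRead` is a hypothetical helper that looks only at the classic owner/group/other bits, and the `syscall.Stat_t` access is specific to Unix-like systems.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"syscall"
)

// canRead is a hypothetical helper. It reports whether the classic
// owner/group/other mode bits grant read access to the given uid/gid.
// It deliberately ignores supplementary groups, ACLs, and Linux
// capabilities (including root's usual ability to bypass these bits).
func canRead(mode fs.FileMode, fileUID, fileGID, uid, gid uint32) bool {
	perm := mode.Perm()
	switch {
	case uid == fileUID:
		return perm&0o400 != 0
	case gid == fileGID:
		return perm&0o040 != 0
	default:
		return perm&0o004 != 0
	}
}

func main() {
	uid, gid := uint32(os.Geteuid()), uint32(os.Getegid())
	for _, p := range os.Args[1:] {
		fi, err := os.Stat(p) // os.Stat follows symlinks
		if err != nil {
			fmt.Printf("%s: %v\n", p, err)
			continue
		}
		st, ok := fi.Sys().(*syscall.Stat_t)
		if !ok {
			fmt.Printf("%s: owner information is not available\n", p)
			continue
		}
		fmt.Printf("%s: mode=%v owner=%d:%d process=%d:%d readable=%v\n",
			p, fi.Mode().Perm(), st.Uid, st.Gid, uid, gid,
			canRead(fi.Mode(), st.Uid, st.Gid, uid, gid))
	}
}
```

Running it inside the pod against one of the failing paths would show whether the non-root user is simply outside the file's owner and group.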

syphernl avatar Dec 16 '25 12:12 syphernl

> VLAgent should not crash if it cannot read particular logfiles, it should log this error and continue.

vlagent does not shut down if an error is expected. For example, if a file behind a symlink does not exist, vlagent handles this case correctly. But when the error relates to permissions, the same error will occur when accessing other files, because the same application running under the same user is responsible for the creation of those files. So I don't think we need to avoid failing in this case.

vadimalekseev avatar Dec 16 '25 12:12 vadimalekseev