k8s_audit plugin terminates due to long token

Open rrednoss opened this issue 8 months ago • 0 comments

Describe the bug

I have Falco installed on my Kubernetes cluster via the Helm chart. Occasionally the pods fail due to a problem in the k8s_audit plugin.

Tue Feb 18 15:09:00 2025: Loaded event sources: syscall, k8s_audit
Tue Feb 18 15:09:00 2025: Enabled event sources: k8s_audit, syscall
Tue Feb 18 15:09:00 2025: Opening 'k8s_audit' source with plugin 'k8saudit'
Tue Feb 18 15:09:00 2025: Opening 'syscall' source with Kernel module
Tue Feb 18 15:09:16 2025: An error occurred in an event source, forcing termination...
Syscall event drop monitoring:
   - event drop detected: 0 occurrences
   - num times actions taken: 0
Error: bufio.Scanner: token too long

As I understand the code, Falco reads the audit events (in my case from a file) using Go's bufio.Scanner. Internally, the scanner buffers input in a slice and grows it when it fills up. However, unless a larger limit is set via Scanner.Buffer, the buffer cannot grow beyond MaxScanTokenSize (64 * 1024 = 65,536 bytes). If a single audit event is longer than this, the scanner fails with ErrTooLong ("token too long") and the plugin terminates.

const (
	// MaxScanTokenSize is the maximum size used to buffer a token
	// unless the user provides an explicit buffer with [Scanner.Buffer].
	// The actual maximum token size may be smaller as the buffer
	// may need to include, for instance, a newline.
	MaxScanTokenSize = 64 * 1024

	startBufSize = 4096 // Size of initial allocation for buffer.
)

See: https://cs.opensource.google/go/go/+/refs/tags/go1.24.0:src/bufio/scan.go;l=77

// Is the buffer full? If so, resize.
if s.end == len(s.buf) {
	// Guarantee no overflow in the multiplication below.
	const maxInt = int(^uint(0) >> 1)
	if len(s.buf) >= s.maxTokenSize || len(s.buf) > maxInt/2 {
		s.setErr(ErrTooLong)
		return false
	}
	...
}

See: https://cs.opensource.google/go/go/+/refs/tags/go1.24.0:src/bufio/scan.go;l=196

How to reproduce it

In our audit log file, for example, I find an event with a length of approx. 240,000 characters. In this case, Flux patches the CustomResourceDefinition HelmRelease via the kustomize-controller. The request and response contain a long OpenAPI specification.

The audit policy we use is based on the audit policy used by Amazon EKS.

- level: RequestResponse
  resources:
    - group: "apiextensions.k8s.io"
    ...
  omitStages:
    - "RequestReceived"

Question

Is your recommendation to avoid long log messages in general or is this a possible attack vector? Theoretically, I can stop Falco from sending alerts by creating a single long audit log entry.

Environment

  • Falco Chart version: 4.20.0
  • Kernel: Linux sanitized 6.1.83-4.ph5 1-photon SMP PREEMPT_DYNAMIC Thu Apr 25 07:51:05 UTC 2024 x86_64 GNU/Linux
  • Installation method: Kubernetes

rrednoss, Feb 19 '25 10:02