
PoC: Templated Policies for Reduced Memory and eBPF Program Count

Andreagit97 opened this issue 1 month ago · 2 comments

This PR introduces a Proof of Concept to address the issues discussed in https://github.com/cilium/tetragon/issues/4191.
This approach attempts to solve the two main problems described in the issue:

  1. Decrease memory usage for each policy.
  2. Instead of deploying a new program for each policy, deploy a unique eBPF program that can be shared by multiple policies.

The primary use case is deploying a distinct policy for each K8s workload where the sensors and filters are identical, but the specific values being enforced (e.g., a list of binaries) differ for each workload.

[!WARNING]

  • This PR is intended to demonstrate a potential design and start a discussion. It is not intended for a code review.
  • Only the significant parts of the logic needed to explain the concept have been implemented. It is not a complete, functioning solution.
  • Tests, comprehensive comments, and validation checks are entirely missing.
  • The poc/ directory in this branch contains sample YAML files and a README.md to help test and understand this approach.

Ideal Design Explanation

Let's start with the ideal solution we have in mind, and then see how it translates into the PoC.
The proposed solution is based on two core concepts: "Templates" and "Bindings".

Template

A "template" is a TracingPolicy that specifies variables which can be populated at runtime, rather than being hardcoded at load time. Selectors within the policy reference these variables by name.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-process-template"
spec:
  variables:
  - name: "targetExecPaths"
    type: "linux_binprm" # this could be used for extra validation but it's probably not strictly necessary
  kprobes:
  - call: "security_bprm_creds_for_exec"
    syscall: false
    args:
    - index: 0
      type: "linux_binprm"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        valuesFromVariable: "targetExecPaths"

When a template policy is deployed, it loads the necessary eBPF programs and maps, but it has no runtime effect because it lacks concrete values for its comparisons.

Binding

A "binding" is a new resource (e.g., TracingPolicyBinding) that provides concrete values for a template's variables and applies them to specific workloads.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicyBinding
metadata:
  name: "block-process-template-values-1"
spec:
  policyTemplateRef:
    name: "block-process-template"
  podSelector:
    matchLabels:
      app: "my-app-1"
  bindings:
  - name: "targetExecPaths"
    values:
    - "/usr/bin/true"
    - "/usr/bin/ls"

The policy logic becomes active only when a TracingPolicyBinding is deployed. This action populates the template's eBPF maps with the specified values for the cgroups matching the podSelector.

PoC Implementation

To minimize changes for this PoC, we reuse the existing TracingPolicy resource and its OptionSpec to simulate both templates and bindings.
Template: A template is defined as a TracingPolicy using these options:

  - name: binding # Ideally "variable", but "binding" is used in the POC
    value: "targetExecPaths"
  - name: arg-type
    value: "linux_binprm"

Binding: A binding is also a TracingPolicy (which would ideally be a TracingPolicyBinding) that references the template and provides values. This PoC currently supports only one binding.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-process-template-values-1"
spec:
  podSelector:
    matchLabels:
      app: "my-deployment-1"
  options:
  - name: binding
    value: "targetExecPaths"
  - name: values
    value: "/usr/bin/nmap"
  - name: policy-template-ref
    value: "block-process-template"

Details

  • When the template TracingPolicy is deployed, the eBPF programs and maps are loaded.
  • A new BPF_MAP_TYPE_HASH, cg_to_policy_map, is introduced. It stores a mapping from cgroup_id -> policy_id. This allows us to look up a policy ID from a cgroup ID, which is the reverse of the current policy_filter_cgroup_maps (a BPF_MAP_TYPE_HASH_OF_MAPS).
  • When a "binding" TracingPolicy is deployed:
    • It is assigned a new policy_id.
    • For all cgroups matching its podSelector, an entry (cgroup_id -> policy_id) is added to the cg_to_policy_map.
    • The binding's main job is to populate this map, thereby activating the template's logic for the targeted cgroups.
    • To store the values from the binding, new BPF_MAP_TYPE_HASH_OF_MAPS maps (pol_str_maps_*) are used. This implementation is specific to string/charbuf/filename types and the eq/neq operators, but the concept can be extended to other types and operators (more on this below).
    • These maps are keyed by the policy_id (obtained from cg_to_policy_map).
    • The value is a hash set of strings (the values from the binding), using the same 11-map-size-bucket technique as the existing string_maps_*.

[!NOTE] A cgroup_id can only be associated with one policy_id (binding) at a time. A new binding for the same cgroup should either be rejected or overwrite the existing one. For example, binding cgroup1 to both policy_1 (values: /bin/ls) and policy_5 (values: /bin/cat) simultaneously is not logical.

Current Limitations & Hacks

  • The value-matching logic is currently limited to:
    • matchArgs / matchData filters
    • String / charbuf / filename types
    • eq / neq operators
  • Extending this to other types/operators would require different eBPF maps and approaches. A v1 could support only a subset of operators and types, but the API and eBPF program design should be flexible enough to allow future extensions without breaking changes.
  • The same applies to multiple bindings per template: currently only one binding is supported, but the design should be extensible to multiple bindings without API changes. I'm not sure multi-binding support is really needed in practice, so I would avoid complicating the code until we have a real use case for it.
  • A hack signals the eBPF program to use the new pol_str_maps_* instead of a hardcoded value: we set vallen=8 in the selector_arg_filter. Admittedly, I have not verified this approach thoroughly; it is not a sustainable solution, but it works for the PoC.

Summary & Goals

This design provides a path toward achieving the two goals of the issue:

  1. Single eBPF Program: A single, shared eBPF program can serve n policies (e.g., 512-1024 or more), as they all reference the same template. This drastically reduces the number of eBPF programs loaded in the kernel.
  2. Low Memory Overhead: The memory increase for each new policy (binding) is minimal. It's limited to new entries in cg_to_policy_map and the pol_str_maps_* (likely a few KB per policy, assuming non-massive value lists).

Andreagit97 · Oct 31 '25 12:10

Thanks!

I've raised a point in the original issue, and I'm not sure if it's addressed here. What happens if the same workload is matched by multiple templates?

I'm guessing the answer is somewhere, and I'm probably missing it. I think the best way to move forward with this is to write a CFP: https://github.com/cilium/design-cfps, so that we can discuss all the design options, the semantics of the CRDs or new primitives we introduce, as well as the implementation options.

kkourt · Nov 03 '25 08:11

See this CFP https://github.com/cilium/design-cfps/pull/80

mtardy · Nov 10 '25 17:11