# PoC: Templated Policies for Reduced Memory and eBPF Program Count
This PR introduces a Proof of Concept to address the issues discussed in https://github.com/cilium/tetragon/issues/4191.
This approach attempts to solve the two main problems described in the issue:
- Decrease memory usage for each policy.
- Instead of deploying a new program for each policy, deploy a unique eBPF program that can be shared by multiple policies.
The primary use case is deploying a distinct policy for each K8s workload where the sensors and filters are identical, but the specific values being enforced (e.g., a list of binaries) differ for each workload.
> [!WARNING]
> - This PR is intended to demonstrate a potential design and start a discussion. It is not intended for code review.
> - Only the significant parts of the logic needed to explain the concept have been implemented. It is not a complete, functioning solution.
> - Tests, comprehensive comments, and validation checks are entirely missing.
> - The `poc/` directory in this branch contains sample YAML files and a `README.md` to help test and understand this approach.
## Ideal Design Explanation
Let's start with the ideal solution we have in mind, and then see how it translates into the POC.
The proposed solution is based on two core concepts: "Templates" and "Bindings".
### Template
A "template" is a TracingPolicy that specifies variables which can be populated at runtime, rather than being hardcoded at load time. Selectors within the policy reference these variables by name.
```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-process-template"
spec:
  variables:
    - name: "targetExecPaths"
      type: "linux_binprm" # this could be used for extra validation but it's probably not strictly necessary
  kprobes:
    - call: "security_bprm_creds_for_exec"
      syscall: false
      args:
        - index: 0
          type: "linux_binprm"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Equal"
              valuesFromVariable: "targetExecPaths"
```
When a template policy is deployed, it loads the necessary eBPF programs and maps, but it has no runtime effect because it lacks concrete values for its comparisons.
### Binding
A "binding" is a new resource (e.g., TracingPolicyBinding) that provides concrete values for a template's variables and applies them to specific workloads.
```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicyBinding
metadata:
  name: "block-process-template-values-1"
spec:
  policyTemplateRef:
    name: "block-process-template"
  podSelector:
    matchLabels:
      app: "my-app-1"
  bindings:
    - name: "targetExecPaths"
      values:
        - "/usr/bin/true"
        - "/usr/bin/ls"
```
The policy logic becomes active only when a `TracingPolicyBinding` is deployed. This action populates the template's eBPF maps with the specified values for the cgroups matching the `podSelector`.
## POC Implementation
To minimize changes for this POC, we reuse the existing `TracingPolicy` resource and its `OptionSpec` to simulate both templates and bindings.
**Template**: A template is defined as a `TracingPolicy` using these options:

```yaml
- name: binding # Ideally "variable", but "binding" is used in the POC
  value: "targetExecPaths"
- name: arg-type
  value: "linux_binprm"
```
**Binding**: A binding is also a `TracingPolicy` (which would ideally be a `TracingPolicyBinding`) that references the template and provides values. This POC currently supports only one binding.
```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-process-template-values-1"
spec:
  podSelector:
    matchLabels:
      app: "my-deployment-1"
  options:
    - name: binding
      value: "targetExecPaths"
    - name: values
      value: "/usr/bin/nmap"
    - name: policy-template-ref
      value: "block-process-template"
```
### Details
- When the template `TracingPolicy` is deployed, the eBPF programs and maps are loaded.
- A new `BPF_MAP_TYPE_HASH`, `cg_to_policy_map`, is introduced. It stores a mapping from `cgroupid` -> `policy_id`. This allows us to look up a policy ID from a `cgroupid`, which is the reverse of the current `policy_filter_cgroup_maps` (a `BPF_MAP_TYPE_HASH_OF_MAPS`).
- When a "binding" `TracingPolicy` is deployed:
  - It is assigned a new `policy_id`.
  - For all cgroups matching its `podSelector`, an entry (`cgroupid` -> `policy_id`) is added to the `cg_to_policy_map`.
  - The binding's main job is to populate this map, thereby activating the template's logic for the targeted cgroups.
- To store the values from the binding, new `BPF_MAP_TYPE_HASH_OF_MAPS` are used: `pol_str_maps_*`. This implementation is very specific to string/charbuf/filename types and the eq/neq operators, but the concept can be extended to other types/operators; more on this later.
  - These maps are keyed by the `policy_id` (obtained from `cg_to_policy_map`).
  - The value is a hash set of strings (the values from the binding), using the same 11-map-size-bucket technique as the existing `string_maps_*` (a sketch of the full lookup chain follows this list).
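To make this concrete, here is a minimal eBPF-side sketch of the lookup chain (libbpf-style C). The map shapes, sizes, the `_64` bucket suffix, and the `arg_matches_binding` helper are illustrative assumptions, not the actual POC code:

```c
#include <stdbool.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct str_key {
	char str[64]; /* one of the 11 size buckets */
};

/* Inner map: the hash set of string values provided by a binding. */
struct str_set {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, struct str_key);
	__type(value, __u8);
};

/* cgroupid -> policy_id (the reverse of policy_filter_cgroup_maps). */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 32768);
	__type(key, __u64);
	__type(value, __u32);
} cg_to_policy_map SEC(".maps");

/* policy_id -> per-policy string set (one outer map per size bucket). */
struct {
	__uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
	__uint(max_entries, 1024);
	__type(key, __u32);
	__array(values, struct str_set);
} pol_str_maps_64 SEC(".maps");

/* Two-stage lookup: cgroup -> policy, policy -> value set, membership. */
static __always_inline bool arg_matches_binding(__u64 cgroupid,
						struct str_key *arg)
{
	__u32 *policy_id = bpf_map_lookup_elem(&cg_to_policy_map, &cgroupid);
	if (!policy_id)
		return false; /* no binding: the template stays inert */

	void *values = bpf_map_lookup_elem(&pol_str_maps_64, policy_id);
	if (!values)
		return false; /* binding has no values in this size bucket */

	return bpf_map_lookup_elem(values, arg) != NULL;
}
```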
> [!NOTE]
> A `cgroup_id` can only be associated with one `policy_id` (binding) at a time. A new binding for the same cgroup should either be rejected or overwrite the existing one. For example, binding `cgroup1` to both `policy_1` (values: `/bin/ls`) and `policy_5` (values: `/bin/cat`) simultaneously is not logical.
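As a side note, the reject-vs-overwrite choice maps naturally onto `bpf_map_update_elem` flags on the userspace side. A minimal sketch, assuming a hypothetical `bind_cgroup` helper and classic libbpf error conventions (return `-1` with `errno` set):

```c
#include <errno.h>
#include <linux/types.h>
#include <bpf/bpf.h>

/* Attach a binding's policy_id to one cgroup. BPF_NOEXIST rejects a
 * second binding for the same cgroup; BPF_ANY would overwrite instead. */
int bind_cgroup(int cg_to_policy_fd, __u64 cgroupid, __u32 policy_id)
{
	if (bpf_map_update_elem(cg_to_policy_fd, &cgroupid, &policy_id,
				BPF_NOEXIST) < 0)
		return -errno; /* -EEXIST: cgroup already bound to a policy */
	return 0;
}
```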
## Current Limitations & Hacks
- The value-matching logic is currently limited to:
  - `matchArgs` / `matchData` filters
  - String / charbuf / filename types
  - `eq` / `neq` operators
- Extending this to other types/operators would require different eBPF maps/approaches. We think a v1 could support only some operators/types, but the design of the API and eBPF program should be flexible enough to allow future extensions without breaking changes.
- The same applies to multiple bindings per template: currently only one binding is supported, but the design should be extensible to support multiple bindings without API changes. I'm not sure multi-binding support would really be needed in practice, so I would avoid complicating the code until we have a real use case for it.
- A hack is used to signal the eBPF program to use the new `pol_str_maps_*` instead of a hardcoded value: we set `vallen=8` in the `selector_arg_filter`. I have to admit I haven't verified this approach much; I don't think it's a sustainable solution, but it works for the POC (a conceptual sketch follows).
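For illustration, here is a conceptual sketch of how the filter could branch on that sentinel; every name except `vallen` is a hypothetical stand-in, not Tetragon's actual `selector_arg_filter` code:

```c
#include <stdbool.h>
#include <linux/types.h>

struct selector {
	__u32 vallen; /* length of the selector's value blob */
	/* ... */
};

/* Illustrative helpers; see the lookup sketch above for the binding path. */
bool arg_matches_binding(__u64 cgroupid, void *arg);
bool match_hardcoded_values(struct selector *sel, void *arg);

/* vallen == 8 is overloaded as a sentinel: fetch values from the
 * per-policy maps instead of the values embedded at load time. */
static bool match_string_arg(struct selector *sel, void *arg, __u64 cgroupid)
{
	if (sel->vallen == 8)
		return arg_matches_binding(cgroupid, arg);
	return match_hardcoded_values(sel, arg);
}
```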
Summary & Goals
This design provides a path toward achieving the two goals of the issue:
- **Single eBPF Program**: A single, shared eBPF program can serve `n` policies (e.g., 512-1024 or more), as they all reference the same template. This drastically reduces the number of eBPF programs loaded in the kernel.
- **Low Memory Overhead**: The memory increase for each new policy (binding) is minimal. It's limited to new entries in `cg_to_policy_map` and the `pol_str_maps_*` (likely a few KB per policy, assuming non-massive value lists).
Thanks!
I've raised a point in the original issue, and I'm not sure if it's addressed here. What happens if the same workload is matched by multiple templates?
I'm guessing the answer is somewhere, and I'm probably missing it. I think the best way to move forward with this is to write a CFP: https://github.com/cilium/design-cfps, so that we can discuss all the design options, the semantics of the CRDs or new primitives we introduce, as well as the implementation options.
See this CFP: https://github.com/cilium/design-cfps/pull/80