feat(sensors): support 38+ override on lsm funcs

Open holyspectral opened this issue 4 months ago • 7 comments

Fixes #4204

Description

As mentioned in #4204, this PR lets multiple kprobe sensors share their fmod_ret programs as long as they attach to the same function, so that more than 38 overrides can be applied to LSM functions.

(Update Dec. 11, 2025) Changes since the draft PR:

  • Refactored and moved most logic to pkg/sensors/program/
  • Introduced an override ID in both userspace and ebpf to make sure that an override action is only handled by correct override programs.
  • Fixed an issue in override_tasks map's lifecycle.

The changes:

  1. Override programs are now maintained separately from the kprobe sensors. That means:
    • override_tasks becomes a shared global map.
    • Override programs (both kprobe and fmod_ret) are created and maintained separately, so multiple kprobe sensors can share the same override program.
  2. The pinned maps and programs are located as follows:
/bpffs/tetragon/__override__
/bpffs/tetragon/__override__/kprobe
/bpffs/tetragon/__override__/kprobe/__x64_sys_symlinkat
/bpffs/tetragon/__override__/kprobe/__x64_sys_symlinkat/link_override
/bpffs/tetragon/__override__/kprobe/__x64_sys_symlinkat/prog_override
/bpffs/tetragon/__override__/kprobe/__x64_sys_execve
/bpffs/tetragon/__override__/kprobe/__x64_sys_execve/link_override
/bpffs/tetragon/__override__/kprobe/__x64_sys_execve/prog_override
/bpffs/tetragon/__override__/override_tasks
/bpffs/tetragon/__override__/fmod_ret
/bpffs/tetragon/__override__/fmod_ret/security_bprm_creds_for_exec
/bpffs/tetragon/__override__/fmod_ret/security_bprm_creds_for_exec/prog
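
The sharing described above can be modeled in userspace as a reference-counted registry keyed by attach type and function name. The following is only a minimal sketch under that assumption: `overrideKey`, `overrideRegistry`, `acquire`, and `release` are hypothetical names for illustration, not identifiers from this PR, and only the pin directory layout is taken from the listing above.

```go
package main

import (
	"fmt"
	"path"
)

// overrideKey identifies one shared override program (hypothetical type).
type overrideKey struct {
	kind string // "kprobe" or "fmod_ret"
	fn   string // attach target, e.g. "security_bprm_creds_for_exec"
}

// overrideRegistry counts how many sensors use each shared override program.
type overrideRegistry struct {
	root string
	refs map[overrideKey]int
}

func newOverrideRegistry(root string) *overrideRegistry {
	return &overrideRegistry{root: root, refs: make(map[overrideKey]int)}
}

// pinDir builds the bpffs directory for a key, following the layout above.
func (r *overrideRegistry) pinDir(k overrideKey) string {
	return path.Join(r.root, "__override__", k.kind, k.fn)
}

// acquire registers one more user. load is true only for the first user,
// who would be responsible for loading and pinning the program.
func (r *overrideRegistry) acquire(k overrideKey) (dir string, load bool) {
	r.refs[k]++
	return r.pinDir(k), r.refs[k] == 1
}

// release drops one user. unload is true only when the last user is gone.
func (r *overrideRegistry) release(k overrideKey) (unload bool) {
	if r.refs[k] == 0 {
		return false
	}
	r.refs[k]--
	if r.refs[k] == 0 {
		delete(r.refs, k)
		return true
	}
	return false
}

func main() {
	reg := newOverrideRegistry("/bpffs/tetragon")
	k := overrideKey{kind: "fmod_ret", fn: "security_bprm_creds_for_exec"}

	dir, load := reg.acquire(k) // first sensor: must load the program
	fmt.Println(dir, load)
	_, load = reg.acquire(k) // second sensor: reuses the loaded program
	fmt.Println(load)
}
```

With such a registry, the number of fmod_ret programs on a function would stay at one regardless of how many sensors request an override on it.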

Changelog

holyspectral avatar Oct 23 '25 20:10 holyspectral

@olsajiri may I have your feedback on this one? Thanks a lot.

holyspectral avatar Oct 23 '25 20:10 holyspectral

Deploy Preview for tetragon ready!

Name Link
Latest commit 0828a7f5fabb8bb919163674dd85f534cedf55b4
Latest deploy log https://app.netlify.com/projects/tetragon/deploys/6940722f9bb4170008444051
Deploy Preview https://deploy-preview-4244--tetragon.netlify.app

netlify[bot] avatar Nov 13 '25 14:11 netlify[bot]

Changes since v1:

  1. Allow kprobe to use shared override programs.
  2. Instead of having one override_tasks map per hook point, in v2 all override programs share a single override_tasks map. This helps support the multi-kprobe scenario.
  3. The pinned paths of the ebpf programs and maps are as follows:
/bpffs/tetragon/__override__
/bpffs/tetragon/__override__/kprobe
/bpffs/tetragon/__override__/kprobe/__x64_sys_symlinkat
/bpffs/tetragon/__override__/kprobe/__x64_sys_symlinkat/link_override
/bpffs/tetragon/__override__/kprobe/__x64_sys_symlinkat/prog_override
/bpffs/tetragon/__override__/kprobe/__x64_sys_execve
/bpffs/tetragon/__override__/kprobe/__x64_sys_execve/link_override
/bpffs/tetragon/__override__/kprobe/__x64_sys_execve/prog_override
/bpffs/tetragon/__override__/override_tasks
/bpffs/tetragon/__override__/fmod_ret
/bpffs/tetragon/__override__/fmod_ret/security_bprm_creds_for_exec
/bpffs/tetragon/__override__/fmod_ret/security_bprm_creds_for_exec/prog
  4. Make sure that unused override programs are cleaned up via unloaderOverride().
  5. Move the bpf functions related to overrides into a separate object file.

holyspectral avatar Nov 13 '25 14:11 holyspectral

I'm still keeping this as a draft because there is one item I'd like to discuss first. Consider the following scenario (keep in mind that v2 uses a shared override_tasks map):

  • We have two tracing policies, A and B.
    • Policy A hooks sys_execve and policy B hooks sys_symlinkat.
    • Each has its own override action in matchActions.
  • A process triggers the override action of policy A via sys_execve, which inserts an item into the override_tasks map.
    • override_tasks[<pid_tgid>] = -EPERM
  • Before the process reaches the override program attached to sys_execve, policy A is changed from another core and the override program on sys_execve is removed.
    • The content of the map stays the same: override_tasks[<pid_tgid>] = -EPERM
    • But the override program on sys_execve is gone.
  • When the process later hits policy B's hook on sys_symlinkat, the call is denied unexpectedly because override_tasks[<pid_tgid>] = -EPERM is still present, even though nothing matched.
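
The scenario above can be replayed with a plain Go map standing in for the shared override_tasks map. This is only a userspace model under simplifying assumptions (no concurrency, no real BPF programs); `overrideTasks`, `consume`, and `staleOverride` are hypothetical names for illustration.

```go
package main

import "fmt"

const eperm = 1 // errno value of EPERM on Linux

// overrideTasks models the shared map: pid_tgid -> return value to inject.
type overrideTasks map[uint64]int32

// consume models an override program firing for a task: if the task has a
// pending entry, the entry is removed and its value overrides the return.
func (m overrideTasks) consume(pidTgid uint64) (ret int32, overridden bool) {
	ret, overridden = m[pidTgid]
	if overridden {
		delete(m, pidTgid)
	}
	return ret, overridden
}

// staleOverride replays the scenario and returns what policy B's override
// program on sys_symlinkat would do for the task.
func staleOverride() (int32, bool) {
	tasks := overrideTasks{}
	const task uint64 = 1234<<32 | 1234

	// Policy A matches on sys_execve and records the pending override.
	tasks[task] = -eperm

	// Policy A is unloaded here: its override program is gone, so nothing
	// consumes the entry, and the shared map keeps it.

	// Later the same task enters sys_symlinkat. Policy B matched nothing,
	// but its override program still finds the stale entry.
	return tasks.consume(task)
}

func main() {
	ret, overridden := staleOverride()
	fmt.Printf("overridden=%v ret=%d\n", overridden, ret)
}
```

In this model the sys_symlinkat call is overridden with -EPERM although only policy A ever matched, which is the unexpected denial described above.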

I can think of a few directions:

  1. Maybe this is not a real issue: the window is too short, or Tetragon already handles this, so we don't have to care about it.

  2. Change override_tasks to a BPF_MAP_TYPE_HASH_OF_MAPS with one inner map per hook point, so we can clean up the content of the override_tasks map when we remove an override program. For example:

    override_tasks: {
        "syscall:sys_execve": <inner map like the current override_tasks map>,
        "syscall:sys_symlinkat": <inner map like the current override_tasks map>,
        "fmod_ret:security_bprm_creds_for_exec": <inner map like the current override_tasks map>,
    }
  3. Don't delete an override program immediately when its policy is removed. This gives the override program some time to remove its items.
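
Direction 2 can be approximated in userspace with nested Go maps, one inner map per hook point. This is a sketch of the data layout only: it does not use the BPF map-in-map API, and `perHookTasks`, `record`, `consume`, and `removeHook` are hypothetical names.

```go
package main

import "fmt"

// perHookTasks models a hash-of-maps: hook name -> (pid_tgid -> return value).
type perHookTasks map[string]map[uint64]int32

// record stores a pending override for a task under one hook point.
func (m perHookTasks) record(hook string, pidTgid uint64, ret int32) {
	if m[hook] == nil {
		m[hook] = make(map[uint64]int32)
	}
	m[hook][pidTgid] = ret
}

// consume looks only at the inner map of the hook that fired.
func (m perHookTasks) consume(hook string, pidTgid uint64) (int32, bool) {
	ret, ok := m[hook][pidTgid]
	if ok {
		delete(m[hook], pidTgid)
	}
	return ret, ok
}

// removeHook drops the inner map, and every stale entry in it, in one step.
func (m perHookTasks) removeHook(hook string) {
	delete(m, hook)
}

func main() {
	tasks := perHookTasks{}
	const task uint64 = 1234<<32 | 1234

	tasks.record("syscall:sys_execve", task, -1) // pending -EPERM from policy A
	tasks.removeHook("syscall:sys_execve")       // policy A and its program removed

	_, overridden := tasks.consume("syscall:sys_symlinkat", task)
	fmt.Println("symlinkat overridden:", overridden)
}
```

Because each hook point only reads its own inner map, an entry left behind for sys_execve cannot affect sys_symlinkat in this model, and removing a hook discards its pending entries together with the inner map.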

@olsajiri do you think this is a real issue that we should address? I'd love to know your thoughts on this.

holyspectral avatar Nov 13 '25 15:11 holyspectral

yea, that seems like a problem.. at the moment override_tasks is the program's own map, so each override program has its own copy. I'd suggest having one stage of this change that keeps doing the same, and adding the change to a single map on top of that

perhaps we could have policy id as part of the override_task value and have sensor unload to cleanup its records before it unloads the override program.. something like you suggest in 2) but not sure what's benefit of inner map
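
One way to picture this suggestion is a map value that carries the policy id, plus a sweep that runs at sensor unload. This is a minimal userspace sketch assuming a flat shared map; `overrideEntry`, `policyID`, and `sweepPolicy` are hypothetical names, not fields or functions from Tetragon.

```go
package main

import "fmt"

// overrideEntry is a hypothetical map value: the return value to inject plus
// the id of the policy that requested it.
type overrideEntry struct {
	policyID uint32
	ret      int32
}

// overrideTasks models the shared map: pid_tgid -> pending override.
type overrideTasks map[uint64]overrideEntry

// sweepPolicy removes every pending entry recorded by one policy and returns
// how many were removed. It would run before that policy's override program
// is unloaded.
func (m overrideTasks) sweepPolicy(policyID uint32) int {
	removed := 0
	for pidTgid, e := range m {
		if e.policyID == policyID {
			delete(m, pidTgid) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	tasks := overrideTasks{
		100: {policyID: 1, ret: -1}, // pending for policy A
		200: {policyID: 2, ret: -1}, // pending for policy B
	}
	fmt.Println("removed:", tasks.sweepPolicy(1), "remaining:", len(tasks))
}
```

The sweep only touches entries of the policy being unloaded, so pending overrides of other policies in the shared map are left in place.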

I'll check on your change in more detail but on first glance please try to split the change into more logical changes/commits, it's easier to review, thanks

olsajiri avatar Nov 16 '25 22:11 olsajiri

> perhaps we could have policy id as part of the override_task value and have sensor unload to cleanup its records before it unloads the override program.. something like you suggest in 2) but not sure what's benefit of inner map

Thanks. I think that makes sense. Let me see if I can add an extra field to the key or value of override_tasks map.

> I'll check on your change in more detail but on first glance please try to split the change into more logical changes/commits, it's easier to review, thanks

I've split them into multiple commits. Please feel free to let me know if anything is not clear!

holyspectral avatar Nov 19 '25 20:11 holyspectral

Hi @olsajiri, thanks for your patience. I've pushed v3 to this PR and marked it as ready for review. The changes since last time:

  • Refactored and moved most logic to pkg/sensors/program/
  • Introduced an override ID in both userspace and ebpf to make sure that an override action is only handled by correct override programs.
  • Fixed an issue in override_tasks map's lifecycle.
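
The override ID idea can be sketched as a check performed by each override program before it acts on an entry. The thread does not show the actual map layout, so everything here is an assumption for illustration: `pendingOverride`, `overrideProgram`, and `fire` are hypothetical, and leaving a mismatched entry untouched is a choice of this sketch.

```go
package main

import "fmt"

// pendingOverride is a hypothetical map value tagged with the id of the
// override program that is allowed to act on it.
type pendingOverride struct {
	overrideID uint32
	ret        int32
}

// overrideTasks models the shared map: pid_tgid -> pending override.
type overrideTasks map[uint64]pendingOverride

// overrideProgram models one loaded override program with its own id.
type overrideProgram struct {
	id    uint32
	tasks overrideTasks
}

// fire consumes the task's entry only when it was recorded for this program.
// Entries tagged with another id are ignored (an assumption of this sketch).
func (p overrideProgram) fire(pidTgid uint64) (int32, bool) {
	e, ok := p.tasks[pidTgid]
	if !ok || e.overrideID != p.id {
		return 0, false
	}
	delete(p.tasks, pidTgid)
	return e.ret, true
}

func main() {
	shared := overrideTasks{}
	execve := overrideProgram{id: 1, tasks: shared}
	symlinkat := overrideProgram{id: 2, tasks: shared}

	shared[1234] = pendingOverride{overrideID: execve.id, ret: -1}

	_, hit := symlinkat.fire(1234)
	fmt.Println("symlinkat overrides:", hit)
	_, hit = execve.fire(1234)
	fmt.Println("execve overrides:", hit)
}
```

With the id check in place, an entry recorded for one hook is not applied by the override program of another hook, even though both read the same shared map.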

I'd appreciate any feedback on this.

holyspectral avatar Dec 11 '25 16:12 holyspectral

@holyspectral I'm checking on that, would you mind rebasing it? it'd be easier for me to run it, thanks

olsajiri avatar Dec 15 '25 09:12 olsajiri

Looks like there are issues from vmtests on older kernels. I will take a look.

holyspectral avatar Dec 16 '25 14:12 holyspectral

Could you reorganize your PR commits: squash the things that need to be squashed (lints, back-and-forth changes in the code) and separate the others correctly? It's a bit hard to review as of now, and I see Jiri proposed some ideas. Also, some commits seem mistitled ("kprobe: support 38+ override on lsm funcs"). The patch set should reflect how we would merge it into the main branch. Thanks a lot!

mtardy avatar Dec 22 '25 15:12 mtardy