tetragon
LSM sensor
LSM sensor support makes it possible to use LSM BPF programs the same way we use BPF programs for kprobes, tracepoints, and uprobes.
TracingPolicy example:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "lsm"
spec:
  lsmhooks:
  - hook: "file_open"
    args:
    - index: 0
      type: "file"
    selectors:
    - matchBinaries:
      - operator: "In"
        values:
        - "/usr/bin/cat"
Event example (args printing still has problems that I need to solve):
{
"process_lsm": {
"process": {
"exec_id": "dXNlci1uaXg6MTk5MzE1NDk0NTk3MzM6MzIwMTA4",
"pid": 320108,
"uid": 1000,
"cwd": "/home/user/go/src/github.com/cilium/tetragon",
"binary": "/usr/bin/cat",
"arguments": "/etc/passwd",
"flags": "execve clone",
"start_time": "2024-06-15T18:03:29.742161520Z",
"auid": 1000,
"parent_exec_id": "dXNlci1uaXg6ODg1NjMwMDAwMDAwMDoxNDc4MTI=",
"refcnt": 1,
"tid": 320108,
"user": {
"name": "user"
}
},
"parent": {
"exec_id": "dXNlci1uaXg6ODg1NjMwMDAwMDAwMDoxNDc4MTI=",
"pid": 147812,
"uid": 1000,
"cwd": "/home/user/go/src/github.com/cilium/tetragon",
"binary": "/usr/bin/zsh",
"flags": "procFS auid",
"start_time": "2024-06-15T14:37:33.597296165Z",
"auid": 1000,
"parent_exec_id": "dXNlci1uaXg6MTM1ODA0MDAwMDAwMDozMTQ2",
"tid": 147812
},
"function_name": "file_open",
"policy_name": "lsm",
"args": [
{
"file_arg": {
"path":"/etc/passwd",
"permission":"-rw-r--r--"
}
}
],
"action": "KPROBE_ACTION_POST"
},
"node_name": "user-nix",
"time": "2024-06-15T18:03:29.743030933Z"
}
This is also necessary for #2409.
There are things to do:
- Fix tests
- Add lsm test
- Decide the minimal kernel version for LSM sensor support (LSM BPF requires kernel 5.7 or newer).
- Fix some bugs in the code and make it tidier.
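For the kernel version item, here is a rough Go sketch of a 5.7 gate based on the release string that `uname -r` prints. The function names are hypothetical and this is not how Tetragon detects the feature. A version check alone is also not sufficient, because the kernel must be built with CONFIG_BPF_LSM and have the bpf LSM enabled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelRelease extracts major and minor from a release string such as
// "5.15.0-1067-azure" (the format printed by `uname -r`).
func parseKernelRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("bad major in %q: %w", release, err)
	}
	// The minor component may carry a suffix (e.g. "7-rc1"); keep leading digits only.
	digits := parts[1]
	for i, r := range digits {
		if r < '0' || r > '9' {
			digits = digits[:i]
			break
		}
	}
	if minor, err = strconv.Atoi(digits); err != nil {
		return 0, 0, fmt.Errorf("bad minor in %q: %w", release, err)
	}
	return major, minor, nil
}

// lsmBPFSupported reports whether the release is at least 5.7, the first
// kernel version with LSM BPF programs.
func lsmBPFSupported(release string) (bool, error) {
	major, minor, err := parseKernelRelease(release)
	if err != nil {
		return false, err
	}
	return major > 5 || (major == 5 && minor >= 7), nil
}

func main() {
	for _, r := range []string{"5.15.0-1067-azure", "5.4.0-42-generic"} {
		ok, err := lsmBPFSupported(r)
		fmt.Println(r, ok, err)
	}
}
```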
I managed to load LSM BPF programs, and tail calls also work for LSM programs! The most terrifying problems are solved, I think. I managed to catch some LSM events.
I have now fixed the problem with args resolving. LSM BPF programs get their args the same way raw tracepoint programs do, so LSM events are looking good now. It would be nice to start code review.
Some open questions:
- ./verify/verify.sh. I made a hack for loading generic LSM programs, but I need to install llvm-objcopy in CI.
- Tests. I think we need some tests, maybe more than one. I'll work on it. Maybe we can discuss what kind of tests would be good to have?
- Enforce mode. If an LSM BPF program returns a value that is not equal to zero, then the operation is not permitted. Maybe we should add an action for that? For example, if a policy is violated, just return -EPERM instead of sending SIGKILL.
@kkourt, @olsajiri, @mtardy please could you have a look?
@olsajiri I added you to the reviewers. It would be great if you could have a look when you get a chance. Thanks!
Some problems with tests:
- Verification of bpf_generic_lsm_v61.o fails. I need to investigate why (help is appreciated).
- Tetragon Go Test / build (ubuntu-20.04): LsmFileOpen test is failing, but locally it works fine.
Verification of bpf_generic_lsm_v61.o fails. I need to investigate why (help is appreciated).
What is the verification error?
Tetragon Go Test / build (ubuntu-20.04): LsmFileOpen test is failing, but locally it works fine. --- FAIL: TestLSMOpenFile (6.85s)
Had a quick look, but couldn't figure out what the issue is. I can have a closer look next week. Here are some first notes:
Looking at the GH CI logs:
logcapture.go:25: time="2024-07-05T10:53:43Z" level=info msg="Loaded generic LSM program: /home/runner/work/tetragon/tetragon/go/src/github.com/cilium/tetragon/bpf/objs/bpf_generic_lsm_v511.o -> file_open"
logcapture.go:25: time="2024-07-05T10:53:43Z" level=info msg="BPF detected features: override_return: true, buildid: true, kprobe_multi: false, uprobe_multi false, fmodret: true, fmodret_syscall: true, signal: true, large: true, lsm: true"
So everything seems to work OK so far.
Logs also say:
jsonchecker.go:183: jsonTestCheck: opening: /tmp/tetragon.gotest.TestLSMOpenFile.3243682509.json
Downloading the artifacts (https://github.com/cilium/tetragon/actions/runs/9807021872?pr=2566), we can find the file there. Looking at the file, there are no lsm events there:
$ jq '. | del(.time) | del(.node_name) | to_entries[] | .key' < tetragon.gotest.TestLSMOpenFile.3243682509.json
"process_exec"
"process_exec"
"process_exit"
What is the verification error?
You can find it in the ARM tests. I fixed it locally, but the ARM CI verification step is still failing...
Had a quick look, but couldn't figure out what the issue is.
Yes, thank you for helping me. It is a little weird, because locally it is in the logs. The vmtests are also working fine.
@kkourt I found out why verification failed.
https://github.com/cilium/tetragon/blob/c7799feacf315d18d91064d3256c0c28efccb8c8/bpf/process/bpf_process_event.h#L251-L259.
So the code path that uses loop(...) fails verification. If I force the code path that uses for, everything works fine... Maybe the problem is somehow related to the error missing btf func_info and
https://github.com/cilium/tetragon/blob/c7799feacf315d18d91064d3256c0c28efccb8c8/bpf/process/bpf_process_event.h#L227-L232
@olsajiri I added you to the reviewers. It would be great if you could have a look when you get a chance. Thanks!
ugh, sry missed this one, will check
hi, IMO this needs to be split into multiple patches, it's hard to review 6000+ lines diff
from a quick look this can be separated into smaller logical changes, like:
- api changes
- pkg/sensors changes
- pkg/bpf changes
- .github changes
- bpf changes
- install changes
- ...
plz split anything that makes sense
Yes, sure
@olsajiri, I split the code into logical commits and fixed the other comments. Please have a look.
What about tests: I tested on Ubuntu 20.04
Linux lsm-test 5.15.0-1067-azure #76~20.04.1-Ubuntu SMP Thu Jun 13 18:00:23 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Tests are passing. For now I don't have a clue why CI is failing. As we found out, there is no LSM event in CI test. I can try to change policy and look if we have LSM events without any filtering. @kkourt do you have any ideas about this problem?
The failure in https://github.com/cilium/tetragon/actions/runs/9894669687/job/27332734013?pr=2566 says:
observer_test_helper.go:447: LoadConfig error: failed prog /home/runner/actions-runner/_work/tetragon/tetragon/go/src/github.com/cilium/tetragon/bpf/objs/bpf_generic_lsm_v61.o kern_version 393562 loadInstance: attaching 'generic_lsm_event' failed: create raw tracepoint: not supported
Which might explain what's wrong? I'd suggest we probe for above functionality and skip the test if functionality is missing.
Yes, you are right. It is another problem on ARM machines. The problem I mentioned earlier is already solved.
I'll have a look at how to check if raw tracepoints are available, and put this check in bpf.HasLSMPrograms().
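One cheap, related pre-check can be sketched in Go: see whether bpf is listed in /sys/kernel/security/lsm, the comma-separated list of active LSMs. This is only an illustration with hypothetical function names. It is not the raw tracepoint probe discussed here, and it is not what bpf.HasLSMPrograms() does. It would not catch the attach failure seen on ARM, which needs a probe that actually tries to load and attach a program.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// lsmListHasBPF reports whether "bpf" appears in a comma-separated LSM list,
// e.g. "lockdown,capability,yama,apparmor,bpf".
func lsmListHasBPF(list string) bool {
	for _, name := range strings.Split(strings.TrimSpace(list), ",") {
		if strings.TrimSpace(name) == "bpf" {
			return true
		}
	}
	return false
}

// bpfLSMEnabled reads the active LSM list from path (normally
// /sys/kernel/security/lsm) and checks it for the bpf LSM.
func bpfLSMEnabled(path string) (bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	return lsmListHasBPF(string(data)), nil
}

func main() {
	ok, err := bpfLSMEnabled("/sys/kernel/security/lsm")
	if err != nil {
		fmt.Println("cannot read LSM list:", err)
		return
	}
	fmt.Println("bpf LSM enabled:", ok)
}
```

A test could call a check like this first and skip itself when it returns false.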
It seems to me that the tests are passing and all comments are addressed. @olsajiri @kkourt, could you please have a look one more time? :)
Merging this, thanks! Great work!