
LSM sensor

anfedotoff opened this pull request

LSM sensor support allows using LSM BPF programs the same way we use BPF programs for kprobes, tracepoints, and uprobes.

TracingPolicy example:

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "lsm"
spec:
  lsmhooks:
  - hook: "file_open"
    args:
      - index: 0
        type: "file"
    selectors:
    - matchBinaries:
      - operator: "In"
        values:
        - "/usr/bin/cat"
```
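With this policy, file_open events are only reported when the executing binary is /usr/bin/cat. A minimal Go sketch of the "In" operator semantics, assuming exact full-path matching; binarySelector and matches are hypothetical names used only for illustration, not Tetragon's implementation.

```go
package main

import "fmt"

// binarySelector is a hypothetical model of one matchBinaries entry.
type binarySelector struct {
	Operator string   // only "In" is modeled in this sketch
	Values   []string // full paths of executables
}

// matches reports whether the executing binary satisfies the selector.
// With "In", the path must be exactly equal to one of Values.
func (s binarySelector) matches(binary string) bool {
	if s.Operator != "In" {
		return false // other operators are out of scope here
	}
	for _, v := range s.Values {
		if v == binary {
			return true
		}
	}
	return false
}

func main() {
	sel := binarySelector{Operator: "In", Values: []string{"/usr/bin/cat"}}
	fmt.Println(sel.matches("/usr/bin/cat")) // true: the event is reported
	fmt.Println(sel.matches("/usr/bin/zsh")) // false: the event is filtered out
}
```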

Event example (argument printing still has problems that I need to solve):

```json
{
  "process_lsm": {
    "process": {
      "exec_id": "dXNlci1uaXg6MTk5MzE1NDk0NTk3MzM6MzIwMTA4",
      "pid": 320108,
      "uid": 1000,
      "cwd": "/home/user/go/src/github.com/cilium/tetragon",
      "binary": "/usr/bin/cat",
      "arguments": "/etc/passwd",
      "flags": "execve clone",
      "start_time": "2024-06-15T18:03:29.742161520Z",
      "auid": 1000,
      "parent_exec_id": "dXNlci1uaXg6ODg1NjMwMDAwMDAwMDoxNDc4MTI=",
      "refcnt": 1,
      "tid": 320108,
      "user": {
        "name": "user"
      }
    },
    "parent": {
      "exec_id": "dXNlci1uaXg6ODg1NjMwMDAwMDAwMDoxNDc4MTI=",
      "pid": 147812,
      "uid": 1000,
      "cwd": "/home/user/go/src/github.com/cilium/tetragon",
      "binary": "/usr/bin/zsh",
      "flags": "procFS auid",
      "start_time": "2024-06-15T14:37:33.597296165Z",
      "auid": 1000,
      "parent_exec_id": "dXNlci1uaXg6MTM1ODA0MDAwMDAwMDozMTQ2",
      "tid": 147812
    },
    "function_name": "file_open",
    "policy_name": "lsm",
    "args": [
      {
        "file_arg": {
          "path": "/etc/passwd",
          "permission": "-rw-r--r--"
        }
      }
    ],
    "action": "KPROBE_ACTION_POST"
  },
  "node_name": "user-nix",
  "time": "2024-06-15T18:03:29.743030933Z"
}
```

This is also necessary for #2409.

anfedotoff, Jun 16 '24 09:06

Deploy Preview for tetragon ready!

Latest commit: 72fc13393481e9d253fc7c176597632dc8aea153
Latest deploy log: https://app.netlify.com/sites/tetragon/deploys/6694f2df98256300088c6cef
Deploy Preview: https://deploy-preview-2566--tetragon.netlify.app

netlify[bot], Jun 16 '24 09:06

There are still things to do:

  • Fix tests.
  • Add an LSM test.
  • Decide the minimum kernel version for LSM sensor support (LSM BPF requires kernel 5.7).
  • Fix some bugs in the code and clean it up.
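The kernel requirement can be checked from user space. A minimal Go sketch, assuming a plain version comparison on the uname -r release string plus a look at /sys/kernel/security/lsm: LSM BPF programs only take effect when the kernel is built with CONFIG_BPF_LSM and "bpf" is in the list of active LSMs. All function names are hypothetical. A version check alone can mislead on distribution kernels with backports, so probing the feature directly is more robust.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease extracts major and minor numbers from a kernel release
// string as printed by `uname -r`, e.g. "5.15.0-1067-azure" or "5.7-rc1".
func parseRelease(release string) (major, minor int, err error) {
	fields := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(fields) < 2 {
		return 0, 0, fmt.Errorf("unexpected release %q", release)
	}
	if major, err = strconv.Atoi(fields[0]); err != nil {
		return 0, 0, err
	}
	// Drop any non-digit suffix from the minor field ("7-rc1" -> "7").
	m := fields[1]
	if i := strings.IndexFunc(m, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		m = m[:i]
	}
	if minor, err = strconv.Atoi(m); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether major.minor >= wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

// bpfLSMActive reports whether "bpf" is in the comma-separated list of
// active LSMs, as exposed in /sys/kernel/security/lsm.
func bpfLSMActive(lsmList string) bool {
	for _, name := range strings.Split(strings.TrimSpace(lsmList), ",") {
		if name == "bpf" {
			return true
		}
	}
	return false
}

func main() {
	// Sample inputs; on a real host these would come from uname(2) and
	// /sys/kernel/security/lsm.
	major, minor, err := parseRelease("5.15.0-1067-azure")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(atLeast(major, minor, 5, 7))                           // true
	fmt.Println(bpfLSMActive("lockdown,capability,yama,apparmor,bpf")) // true
}
```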

I managed to load LSM BPF programs, and tail calls also work for LSM programs! I think the scariest problems are solved. I also managed to catch some LSM events :)

anfedotoff, Jun 16 '24 09:06

I have now fixed the problem with argument resolution: LSM BPF programs receive their arguments the same way raw tracepoint programs do. LSM events now look good, so it would be nice to start the code review.

Some open questions:

  • ./verify/verify.sh: I made a hack for loading generic LSM programs, but it needs llvm-objcopy installed in CI.
  • Tests: I think we need some tests, maybe more than one. I'll work on it. Maybe we can discuss what kinds of tests would be good to have?
  • Enforce mode: if an LSM BPF program returns a nonzero value, the operation is not permitted. Maybe we should add an action for that? For example, if the policy is violated, just return -EPERM instead of sending SIGKILL.

@kkourt, @olsajiri, @mtardy please could you have a look?

anfedotoff, Jun 26 '24 13:06

@olsajiri I added you to the reviewers. It would be great if you could have a look when you get a chance. Thanks!

kkourt, Jul 03 '24 10:07

Some problems with tests:

  • Verification of bpf_generic_lsm_v61.o fails. I need to investigate why (help is appreciated).
  • Tetragon Go Test / build (ubuntu-20.04): LsmFileOpen test is failing, but locally it works fine.

anfedotoff, Jul 04 '24 21:07

> Verification of bpf_generic_lsm_v61.o fails. I need to investigate why (help is appreciated).

What is the verification error?

> Tetragon Go Test / build (ubuntu-20.04): LsmFileOpen test is failing, but locally it works fine.
> --- FAIL: TestLSMOpenFile (6.85s)

Had a quick look, but couldn't figure out what the issue is. I can have a closer look next week. Here are some first notes:

Looking at the GH CI logs:

```
logcapture.go:25: time="2024-07-05T10:53:43Z" level=info msg="Loaded generic LSM program: /home/runner/work/tetragon/tetragon/go/src/github.com/cilium/tetragon/bpf/objs/bpf_generic_lsm_v511.o -> file_open"
logcapture.go:25: time="2024-07-05T10:53:43Z" level=info msg="BPF detected features: override_return: true, buildid: true, kprobe_multi: false, uprobe_multi false, fmodret: true, fmodret_syscall: true, signal: true, large: true, lsm: true"
```

So everything seems to work OK so far.

Logs also say:

jsonchecker.go:183: jsonTestCheck: opening: /tmp/tetragon.gotest.TestLSMOpenFile.3243682509.json

After downloading the artifacts (https://github.com/cilium/tetragon/actions/runs/9807021872?pr=2566), we can find the file there. It contains no LSM events:

```
$ jq '. | del(.time) | del(.node_name) | to_entries[] | .key' < tetragon.gotest.TestLSMOpenFile.3243682509.json
"process_exec"
"process_exec"
"process_exit"
```

kkourt, Jul 05 '24 14:07

> What is the verification error?

You can find it in the ARM tests. I fixed it locally, but the ARM CI verification step is still failing...

> Had a quick look, but couldn't figure out what the issue is.

Yes, thank you for helping me :) It is a little weird, because locally the LSM event does show up in the logs. The vmtests also pass.

anfedotoff, Jul 05 '24 15:07

@kkourt I found out why verification fails.

https://github.com/cilium/tetragon/blob/c7799feacf315d18d91064d3256c0c28efccb8c8/bpf/process/bpf_process_event.h#L251-L259.

So, the code path using loop(...) fails verification. If I force the code path that uses a plain for loop, everything works fine... Maybe the problem is somehow related to the missing btf func_info error and

https://github.com/cilium/tetragon/blob/c7799feacf315d18d91064d3256c0c28efccb8c8/bpf/process/bpf_process_event.h#L227-L232

anfedotoff, Jul 08 '24 14:07

> @olsajiri I added you to the reviewers. It would be great if you could have a look when you get a chance. Thanks!

ugh, sorry, missed this one, will check

olsajiri, Jul 08 '24 23:07

hi, IMO this needs to be split into multiple patches; it's hard to review a 6000+ line diff.

From a quick look, this can be separated into smaller logical changes, like:

  • api changes
  • pkg/sensors changes
  • pkg/bpf changes
  • .github changes
  • bpf changes
  • install changes
  • ...

Please split anything that makes sense.

Yes, sure

anfedotoff, Jul 09 '24 07:07

@olsajiri, I split the code into logical commits and addressed the other comments. Please have a look.

anfedotoff, Jul 10 '24 15:07

Regarding tests: I tested on Ubuntu 20.04:

Linux lsm-test 5.15.0-1067-azure #76~20.04.1-Ubuntu SMP Thu Jun 13 18:00:23 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

The tests pass there. For now I have no idea why CI is failing. As we found out, there are no LSM events in the CI test. I can try changing the policy to see whether we get LSM events without any filtering. @kkourt, do you have any ideas about this problem?

anfedotoff, Jul 10 '24 15:07


The failure in https://github.com/cilium/tetragon/actions/runs/9894669687/job/27332734013?pr=2566 says:

```
observer_test_helper.go:447: LoadConfig error: failed prog /home/runner/actions-runner/_work/tetragon/tetragon/go/src/github.com/cilium/tetragon/bpf/objs/bpf_generic_lsm_v61.o kern_version 393562 loadInstance: attaching 'generic_lsm_event' failed: create raw tracepoint: not supported
```

Which might explain what's wrong? I'd suggest we probe for the above functionality and skip the test if it is missing.

kkourt, Jul 12 '24 08:07


Yes, you are right. This is a different problem, on ARM machines; the one I mentioned earlier is already solved. I'll look at how to check whether raw tracepoints are available and put this check in bpf.HasLSMPrograms().

anfedotoff, Jul 12 '24 08:07

It seems to me that the tests pass and all comments are addressed. @olsajiri @kkourt, could you please have a look one more time :)?

anfedotoff, Jul 12 '24 12:07

Merging this, thanks! Great work!

kkourt, Jul 19 '24 05:07