Provide Fluent Forward output option

Open patrick-stephens opened this issue 3 years ago • 2 comments

Initial Checklist

  • [ ] There is an issue describing the need for this PR.
  • [x] Git log contains summary of the change.
  • [x] Git log contains motivation and context of the change.
  • [ ] If part of an EPIC, PR git log contains EPIC number.
  • [ ] If part of an EPIC, PR was added to EPIC description.

Description (git log)

This change is an initial proof of concept for integrating with the Fluent ecosystem, allowing Tracee users to take advantage of all the filtering, processing and output options available in those CNCF Graduated projects. For now this is a deliberately basic implementation, just to see if there is interest.

The change introduces new flags to tracee-rules that provide another output option: sending events over a TCP connection using the Fluent Forward protocol. Fluent Bit or Fluentd can then consume this stream and apply the myriad filters and outputs those projects support. The benefit is that Tracee does not have to provide output options for anything Fluent Bit (or Fluentd) already provides:

  • https://docs.fluentbit.io/manual/pipeline/outputs
  • https://docs.fluentd.org/output

It also integrates very easily with existing deployments that already run Fluent Bit (or Fluentd), which includes most cloud providers, because Tracee simply becomes another input to the existing pipelines there.

Beyond output support, we also get all the aggregation and filtering options available in Fluent Bit or Fluentd. Additionally, anything that implements a Forward receiver can handle this data.

This change makes use of an existing library to handle the Fluent Forward protocol support: https://github.com/IBM/fluent-forward-go
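For readers new to the protocol: in Forward's simplest "Message mode", each event is a MessagePack array of [tag, time, record]. The Go sketch below hand-encodes one such entry using only the standard library, to show roughly what travels over the TCP connection. It is an illustration, not the PR's code and not the fluent-forward-go API; encodeMessage and its helpers are hypothetical names, and the sketch assumes string-only record fields and a timestamp in whole seconds.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// appendStr appends s as a MessagePack string (fixstr, str8, str16 or str32).
func appendStr(b []byte, s string) []byte {
	n := len(s)
	switch {
	case n < 32:
		b = append(b, 0xa0|byte(n))
	case n < 1<<8:
		b = append(b, 0xd9, byte(n))
	case n < 1<<16:
		b = binary.BigEndian.AppendUint16(append(b, 0xda), uint16(n))
	default:
		b = binary.BigEndian.AppendUint32(append(b, 0xdb), uint32(n))
	}
	return append(b, s...)
}

// appendUint appends v using the smallest MessagePack unsigned encoding.
func appendUint(b []byte, v uint64) []byte {
	switch {
	case v < 128:
		return append(b, byte(v)) // positive fixint
	case v <= 0xff:
		return append(b, 0xcc, byte(v))
	case v <= 0xffff:
		return binary.BigEndian.AppendUint16(append(b, 0xcd), uint16(v))
	case v <= 0xffffffff:
		return binary.BigEndian.AppendUint32(append(b, 0xce), uint32(v))
	default:
		return binary.BigEndian.AppendUint64(append(b, 0xcf), v)
	}
}

// appendMapHeader appends a MessagePack map header for n key/value pairs.
func appendMapHeader(b []byte, n int) []byte {
	switch {
	case n < 16:
		return append(b, 0x80|byte(n)) // fixmap
	case n < 1<<16:
		return binary.BigEndian.AppendUint16(append(b, 0xde), uint16(n))
	default:
		return binary.BigEndian.AppendUint32(append(b, 0xdf), uint32(n))
	}
}

// encodeMessage encodes one Forward "Message mode" entry: [tag, time, record].
func encodeMessage(tag string, unixSec uint64, record map[string]string) []byte {
	b := []byte{0x93} // fixarray with 3 elements
	b = appendStr(b, tag)
	b = appendUint(b, unixSec)
	keys := make([]string, 0, len(record))
	for k := range record {
		keys = append(keys, k)
	}
	sort.Strings(keys) // Go map order is random; sort for deterministic bytes
	b = appendMapHeader(b, len(keys))
	for _, k := range keys {
		b = appendStr(b, k)
		b = appendStr(b, record[k])
	}
	return b
}

func main() {
	msg := encodeMessage("tracee", 1662726664, map[string]string{"event": `{"Data":null}`})
	fmt.Printf("% x\n", msg)
}
```

Real Forward clients usually also support the batched Forward/PackedForward modes and the EventTime extension for sub-second timestamps, which this sketch leaves out.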

Type of change

  • [ ] Bug fix (non-breaking change fixing an issue, preferable).
  • [ ] Quick fix (minor non-breaking change requiring no issue, use with care)
  • [ ] Code refactor (code improvement and/or code removal)
  • [X] New feature (non-breaking change adding functionality).
  • [ ] Breaking change (cause existing functionality not to work as expected).

How Has This Been Tested?

Start Fluent Bit to accept Forward input and print it to stdout:

$ docker run --rm -it --network=host fluent/fluent-bit -i forward -o stdout -m '*'
Fluent Bit v1.9.8
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/09/09 12:30:50] [ info] [fluent bit] version=1.9.8, commit=97a5e9dcf3, pid=1
[2022/09/09 12:30:50] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/09/09 12:30:50] [ info] [cmetrics] version=0.3.6
[2022/09/09 12:30:50] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2022/09/09 12:30:50] [ info] [sp] stream processor started
[2022/09/09 12:30:50] [ info] [output:stdout:stdout.0] worker #0 started

Build and run the container from this PR (using host networking just to simplify port connection):

$ BTFHUB=0  make -f builder/Makefile.tracee-container build-tracee
...
$ docker run --network=host \
-v /etc/os-release:/etc/os-release-host:ro \
-e LIBBPFGO_OSRELEASE_FILE=/etc/os-release-host \
--pid=host --cgroupns=host --privileged --rm -it tracee:latest \
--forward-url 127.0.0.1:24224 --forward-template /tracee/templates/rawjson.tmpl
INFO: probing tracee-ebpf capabilities...
INFO: starting tracee-ebpf...
INFO: starting tracee-rules...
KConfig: warning: could not check enabled kconfig features
(could not read /boot/config-5.15.0-46-generic: stat /boot/config-5.15.0-46-generic: no such file or directory)
KConfig: warning: assuming kconfig values, might have unexpected behavior
Loaded 15 signature(s): [TRC-1 TRC-13 TRC-2 TRC-14 TRC-3 TRC-11 TRC-9 TRC-4 TRC-5 TRC-12 TRC-6 TRC-10 TRC-7 TRC-16 TRC-15]
Serving metrics endpoint at :4466

Now trigger the standard strace ls detection, and Tracee produces the usual output:

*** Detection ***
Time: 2022-09-09T12:31:04Z
Signature ID: TRC-2
Signature: Anti-Debugging
Data: map[]
Command: strace
Hostname: calyptia-laptop

In Fluent Bit we also see the message:

[0] tracee: [1662726664.000000000, {"event"=>"{"Data":null,"Context":{"timestamp":1662726664111071987,"threadStartTime":14508524781214,"processorId":15,"processId":523928,"cgroupId":1,"threadId":523928,"parentProcessId":523926,"hostProcessId":523928,"hostThreadId":523928,"hostParentProcessId":523926,"userId":1000,"mountNamespace":4026531841,"pidNamespace":4026531836,"processName":"strace","hostName":"calyptia-laptop","containerId":"","containerImage":"","containerName":"","podName":"","podNamespace":"","podUID":"","eventId":"101","eventName":"ptrace","argsNum":4,"returnValue":0,"stackAddresses":null,"contextFlags":{"containerStarted":false},"args":[{"name":"request","type":"string","value":"PTRACE_TRACEME"},{"name":"pid","type":"pid_t","value":0},{"name":"addr","type":"void*","value":"0x0"},{"name":"data","type":"void*","value":"0x0"}]},"SigMetadata":{"ID":"TRC-2","Version":"0.1.0","Name":"Anti-Debugging","Description":"Process uses anti-debugging technique to block debugger","Tags":["linux","container"],"Properties":{"MITRE ATT\u0026CK":"Defense Evasion: Execution Guardrails","Severity":3}}}
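Note that the record Fluent Bit prints has a single "event" key whose value is the whole detection as one JSON string, rather than structured fields. A minimal Go sketch of building a record of that shape, assuming the rendered template output is simply the JSON encoding of the detection; Finding and wrapEvent are hypothetical stand-ins, not Tracee types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding is a hypothetical, trimmed-down stand-in for a Tracee detection.
type Finding struct {
	Data        map[string]any
	SigMetadata struct {
		ID   string
		Name string
	}
}

// wrapEvent serializes v to JSON and stores the text under the "event" key,
// mirroring the record shape shown in the Fluent Bit output above.
func wrapEvent(v any) (map[string]string, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	return map[string]string{"event": string(raw)}, nil
}

func main() {
	var f Finding
	f.SigMetadata.ID = "TRC-2"
	f.SigMetadata.Name = "Anti-Debugging"
	rec, err := wrapEvent(f)
	if err != nil {
		panic(err)
	}
	fmt.Println(rec["event"])
}
```

Because the value is a string, a downstream Fluent Bit pipeline would typically need a JSON parser filter on the event key before it could filter on individual fields such as SigMetadata.ID.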

Final Checklist:

Pick "Bug Fix" or "Feature", delete the other and mark appropriate checks.

  • [ ] I have made corresponding changes to the documentation.
  • [ ] My code follows the style guidelines (C and Go) of this project.
  • [ ] I have performed a self-review of my own code.
  • [ ] I have commented all functions/methods created explaining what they do.
  • [ ] I have commented my code, particularly in hard-to-understand areas.
  • [ ] My changes generate no new warnings.
  • [ ] I have added tests that prove my fix, or feature, is effective.
  • [ ] New and existing unit tests pass locally with my changes.
  • [ ] Any dependent changes have been merged and published before.

Git Log Checklist:

My commits logs have:

  • [ ] Subject starts with "subsystem|file: description".
  • [ ] Do not end the subject line with a period.
  • [ ] Limit the subject line to 50 characters.
  • [ ] Separate subject from body with a blank line.
  • [ ] Use the imperative mood in the subject line.
  • [ ] Wrap the body at 72 characters.
  • [ ] Use the body to explain what and why instead of how.

patrick-stephens, Sep 09 '22 12:09

CLA assistant check
All committers have signed the CLA.

CLAassistant, Sep 09 '22 12:09

I'm not entirely sure what is happening in the unit tests, as the failure seems to be unrelated to this change: https://github.com/aquasecurity/tracee/actions/runs/3061048591/jobs/4940725130#step:4:1011

Using the container build environment locally with my fork, the unit tests run without issues: make -f builder/Makefile.tracee-make ubuntu-make ARG="test-unit"

patrick-stephens, Sep 15 '22 15:09

Rebased, so if you get a chance, @josedonizetti, it would be great to have some feedback.

My main question is about testing: this uses an existing library that already tests the protocol itself, so do we want some kind of integration-level test or a dedicated unit test?

Unit tests all pass locally for me.
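One option for the dedicated unit test asked about above is an in-process TCP listener acting as a fake Forward receiver, with assertions on the bytes it receives; this exercises the connection handling without needing Fluent Bit. A sketch in Go, where sendPayload is a hypothetical stand-in for the output code and the payload is treated as opaque bytes rather than a real Forward message:

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// sendPayload is a hypothetical stand-in for the Forward output: it dials
// addr over TCP, writes payload and closes the connection.
func sendPayload(addr string, payload []byte) error {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return fmt.Errorf("error connecting to Forward URL %q: %w", addr, err)
	}
	defer conn.Close()
	_, err = conn.Write(payload)
	return err
}

// roundTrip starts a fake receiver on a random loopback port, sends payload
// to it and returns everything the receiver read.
func roundTrip(payload []byte) ([]byte, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer ln.Close() // also unblocks Accept if sending fails

	got := make(chan []byte, 1)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			got <- nil
			return
		}
		defer conn.Close()
		data, _ := io.ReadAll(conn) // reads until the sender closes
		got <- data
	}()

	if err := sendPayload(ln.Addr().String(), payload); err != nil {
		return nil, err
	}
	return <-got, nil
}

func main() {
	data, err := roundTrip([]byte("hello"))
	fmt.Println(string(data), err)
}
```

A real test would replace the opaque payload with an encoded event and decode it on the receiving side.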

patrick-stephens, Oct 12 '22 14:10

Apologies, looks like a bad merge snuck in and ruined things. Will resolve and update.

patrick-stephens, Oct 13 '22 09:10

Resolved the build and format issues and made sure the submodule matches the current main version. A full clean build and the unit tests pass locally with the Ubuntu target:

$ gh pr checkout 2155
$ docker system prune --force --volumes --all
$ make -f builder/Makefile.tracee-make ubuntu-prepare
$ make -f builder/Makefile.tracee-make ubuntu-make ARG="all"
$ make -f builder/Makefile.tracee-make ubuntu-make ARG="test-unit" 2>&1 | tee ubuntu-unit-tests.txt

ubuntu-unit-tests.txt

The Alpine target fails to build because of an issue in github.com/aquasecurity/libbpfgo, which has the knock-on effect of some tests failing to build, but the output tests all pass:

$ gh pr checkout 2155
$ docker system prune --force --volumes --all
$ make -f builder/Makefile.tracee-make alpine-prepare
$ make -f builder/Makefile.tracee-make alpine-make ARG="test-unit" 2>&1 | tee alpine-unit-tests.txt
...
go: downloading github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb
# github.com/aquasecurity/libbpfgo
/usr/bin/ld: /tracee/dist/libbpf/libbpf.a(bpf.o): in function `ensure_good_fd':
bpf.c:(.text+0x3ab0): undefined reference to `fcntl64'
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
=== RUN   TestParseQueryResString
    --- PASS: TestStdioOverSocket/H (0.00s)
PASS
coverage: 76.7% of statements
ok  	github.com/aquasecurity/tracee/signatures/golang	0.022s	coverage: 76.7% of statements
make: *** [Makefile:612: test-unit] Error 2
make: *** [builder/Makefile.tracee-make:180: alpine-make] Error 2

alpine-unit-tests.txt

@josedonizetti I cannot find any known issue for this, although it looks like a glibc issue. I tried updating github.com/aquasecurity/libbpfgo to both the latest v1.0.0 and v1.0.1 versions (with a related submodule update for v1.0.1 as well), but it still fails for me.

The integration tests described above pass:

  1. Run Fluent Bit to receive:
$ docker run --rm -it --network=host fluent/fluent-bit -i forward -o stdout -m '*'
Unable to find image 'fluent/fluent-bit:latest' locally
latest: Pulling from fluent/fluent-bit
1cd0595314a5: Pull complete 
bf75762436b0: Pull complete 
a1f1879bb7de: Pull complete 
4fb246619847: Pull complete 
9a0faea42789: Pull complete 
0a0d99c1d8c3: Pull complete 
Digest: sha256:3045036b2ef35eae09a5f40273a0f1fbd70ca4d67e80918bfd0676b16ba43a29
Status: Downloaded newer image for fluent/fluent-bit:latest
Fluent Bit v1.9.9
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/10/13 09:54:06] [ info] [fluent bit] version=1.9.9, commit=5c03b2e555, pid=1
[2022/10/13 09:54:06] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/13 09:54:06] [ info] [cmetrics] version=0.3.7
[2022/10/13 09:54:06] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2022/10/13 09:54:06] [ info] [output:stdout:stdout.0] worker #0 started
[2022/10/13 09:54:06] [ info] [sp] stream processor started
  2. Build and run Tracee to send to Fluent Bit:
$ BTFHUB=0  make -f builder/Makefile.tracee-container build-tracee
...
$ docker run --network=host \
-v /etc/os-release:/etc/os-release-host:ro \
-e LIBBPFGO_OSRELEASE_FILE=/etc/os-release-host \
--pid=host --cgroupns=host --privileged --rm -it tracee:latest \
--forward-url 127.0.0.1:24224 --forward-template /tracee/templates/rawjson.tmpl
  3. Now the logs from step 1 show the output from Tracee:
[0] tracee: [1665655136.000000000, {"event"=>"{"Data":null,"Context":{"timestamp":1665655136471983971,"threadStartTime":127280860655,"processorId":10,"processId":40104,"cgroupId":15673,"threadId":40104,"parentProcessId":39507,"hostProcessId":40104,"hostThreadId":40104,"hostParentProcessId":39507,"userId":1000,"mountNamespace":4026533119,"pidNamespace":4026531836,"processName":"slack","hostName":"calyptia-laptop","containerId":"","containerImage":"","containerName":"","podName":"","podNamespace":"","podUID":"","eventId":"714","eventName":"mem_prot_alert","argsNum":1,"returnValue":0,"stackAddresses":null,"contextFlags":{"containerStarted":false},"args":[{"name":"alert","type":"string","value":"Protection changed from W+E to E!"}]},"SigMetadata":{"ID":"TRC-4","Version":"0.1.0","Name":"Dynamic Code Loading","Description":"Writing to executable allocated memory region","Tags":["linux","container"],"Properties":{"MITRE ATT\u0026CK":"Defense Evasion: Obfuscated Files or Information","Severity":2}}}
"}]

patrick-stephens, Oct 13 '22 10:10

Looks like a unit test failure due to IPv4 vs IPv6 networking:

2022-10-13T19:59:34.3024281Z     output_test.go:413: 
2022-10-13T19:59:34.3025151Z         	Error Trace:	/home/runner/work/tracee/tracee/cmd/tracee-rules/output_test.go:413
2022-10-13T19:59:34.3025814Z         	Error:      	Error message not equal:
2022-10-13T19:59:34.3027230Z         	            	expected: "error connecting to Forward URL \"localhost:12345\": dial tcp 127.0.0.1:12345: connect: connection refused"
2022-10-13T19:59:34.3028587Z         	            	actual  : "error connecting to Forward URL \"localhost:12345\": dial tcp [::1]:12345: connect: connection refused"
2022-10-13T19:59:34.3029668Z         	Test:       	Test_sendToFluentForward/sad_path,_error_reaching_server
2022-10-13T19:59:34.3030767Z         	Messages:   	sad path, error reaching server

Will adjust the check slightly to cope.

patrick-stephens, Oct 14 '22 09:10

@patrick-stephens I'm really sorry for the delay in looking into this. I'm prioritizing the review for today or tomorrow; I've been short on time with some deadlines for KubeCon next week.

josedonizetti, Oct 18 '22 10:10

No problem at all @josedonizetti and appreciate all the efforts made.

patrick-stephens, Oct 18 '22 10:10

@josedonizetti any chance you have time to look?

patrick-stephens, Nov 21 '22 16:11

EDIT: and I meant to say thanks for taking a look!

So we could either move on with this PR, or wait and add it to the new tracee binary, which merges tracee-ebpf and tracee-rules and makes everything an event. WDYT?

I'm happy with either. It sounds like we will need a new PR for the new binary, so I can work on that, but updating this one is easy if you think it's worth it.

Is the new binary in this same repo so I can look at the new PR?

Can we have some documentation about the new option?

Yeah, sure thing, I can add some background and an example. Where's the best place for that: a docs PR or this one?

patrick-stephens, Nov 23 '22 07:11

@patrick-stephens Now that we have completed the change I mentioned above, I would love to review this PR and merge it once it has moved to the right place. tracee-rules will be dropped soon: the new Tracee experience covers all events, not only signatures, and for that we have a new tracee binary. You can see the details of the change here: https://github.com/aquasecurity/tracee/discussions/2499

Sorry for the delay and the confusion (end of year, vacations, dropping tracee-rules, etc.). Let me know if you want help migrating the flag to the new binary.

josedonizetti, Jan 09 '23 13:01

Yeah that sounds like a good idea so I'll aim to update as soon as I get some time.

patrick-stephens, Jan 10 '23 19:01

@josedonizetti hopefully I've now updated this appropriately, please have a look.

I haven't added any tests yet, but the current set of unit tests on master fails for me even without my changes.

patrick-stephens, Jan 27 '23 12:01

@patrick-stephens Thank you so much for working on this, as soon as I'm back from cloud native security conf, I'll review it, ok?

josedonizetti, Jan 29 '23 20:01

Absolutely, mate. I've no idea what's up with the tests, but they seem to fail for me on main as well, so I don't think my PR is affecting things.

patrick-stephens, Jan 30 '23 10:01

Seems that this PR updates the libbpf module, which causes the tests to fail.

yanivagman, Jan 30 '23 16:01

Ah right, thanks. I'll revert that, it must have happened during the build process.

patrick-stephens, Jan 30 '23 16:01

@josedonizetti any chance you can have a look?

patrick-stephens, Feb 13 '23 10:02

@patrick-stephens sure. I was waiting for the bugfix release last week, but the release got a bit delayed. Reviewing now.

josedonizetti, Feb 13 '23 12:02

@josedonizetti hopefully this looks good now. It would be good to get it merged if we can, but let me know if you need anything more from me.

patrick-stephens, Feb 21 '23 09:02

@patrick-stephens Reviewing now.

josedonizetti, Feb 23 '23 10:02

@patrick-stephens Thank you so much for working on this! 🎉

If you have time, would you be able to provide a couple more things related to this PR?

  • a note about the feature for our release notes next week (eg: https://github.com/aquasecurity/tracee/discussions/2661)

  • and maybe a simple tutorial?

josedonizetti, Feb 23 '23 11:02

Absolutely, probably over the weekend though

patrick-stephens, Feb 23 '23 12:02

Looks like it is all good with the dev snapshot:

$ docker run --name tracee --rm -it \
--pid=host --cgroupns=host --privileged \
-v /etc/os-release:/etc/os-release-host:ro \
-e LIBBPFGO_OSRELEASE_FILE=/etc/os-release-host \
-e TRACEE_EBPF_ONLY=1 --network=host \
aquasec/tracee:dev --output forward:tcp://user:[email protected]:24224?tag=tracee

Received by Fluent Bit:

$ docker run --rm -it --network=host fluent/fluent-bit -i forward -o stdout -m '*'
...
[3] tracee: [1677352749.000000000, {"event"=>"{"timestamp":1677352749393453727,"threadStartTime":1677352749392830141,"processorId":3,"processId":66095,"cgroupId":7928,"threadId":66095,"parentProcessId":65389,"hostProcessId":66095,"hostThreadId":66095,"hostParentProcessId":65389,"userId":0,"mountNamespace":4026531841,"pidNamespace":4026531836,"processName":"rm","hostName":"calyptia-laptop","containerId":"","containerImage":"","containerName":"","podName":"","podNamespace":"","podUID":"","podSandbox":false,"eventId":"731","eventName":"security_inode_unlink","matchedScopes":1,"argsNum":4,"returnValue":0,"syscall":"unlinkat","stackAddresses":null,"contextFlags":{"containerStarted":false,"isCompat":false},"args":[{"name":"pathname","type":"const char*","value":"/zfs-list.cache@rpool"},{"name":"inode","type":"unsigned long","value":3632},{"name":"dev","type":"dev_t","value":25},{"name":"ctime","type":"u64","value":1677352749386521890}]}"}]

...

patrick-stephens, Feb 25 '23 19:02

@yanivagman @josedonizetti @patrick-stephens this is also missing from the documentation (regardless of the tutorial).

itaysk, Feb 28 '23 15:02

Ah my bad, looks like it needs to go here?

  • https://github.com/aquasecurity/tracee/tree/main/docs/docs/integrating
  • https://github.com/aquasecurity/tracee/blob/main/docs/docs/tracing/output-formats.md

It looks like they still use the old tracee-ebpf binary though.

patrick-stephens, Feb 28 '23 15:02

Yes, this looks like the right place.

@josedonizetti is working on updating the documentation with the new binary

yanivagman, Feb 28 '23 16:02