libs [WIP] new(test): add driver_sanity_test

[WIP] new(test): add driver_sanity_test_suites

Open incertum opened this issue 1 year ago • 5 comments

Signed-off-by: Melissa Kilby [email protected]

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-udig

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Add driver_sanity_test_suites/localhost/kernels_compat_tests to test. This is more of a Developer Guide and decoupled from any project test suites.

(1) At the beginning of the year and before that the project was in a more turbulent state where very often my eBPF driver builds were suddenly failing or did not pass the verifier anymore for some kernel versions after merging upstream from time to time, making me wonder about more extensive "grid-search like" sanity checks. In addition, over the past years there have been a large number of issues around "unable to load driver" or "eBPF verifier failed" or which "clang version" works? As of right now, the project is in a fantastic state and pretty much all clang versions for all kernel versions work for driver-bpf - eBPF verifier happy.

(2) Aligns with the effort of making it easier for developers to join the project. For instance, not seeing any Vagrant VMs in this project which is common for other projects where you can just launch a script and get started with example kernels and use cases.

Sharing a very early draft. While it is titled driver sanity, oh well it wasn't very sane to put something auto-discovery like together without any hard coding of kernel versions and such.

Few questions:

Would such a self-serve driver kernel compatibility test guide help the community?
If yes, how much polishing would be desired as it's more of a composition of ad-hoc little scripts not intended to turn into a broader project?

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Aug 02 '22 05:08 incertum

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: incertum Once this PR has been reviewed and has the lgtm label, please assign andreagit97 for approval by writing /assign @andreagit97 in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

Aug 02 '22 05:08 poiana

Just WOW 😲 This is something we have been needing for a while! Having something similar in the CI is super valuable indeed! Perhaps a cmake target to run the tests would help a bit :) I will give it a look asap, at least for some initial feedback! Thank you very much for this contribution!

Aug 02 '22 05:08 FedeDP

This is real good Melissa! We've been looking for something like this for a while now. Thank you a lot!

Aug 02 '22 08:08 jasondellaluce

Just WOW 😲 This is something we have been needing for a while! Having something similar in the CI is super valuable indeed!

localhost obviously is more tractable to gain more confidence in your code changes and spot issues you care about in your own deployment :) which typically is only a few distros and a smaller grid of kernel releases ...

Proper CI needs to care about more which is such a daunting task, that's probably why we are where we are. https://github.com/falcosecurity/libs/issues/531

Perhaps a cmake target to run the tests would help a bit :) I will give it a look asap, at least for some initial feedback! Thank you very much for this contribution!

Let me look into this more as this gets a bit more polished. Will also look more into https://github.com/falcosecurity/libs/pull/506 to converge to a good common style for such tests.

Aug 03 '22 05:08 incertum

Sorry, I'm just trying the CI :) /retest

Aug 09 '22 08:08 Andreagit97

[APPROVALNOTIFIER] This PR is NOT APPROVED

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

Nov 05 '22 01:11 poiana

@FedeDP: Added cmake targets as I finally got to refactoring and polishing it a bit. Thanks again for the suggestion, it helped making the driver-sanity tests cleaner and easier to use. @Molter73: Tried to make the cmake setup similar to e2e targets. Thanks to your work on e2e it was pretty straight forward as I used the e2e cmake setup as template. @LucaGuerra: Thanks for your feedback in the CI integration discussion. Did not forget about making sure to reboot the VM after each kmod test to be resilient against buggy kernel modules.

Removed WIP as v1 would now be considered complete. The PR is now ready for review. Aware that it's a random collection of scripts, mostly bash scripts. Happy to address any style feedback throughout in an attempt to make the scripts easier to follow and maintain, I think there is improvement potential in this regard.

Outlook: Emulation support still missing and arm64 support as well ... let's first see if v1 is useful for x86_64 for now.

Few observations: Noticed that in general ubuntu distro appears to be the most difficult one. Re-ran these tests many times and a few ubuntu kmod tests sometimes randomly just did not work. centos7 stable in comparison. Unrelated to this PR also noticed that the eBPF verifier on ubuntu let's minor bugs pass compared to other distros (same kernel versions), hence yum based distros appear to be the better choice for dev for sure. Also kmod never seemed to work for newer ubuntu kernels (e.g. 5.19 or 6.0), while centos7 was fine.

Nov 05 '22 03:11 incertum

/milestone 0.11.0

Dec 02 '22 13:12 FedeDP

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Mar 02 '23 15:03 poiana

/remove-lifecycle stale

Mar 02 '23 16:03 incertum

Suggestions for first pass review:

Test if it even works on localhost for you, please read README carefully
Scripts are glue-code type of scripting, mostly bash, nothing very polished (motto: 80-20 wins the day)
Posting my test run results here https://github.com/falcosecurity/libs/issues/982
ubuntu VM loop is more flaky now than when I last worked on it, but centos7 VM loop is more stable -> please also read the README for additional comments and suggestions
Using old fashioned ssh remote commands and scp as vagrant rsync and other tricks would sometimes not work.
modern_bpf would be added in a follow up PR

Mar 15 '23 06:03 incertum

Thanks @Molter73 for the very detailed first sweep of comments :heart:, will address all of them. Re @leogr https://github.com/falcosecurity/libs/pull/524#discussion_r1141376936 ... convert to Go would first like to discuss this more before pushing next changes ...

Pros:

Some sort of Go module would fit better into existing testing framework of the broader project
Would get rid of some bash helper scripts and/or the loops in a subset of bash scripts
Likely cleaner code than pure bash scripts w/ limitations listed below in mind

Cons:

Turning this into a Go module could be regarded as glorified bash script as we rely a lot on shell core-utils, so would shelling out a lot from Go be ok? We would have to ...

WDYT @leogr and @Molter73 I would be fine either way, keep bash or try to replacing as much as possible w/ Go. Just not a big fan of Python :upside_down_face: with some exceptions, please read more below:

Python:

We have just one little Python script for the "Data Science" part of this. Python has the strongest packages for these type of quick data transformations on small data plus all the fancy plotting libraries. Not sure if we can get rid of this and use Go instead ... not aware of many Data Science wrangling and graphing use cases or good libraries in this regard in Go.

Mar 20 '23 06:03 incertum

WDYT @leogr and @Molter73 I would be fine either way, keep bash or try to replacing as much as possible w/ Go.

I'm fine with either honestly, but if most of the Go code is just gonna be creating commands in strings to then be run by a separate shell, I would just keep the bash scripts, just with a little more linting and formatting to prevent them from becoming a hellish nightmare (as bash tends to do).

As for the little python in these changes, I know people like to talk against python nowadays (yes its performance is usually garbage, yes it has some concurrency issues, yes it is not a strongly typed language), but I honestly like it for some things, it's great for data science and our e2e tests without the fixture feature from pytest would be a nightmare (and I honestly haven't seen this approach on any other language/test framework). So, to me, it's fine to keep the one python script in the PR.

Mar 20 '23 09:03 Molter73

Rebased and shared current test results here https://github.com/falcosecurity/libs/issues/982#issuecomment-1581918966

Jun 08 '23 05:06 incertum

tried it on Ubuntu 22.04 Linux 5.19.0-45-generic #46~22.04.1-Ubuntu x86_64 and it worked without issues :tada:! It took me almost 1 hour but I've not a powerful setup (8 CPUs) or a powerful bandwidth (~10MBit/s on lucky days)

Jun 22 '23 19:06 Andreagit97

tried it on Ubuntu 22.04 Linux 5.19.0-45-generic #46~22.04.1-Ubuntu x86_64 and it worked without issues 🎉! It took me almost 1 hour but I've not a powerful setup (8 CPUs) or a powerful bandwidth (~10MBit/s on lucky days)

Thanks Andrea for reporting back, yes this sounds about right when you do everything from scratch (building containers, extracting kernel headers etc). Re-launching tests should go faster, and at the end it's a perfect test you do on the side to check before pushing major kernel driver changes up or when you are too lazy for manual tests 🙃

Max is still checking if it works on ArchLinux. Meanwhile anyone having more change requests wrt to the bash scripts or other aspects? Planning to build it out (add modern_bpf) and improve over time, hopefully we are agreeing on what a good first version is soon. Thanks!

Jun 22 '23 19:06 incertum

Hi all, I just finished the testing on Arch Linux and I confirm it worked:

Jun 28 '23 10:06 maxgio92

Thanks a ton for assisting in testing this @maxgio92 🙏 !

Jun 28 '23 15:06 incertum

@falcosecurity/libs-maintainers what needs to happen to get this PR merged?

This PR addresses the localhost testbed for developers as outlined in https://github.com/falcosecurity/libs/blob/master/proposals/20230530-driver-kernel-testing-framework.md#test-infrastructure and has proven useful on multiple occasions.

Furthermore, I would propose to defer major improvements or refactors to future PRs to also open up the possibility for others to help improve it. What do you think?

Jun 29 '23 15:06 incertum

LGTM label has been added.

Git tree hash: e769b66a9136d603c536cdd4838609ae7769456e

Jun 30 '23 10:06 poiana

Thank you @Molter73 for taking another look, appreciate it :pray: !

Also re-worked the previous commit of adding more info to the readme here https://github.com/falcosecurity/libs/pull/524/commits/3d96343644b635a97125f13d1a378074fee5cead more clearly stating how this is different and separate from the approach we will take for the new kernel driver CI-powered tests. I thought it was important to highlight as there will be no overlap in setup, scripts and design choices, because of these very different use cases. CC @LucaGuerra @therealbobo @FedeDP

Jul 03 '23 18:07 incertum

@Molter73 kindly asking if you have additional change requests for v1 or if we can proceed? Thanks!

Jul 07 '23 04:07 incertum

Hi everyone, kind bump, thank you!

Jul 11 '23 15:07 incertum

LGTM label has been added.

Git tree hash: 45f77f49c9b008e201597b72bfc6bc55a2bcc778

Jul 11 '23 15:07 poiana

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: incertum, jasondellaluce, Molter73

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Molter73,incertum,jasondellaluce]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

Jul 13 '23 07:07 poiana

/milestone 5.1.0+driver

Jul 28 '23 12:07 FedeDP

libs libs copied to clipboard

[WIP] new(test): add driver_sanity_test_suites

libs
libs copied to clipboard