libs
libs copied to clipboard
[WIP] new(test): add driver_sanity_test_suites
Signed-off-by: Melissa Kilby [email protected]
What type of PR is this?
Uncomment one (or more)
/kind <>
lines:
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
Any specific area of the project related to this PR?
Uncomment one (or more)
/area <>
lines:
/area API-version
/area build
/area driver-kmod
/area driver-bpf
/area driver-modern-bpf
/area libscap-engine-bpf
/area libscap-engine-gvisor
/area libscap-engine-kmod
/area libscap-engine-modern-bpf
/area libscap-engine-nodriver
/area libscap-engine-noop
/area libscap-engine-source-plugin
/area libscap-engine-udig
/area libscap
/area libpman
/area libsinsp
/area tests
/area proposals
Does this PR require a change in the driver versions?
/version driver-API-version-major
/version driver-API-version-minor
/version driver-API-version-patch
/version driver-SCHEMA-version-major
/version driver-SCHEMA-version-minor
/version driver-SCHEMA-version-patch
What this PR does / why we need it:
Add driver_sanity_test_suites/localhost/kernels_compat_tests
to test
. This is more of a Developer Guide and decoupled from any project test suites.
(1) At the beginning of the year and before that the project was in a more turbulent state where very often my eBPF driver builds were suddenly failing or did not pass the verifier anymore for some kernel versions after merging upstream from time to time, making me wonder about more extensive "grid-search like" sanity checks. In addition, over the past years there have been a large number of issues around "unable to load driver" or "eBPF verifier failed" or which "clang version" works? As of right now, the project is in a fantastic state and pretty much all clang versions for all kernel versions work for driver-bpf
- eBPF verifier happy.
(2) Aligns with the effort of making it easier for developers to join the project. For instance, not seeing any Vagrant VMs in this project which is common for other projects where you can just launch a script and get started with example kernels and use cases.
Sharing a very early draft. While it is titled driver sanity, oh well it wasn't very sane to put something auto-discovery like together without any hard coding of kernel versions and such.
Few questions:
- Would such a self-serve driver kernel compatibility test guide help the community?
- If yes, how much polishing would be desired as it's more of a composition of ad-hoc little scripts not intended to turn into a broader project?
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
NONE
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: incertum
Once this PR has been reviewed and has the lgtm label, please assign andreagit97 for approval by writing /assign @andreagit97
in a comment. For more information see:The Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
Just WOW 😲 This is something we have been needing for a while! Having something similar in the CI is super valuable indeed! Perhaps a cmake target to run the tests would help a bit :) I will give it a look asap, at least for some initial feedback! Thank you very much for this contribution!
This is real good Melissa! We've been looking for something like this for a while now. Thank you a lot!
Just WOW 😲 This is something we have been needing for a while! Having something similar in the CI is super valuable indeed!
localhost
obviously is more tractable to gain more confidence in your code changes and spot issues you care about in your own deployment :) which typically is only a few distros and a smaller grid of kernel releases ...
Proper CI needs to care about more which is such a daunting task, that's probably why we are where we are. https://github.com/falcosecurity/libs/issues/531
Perhaps a cmake target to run the tests would help a bit :) I will give it a look asap, at least for some initial feedback! Thank you very much for this contribution!
Let me look into this more as this gets a bit more polished. Will also look more into https://github.com/falcosecurity/libs/pull/506 to converge to a good common style for such tests.
Sorry, I'm just trying the CI :) /retest
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: incertum
Once this PR has been reviewed and has the lgtm label, please assign andreagit97 for approval by writing /assign @andreagit97
in a comment. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
@FedeDP: Added cmake targets as I finally got to refactoring and polishing it a bit. Thanks again for the suggestion, it helped making the driver-sanity tests cleaner and easier to use.
@Molter73: Tried to make the cmake setup similar to e2e
targets. Thanks to your work on e2e it was pretty straight forward as I used the e2e cmake setup as template.
@LucaGuerra: Thanks for your feedback in the CI integration discussion. Did not forget about making sure to reboot the VM after each kmod test to be resilient against buggy kernel modules.
Removed WIP as v1 would now be considered complete. The PR is now ready for review. Aware that it's a random collection of scripts, mostly bash scripts. Happy to address any style feedback throughout in an attempt to make the scripts easier to follow and maintain, I think there is improvement potential in this regard.
Outlook: Emulation support still missing and arm64 support as well ... let's first see if v1 is useful for x86_64 for now.
Few observations: Noticed that in general ubuntu
distro appears to be the most difficult one. Re-ran these tests many times and a few ubuntu kmod tests sometimes randomly just did not work. centos7
stable in comparison.
Unrelated to this PR also noticed that the eBPF verifier on ubuntu let's minor bugs pass compared to other distros (same kernel versions), hence yum based distros appear to be the better choice for dev for sure. Also kmod never seemed to work for newer ubuntu kernels (e.g. 5.19 or 6.0), while centos7 was fine.
/milestone 0.11.0
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale
.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close
.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale
Suggestions for first pass review:
- Test if it even works on
localhost
for you, please read README carefully - Scripts are glue-code type of scripting, mostly bash, nothing very polished (motto: 80-20 wins the day)
- Posting my test run results here https://github.com/falcosecurity/libs/issues/982
- ubuntu VM loop is more flaky now than when I last worked on it, but centos7 VM loop is more stable -> please also read the README for additional comments and suggestions
- Using old fashioned ssh remote commands and scp as vagrant rsync and other tricks would sometimes not work.
- modern_bpf would be added in a follow up PR
Thanks @Molter73 for the very detailed first sweep of comments :heart:, will address all of them. Re @leogr https://github.com/falcosecurity/libs/pull/524#discussion_r1141376936 ... convert to Go would first like to discuss this more before pushing next changes ...
Pros:
- Some sort of Go module would fit better into existing testing framework of the broader project
- Would get rid of some bash helper scripts and/or the loops in a subset of bash scripts
- Likely cleaner code than pure bash scripts w/ limitations listed below in mind
Cons:
- Turning this into a Go module could be regarded as glorified bash script as we rely a lot on shell core-utils, so would shelling out a lot from Go be ok? We would have to ...
WDYT @leogr and @Molter73 I would be fine either way, keep bash or try to replacing as much as possible w/ Go. Just not a big fan of Python :upside_down_face: with some exceptions, please read more below:
Python:
We have just one little Python script for the "Data Science" part of this. Python has the strongest packages for these type of quick data transformations on small data plus all the fancy plotting libraries. Not sure if we can get rid of this and use Go instead ... not aware of many Data Science wrangling and graphing use cases or good libraries in this regard in Go.
WDYT @leogr and @Molter73 I would be fine either way, keep bash or try to replacing as much as possible w/ Go.
I'm fine with either honestly, but if most of the Go code is just gonna be creating commands in strings to then be run by a separate shell, I would just keep the bash scripts, just with a little more linting and formatting to prevent them from becoming a hellish nightmare (as bash tends to do).
As for the little python in these changes, I know people like to talk against python nowadays (yes its performance is usually garbage, yes it has some concurrency issues, yes it is not a strongly typed language), but I honestly like it for some things, it's great for data science and our e2e tests without the fixture feature from pytest would be a nightmare (and I honestly haven't seen this approach on any other language/test framework). So, to me, it's fine to keep the one python script in the PR.
Rebased and shared current test results here https://github.com/falcosecurity/libs/issues/982#issuecomment-1581918966
tried it on Ubuntu 22.04 Linux 5.19.0-45-generic #46~22.04.1-Ubuntu x86_64
and it worked without issues :tada:! It took me almost 1 hour but I've not a powerful setup (8 CPUs) or a powerful bandwidth (~10MBit/s on lucky days)
tried it on Ubuntu 22.04
Linux 5.19.0-45-generic #46~22.04.1-Ubuntu x86_64
and it worked without issues 🎉! It took me almost 1 hour but I've not a powerful setup (8 CPUs) or a powerful bandwidth (~10MBit/s on lucky days)
Thanks Andrea for reporting back, yes this sounds about right when you do everything from scratch (building containers, extracting kernel headers etc). Re-launching tests should go faster, and at the end it's a perfect test you do on the side to check before pushing major kernel driver changes up or when you are too lazy for manual tests 🙃
Max is still checking if it works on ArchLinux. Meanwhile anyone having more change requests wrt to the bash scripts or other aspects? Planning to build it out (add modern_bpf) and improve over time, hopefully we are agreeing on what a good first version is soon. Thanks!
Hi all,
I just finished the testing on Arch Linux and I confirm it worked:
Thanks a ton for assisting in testing this @maxgio92 🙏 !
@falcosecurity/libs-maintainers what needs to happen to get this PR merged?
This PR addresses the localhost testbed for developers as outlined in https://github.com/falcosecurity/libs/blob/master/proposals/20230530-driver-kernel-testing-framework.md#test-infrastructure and has proven useful on multiple occasions.
Furthermore, I would propose to defer major improvements or refactors to future PRs to also open up the possibility for others to help improve it. What do you think?
LGTM label has been added.
Thank you @Molter73 for taking another look, appreciate it :pray: !
Also re-worked the previous commit of adding more info to the readme here https://github.com/falcosecurity/libs/pull/524/commits/3d96343644b635a97125f13d1a378074fee5cead more clearly stating how this is different and separate from the approach we will take for the new kernel driver CI-powered tests. I thought it was important to highlight as there will be no overlap in setup, scripts and design choices, because of these very different use cases. CC @LucaGuerra @therealbobo @FedeDP
@Molter73 kindly asking if you have additional change requests for v1 or if we can proceed? Thanks!
Hi everyone, kind bump, thank you!
LGTM label has been added.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: incertum, jasondellaluce, Molter73
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [Molter73,incertum,jasondellaluce]
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
/milestone 5.1.0+driver