falco
falco copied to clipboard
[WIP] Adopt specific-engine open methods (new `libsinsp` APIs)
What type of PR is this?
/kind cleanup
/kind feature
Any specific area of the project related to this PR?
/area build
/area engine
What this PR does / why we need it:
This PR is based on :point_right: https://github.com/falcosecurity/libs/pull/540. It introduces 4 main novelties:
- a new option in the configuration file
single_buffer_dimension
, this will be the dimension that a single buffer in our drivers will have. (BPF, kmod, modern BPF). - the
FALCO_BPF_PROBE
env variable is now managed by the consumer (i.e Falco) and no more by the libraries. BTW the behavior is always the same so the final user won't notice any difference. - a new command line flag to enable the
modern_bpf
engine, right now you can only try it by usingFALCOSECURITY_LIBS_SOURCE_DIR
andDRIVER_SOURCE_DIR
options. - every engine has now a dedicated wrapper method to open the inspector, see the libs PR to have more information on this https://github.com/falcosecurity/libs/pull/540.
Which issue(s) this PR fixes:
Special notes for your reviewer:
This PR is based on :point_right: https://github.com/falcosecurity/libs/pull/540, the [WIP] will be removed when the libs PR will be merged
Does this PR introduce a user-facing change?:
new(cmdline/configurations): new flag `--modern-bpf` and new configuration option `single_buffer_dimension`
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Andreagit97
Once this PR has been reviewed and has the lgtm label, please assign fededp for approval by writing /assign @fededp
in a comment. For more information see:The Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
OMG this is amazing 💯 . Thanks for all the hard work! This is a significant and much needed cleanup and also extends configurability.
Few suggestions re single_buffer_dimension
:
(1) How to choose correct values? Would this for example for eBPF mean the total_size
value, so it's not just ring_size
but also includes header_size
? How do we know an end user provided value does not break things? Would a mathematical sanitization and correction to nearest best value in the source code be safer (see (3))?
https://github.com/falcosecurity/libs/blob/master/userspace/libscap/engine/bpf/scap_bpf.c#L149
static const int BUF_SIZE_PAGES = 2048;
int page_size = getpagesize();
int ring_size = page_size * BUF_SIZE_PAGES;
int header_size = page_size;
int total_size = ring_size * 2 + header_size;
(2) How about writing down the "mathematical equations" for each driver in the yaml parameter explanation and even show how the default value is calculated? For driver-bpf for example:
Assuming page_size is 4096 bytes we get a total of ~16.78 MB hard-coded buffer size
[(4096 bytes * 2048 * 2) + 4096 bytes]
Almost wonder if the parameter should just be the multiple
of page_size and not in bytes as page_size
is typically fetched dynamically? No need to change the parameter name - single_buffer_dimension
would still work.
(3) For eBPF ring buffer for examples reading that ring size has to be a power of 2. How does this translate to kmod and current perf buffer in driver-bpf? Never seen any hard coded values not power of 2 AFAIK ... see (1) user input sanitization.
https://www.kernel.org/doc/html/latest/bpf/ringbuf.html?highlight=ring+buffer
Key and value sizes are enforced to be zero. max_entries is used to specify the size of ring buffer and has to be a power of 2 value.
(4) Lastly after clarifying (1) - (3), could we give 5 concrete recommendations in the yaml parameter definition? Maybe 1-2 for decrease and 3-4 good steps to increase the value? End users probably wouldn't want to become mathematicians. Maybe also include some notes on what effects of increasing or decreasing buffer sizes are? E.g. too large of a buffer on little server load could slow down the engine. A buffer too small can cause more kernel side event drops. We can iterate on such recommendations as we gather field experience, happy to help, because some of what I just mentioned may be based more on reputation and is not backed up by data and perf studies.
[This feedback applies to other parameters in falco.yaml
as well so it's not specific to this PR, I think end users would appreciate even more guidance on what can happen if you increase or decrease some of the values ...]
Hey @incertum these are all good points:
(1)(2) This was also my doubt, not sure what could be the best choice for the end user, specify the value in bytes or specify the number of pages (in this second case we need to provide the page size of the system in some way, maybe through a simple flag --print-page-size
that only prints this info (?)). The second approach (number of page) would probably be better since it simplifies all the checks and is also less error-prone. Any opinion on this @jasondellaluce @FedeDP @leogr?
(3) yeah in both BPF probes we always need a power of 2, in the kmod I have to check, if I remember well we don't have this constraint here.
(4) Absolutely agree with that, probably at the beginning they would be quite silly suggestions but as we gather more experience we can enrich them as you said!
Hey @Andreagit97 and @incertum
Since the buffer size is not just a driver thing but also directly affects the performance of the userspace program, I would like the parameter to behave consistently regardless of the driver used. This would also simplify the user experience. However, I am not yet clear about the best way to achieve this.
(1)(2) This was also my doubt, not sure what could be the best choice for the end user, specify the value in bytes or specify the number of pages (in this second case we need to provide the page size of the system in some way, maybe through a simple flag
--print-page-size
that only prints this info (?)). The second approach (number of page) would probably be better since it simplifies all the checks and is also less error-prone. Any opinion on this @jasondellaluce @FedeDP @leogr?
Considering what I said above, specifying the value in bytes is probably not the best choice. Does the number of pages as a unit work well with all drivers? If yes, can we explain that to the user simply and straightforwardly? :thinking:
Moreover, are there other alternatives?
(3) yeah in both BPF probes we always need a power of 2, in the kmod I have to check, if I remember well we don't have this constraint here.
If possible, I'd not introduce a special case for a particular type of driver. I'd directly or indirectly force the result of the math to be a power of 2 unless a compelling use case exists for the kmod that does not work with this constraint. IMO these are the kind of questions we need to ask ourselves before making such a decision.
(4) Absolutely agree with that, probably at the beginning they would be quite silly suggestions but as we gather more experience we can enrich them as you said!
:+1:
In general, I'd prefer to keep things as simple as possible. Although that may be opinionated, I'd also validate the value against hard-coded min and max to make the configuration less error-prone (if we don't want to enforce limits, we can just emit warnings). I guess we can calculate the min value in some way. For the max limit, we may choose a reasonably high value. We are already enforcing arbitrary limits in similar situations. For example :point_down: https://github.com/falcosecurity/falco/blob/0828296abc962899e2659f02cb8a623af2a123a1/userspace/falco/configuration.cpp#L271-L274
/milestone 0.33.0
@Andreagit97 @leogr @jasondellaluce @FedeDP
Go with the multiples of page sizes as parameter? Hard-code an array of let's say 10-12 values, communicate the default value as anchor point? That way it's on us to do the math. In addition sanitization check would be very simplified and user experience simplest and safest?
Agreed with it needing to be consistent across drivers @leogr and very simple.
As said here https://github.com/falcosecurity/libs/pull/540#discussion_r954996527 i will continue the discussion here:
Go with the multiples of page sizes as parameter?
Yeah, I think this is the right way.
Hard-code an array of let's say 10-12 values, communicate the default value as anchor point? That way it's on us to do the math.
Not sure about that it, this could be too much restrictive i think that upper and lower bound + check the power of 2 could be enough, WDYT?
In addition sanitization check would be very simplified and user experience simplest and safest?
As said in the previous point we need at least these 2 checks:
- upper and lower bound
- check the power of 2
And I think we need it in Falco as a consumer, but also in scap/sinsp since they are the final users of this size, so at least 2 levels
Agreed with it needing to be consistent across drivers @leogr and very simple.
Yeah also agree on that it would be easier and clearer to manage them all in the same way!
As said in the previous point we need at least these 2 checks:
- upper and lower bound
- check the power of 2
And I think we need it in Falco as a consumer, but also in scap/sinsp since they are the final users of this size, so at least 2 levels
Works and agreed on performing the checks twice.
[nit] Think it would be very important to be transparent about communicating the default values and give guidance re how to best adjust the values (see previous comments).
Close it in favor of https://github.com/falcosecurity/falco/pull/2201