falco icon indicating copy to clipboard operation
falco copied to clipboard

[WIP] Adopt specific-engine open methods (new `libsinsp` APIs)

Open Andreagit97 opened this issue 1 year ago • 1 comments

What type of PR is this?

/kind cleanup

/kind feature

Any specific area of the project related to this PR?

/area build

/area engine

What this PR does / why we need it:

This PR is based on :point_right: https://github.com/falcosecurity/libs/pull/540. It introduces 4 main novelties:

  1. a new option in the configuration file single_buffer_dimension, this will be the dimension that a single buffer in our drivers will have. (BPF, kmod, modern BPF).
  2. the FALCO_BPF_PROBE env variable is now managed by the consumer (i.e Falco) and no more by the libraries. BTW the behavior is always the same so the final user won't notice any difference.
  3. a new command line flag to enable the modern_bpf engine, right now you can only try it by using FALCOSECURITY_LIBS_SOURCE_DIR and DRIVER_SOURCE_DIR options.
  4. every engine has now a dedicated wrapper method to open the inspector, see the libs PR to have more information on this https://github.com/falcosecurity/libs/pull/540.

Which issue(s) this PR fixes:

Special notes for your reviewer:

This PR is based on :point_right: https://github.com/falcosecurity/libs/pull/540, the [WIP] will be removed when the libs PR will be merged

Does this PR introduce a user-facing change?:

new(cmdline/configurations): new flag `--modern-bpf` and new configuration option `single_buffer_dimension`

Andreagit97 avatar Aug 12 '22 10:08 Andreagit97

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Andreagit97 Once this PR has been reviewed and has the lgtm label, please assign fededp for approval by writing /assign @fededp in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

poiana avatar Aug 12 '22 10:08 poiana

OMG this is amazing 💯 . Thanks for all the hard work! This is a significant and much needed cleanup and also extends configurability.

Few suggestions re single_buffer_dimension:

(1) How to choose correct values? Would this for example for eBPF mean the total_size value, so it's not just ring_size but also includes header_size? How do we know an end user provided value does not break things? Would a mathematical sanitization and correction to nearest best value in the source code be safer (see (3))?

https://github.com/falcosecurity/libs/blob/master/userspace/libscap/engine/bpf/scap_bpf.c#L149

static const int BUF_SIZE_PAGES = 2048;

int page_size = getpagesize();
int ring_size = page_size * BUF_SIZE_PAGES;
int header_size = page_size;
int total_size = ring_size * 2 + header_size;

(2) How about writing down the "mathematical equations" for each driver in the yaml parameter explanation and even show how the default value is calculated? For driver-bpf for example:

Assuming page_size is 4096 bytes we get a total of ~16.78 MB hard-coded buffer size
 [(4096 bytes * 2048 * 2) + 4096 bytes]

Almost wonder if the parameter should just be the multiple of page_size and not in bytes as page_size is typically fetched dynamically? No need to change the parameter name - single_buffer_dimension would still work.

(3) For eBPF ring buffer for examples reading that ring size has to be a power of 2. How does this translate to kmod and current perf buffer in driver-bpf? Never seen any hard coded values not power of 2 AFAIK ... see (1) user input sanitization.

https://www.kernel.org/doc/html/latest/bpf/ringbuf.html?highlight=ring+buffer Key and value sizes are enforced to be zero. max_entries is used to specify the size of ring buffer and has to be a power of 2 value.

(4) Lastly after clarifying (1) - (3), could we give 5 concrete recommendations in the yaml parameter definition? Maybe 1-2 for decrease and 3-4 good steps to increase the value? End users probably wouldn't want to become mathematicians. Maybe also include some notes on what effects of increasing or decreasing buffer sizes are? E.g. too large of a buffer on little server load could slow down the engine. A buffer too small can cause more kernel side event drops. We can iterate on such recommendations as we gather field experience, happy to help, because some of what I just mentioned may be based more on reputation and is not backed up by data and perf studies.

[This feedback applies to other parameters in falco.yaml as well so it's not specific to this PR, I think end users would appreciate even more guidance on what can happen if you increase or decrease some of the values ...]

incertum avatar Aug 19 '22 06:08 incertum

Hey @incertum these are all good points:

(1)(2) This was also my doubt, not sure what could be the best choice for the end user, specify the value in bytes or specify the number of pages (in this second case we need to provide the page size of the system in some way, maybe through a simple flag --print-page-size that only prints this info (?)). The second approach (number of page) would probably be better since it simplifies all the checks and is also less error-prone. Any opinion on this @jasondellaluce @FedeDP @leogr?

(3) yeah in both BPF probes we always need a power of 2, in the kmod I have to check, if I remember well we don't have this constraint here.

(4) Absolutely agree with that, probably at the beginning they would be quite silly suggestions but as we gather more experience we can enrich them as you said!

Andreagit97 avatar Aug 19 '22 07:08 Andreagit97

Hey @Andreagit97 and @incertum

Since the buffer size is not just a driver thing but also directly affects the performance of the userspace program, I would like the parameter to behave consistently regardless of the driver used. This would also simplify the user experience. However, I am not yet clear about the best way to achieve this.

(1)(2) This was also my doubt, not sure what could be the best choice for the end user, specify the value in bytes or specify the number of pages (in this second case we need to provide the page size of the system in some way, maybe through a simple flag --print-page-size that only prints this info (?)). The second approach (number of page) would probably be better since it simplifies all the checks and is also less error-prone. Any opinion on this @jasondellaluce @FedeDP @leogr?

Considering what I said above, specifying the value in bytes is probably not the best choice. Does the number of pages as a unit work well with all drivers? If yes, can we explain that to the user simply and straightforwardly? :thinking:

Moreover, are there other alternatives?

(3) yeah in both BPF probes we always need a power of 2, in the kmod I have to check, if I remember well we don't have this constraint here.

If possible, I'd not introduce a special case for a particular type of driver. I'd directly or indirectly force the result of the math to be a power of 2 unless a compelling use case exists for the kmod that does not work with this constraint. IMO these are the kind of questions we need to ask ourselves before making such a decision.

(4) Absolutely agree with that, probably at the beginning they would be quite silly suggestions but as we gather more experience we can enrich them as you said!

:+1:

In general, I'd prefer to keep things as simple as possible. Although that may be opinionated, I'd also validate the value against hard-coded min and max to make the configuration less error-prone (if we don't want to enforce limits, we can just emit warnings). I guess we can calculate the min value in some way. For the max limit, we may choose a reasonably high value. We are already enforcing arbitrary limits in similar situations. For example :point_down: https://github.com/falcosecurity/falco/blob/0828296abc962899e2659f02cb8a623af2a123a1/userspace/falco/configuration.cpp#L271-L274

leogr avatar Aug 19 '22 09:08 leogr

/milestone 0.33.0

leogr avatar Aug 19 '22 09:08 leogr

@Andreagit97 @leogr @jasondellaluce @FedeDP

Go with the multiples of page sizes as parameter? Hard-code an array of let's say 10-12 values, communicate the default value as anchor point? That way it's on us to do the math. In addition sanitization check would be very simplified and user experience simplest and safest?

Agreed with it needing to be consistent across drivers @leogr and very simple.

incertum avatar Aug 19 '22 15:08 incertum

As said here https://github.com/falcosecurity/libs/pull/540#discussion_r954996527 i will continue the discussion here:

Go with the multiples of page sizes as parameter?

Yeah, I think this is the right way.

Hard-code an array of let's say 10-12 values, communicate the default value as anchor point? That way it's on us to do the math.

Not sure about that it, this could be too much restrictive i think that upper and lower bound + check the power of 2 could be enough, WDYT?

In addition sanitization check would be very simplified and user experience simplest and safest?

As said in the previous point we need at least these 2 checks:

  • upper and lower bound
  • check the power of 2

And I think we need it in Falco as a consumer, but also in scap/sinsp since they are the final users of this size, so at least 2 levels

Agreed with it needing to be consistent across drivers @leogr and very simple.

Yeah also agree on that it would be easier and clearer to manage them all in the same way!

Andreagit97 avatar Aug 25 '22 15:08 Andreagit97

As said in the previous point we need at least these 2 checks:

  • upper and lower bound
  • check the power of 2

And I think we need it in Falco as a consumer, but also in scap/sinsp since they are the final users of this size, so at least 2 levels

Works and agreed on performing the checks twice.

[nit] Think it would be very important to be transparent about communicating the default values and give guidance re how to best adjust the values (see previous comments).

incertum avatar Aug 25 '22 17:08 incertum

Close it in favor of https://github.com/falcosecurity/falco/pull/2201

Andreagit97 avatar Sep 12 '22 15:09 Andreagit97