
add an option to pick the buffer dimension

leogr opened this issue 3 years ago • 21 comments

Motivation

As a natural consequence of sinsp adding the ability to set the ring buffer dimension (see https://github.com/falcosecurity/libs/pull/584), we should enable the end user to use this option.

From initial discussions (refs https://github.com/falcosecurity/falco/pull/2164#issuecomment-1220479171, https://github.com/falcosecurity/falco/pull/2164#issuecomment-1220792012), it was clear that using bytes would not be the best choice for users. However, as recently discussed (https://github.com/falcosecurity/libs/pull/584#discussion_r970580068, https://github.com/falcosecurity/libs/pull/584#issuecomment-1246407597), sinsp will still use bytes to keep a certain degree of freedom and avoid an API break in case we have to change the criteria in the future.

From a user perspective, specifying the buffer size as an absolute value does not make much sense: it would be hard to calculate the correct value, the right value may differ across systems, and the optimal value might change from version to version. So adding a --buffer-dim flag that takes bytes is not an option on the table.

One possible solution (initially proposed in https://github.com/falcosecurity/falco/pull/2164#issuecomment-1220792012) is to provide some hardcoded (or automatically calculated) presets and let the user pick one of them. I think it's the best solution from a UX perspective.

Moreover, if the default preset matches the current buffer size value (8MB), this new feature will not change the default behavior, so it won't disrupt users who blindly upgrade Falco.

Feature

Add a fixed array of presets of buffer dimensions that the user can easily pick using a command line flag.

Ideally, the array should contain about 10 values.

One of those values must match the current default (8MB). It must be used if the user does not specify any preset.

Additional context

We would like to deliver this feature within /milestone 0.33.0. Assuming the code freeze starts on Sept 26th, we have a week and a half to implement this feature :crossed_fingers:

TODO: We still have to think about how to calculate the values for this array. The calculation should consider some constraints (values must be a power of 2, a multiple of PAGE_SIZE, etc.)
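As a sketch of one possible calculation (names and structure are purely illustrative, not Falco's actual code), the array could be generated by doubling around the current 8 MiB default; with a 4 KiB page size, every generated value automatically satisfies the constraints above:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: build ~10 buffer-size presets by doubling around
// the current 8 MiB default (4 presets below it, the default, 5 above).
// Every value is a power of 2 and a multiple of a 4 KiB page.
std::vector<uint64_t> make_presets() {
    const uint64_t default_dim = 8 * 1024 * 1024;  // current default: 8 MiB
    std::vector<uint64_t> presets;
    for (int shift = -4; shift <= 5; shift++) {
        presets.push_back(shift >= 0 ? default_dim << shift
                                     : default_dim >> -shift);
    }
    return presets;
}
```

With these illustrative bounds, the sketch yields 524288 bytes (512 KiB) through 268435456 bytes (256 MiB), with 8388608 (the current default) sitting in the middle.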

@Andreagit97 @incertum @FedeDP @gnosek @jasondellaluce cc @falcosecurity/falco-maintainers

leogr avatar Sep 14 '22 12:09 leogr

/milestone 0.33.0

leogr avatar Sep 14 '22 12:09 leogr

Additional context:

  • @Andreagit97 initially implemented the buffer size in bytes as a param
  • The power-of-2 constraint and the benefits of validation checks were raised, as well as the fact that it may be cumbersome for the end user to guess a correct and good value without guidance or additional context
  • A common approach to calculating the size is to use a multiple of PAGE_SIZE, which is also what libs implemented prior to the refactor. Consequently, it was thought that exposing the number of pages as a param might be a better end-user experience, although without a fixed array of choices
  • Over time, the fixed array of values was favored
  • Coming back full circle, it is more feasible to expose the per-CPU buffer size in bytes to the end user as a param, given that values that make sense are pre-computed
  • .. note that, regardless, the buffer will still be double-mapped ...

The old default per-CPU buffer size is:

4096 bytes * 2^11 = 8388608 bytes (assuming the page size is 4096 bytes - the most typical page size)

Fixed array values (bytes) that may make sense, assuming the page size is likely going to be 4096 bytes:

524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456

To me, either (1) the number of memory pages or (2) the size in bytes as a param would be fine, given that we plan to expose fixed choices: no math needed, and you can easily bump to the next best value if there are too many kernel-side drops :)

@leogr, a small follow-up question: is there more to the requirement you mentioned that it has to be a multiple of PAGE_SIZE? I'm asking because page sizes are already powers of 2.

incertum avatar Sep 15 '22 05:09 incertum

@leogr, a small follow-up question: is there more to the requirement you mentioned that it has to be a multiple of PAGE_SIZE? I'm asking because page sizes are already powers of 2.

That's correct, but I specified it for clarity. IIRC, the constraints are (some are redundant):

  • min 2*PAGE_SIZE
  • power of 2
  • multiple of PAGE_SIZE

That being said, since PAGE_SIZE is already a power of 2, the only troublesome implication is that the minimum allowed value must be greater than or equal to 2*PAGE_SIZE, since:

For example, x86 CPUs normally support 4K and 2M (1G if architecturally supported) page sizes, ia64 architecture supports multiple page sizes 4K, 8K, 64K, 256K, 1M, 4M, 16M, 256M and ppc64 supports 4K and 16M. Source: https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html

Thus, if PAGE_SIZE were 256M, most of the pre-calculated values wouldn't work. Though that's not a realistic scenario: even if the hardware supports such page sizes, the kernel will likely use a smaller one.

After a quick look at the kernel source, the max page size value I found is 2^23 (8MB), and I've no idea what happens when Huge Pages are enabled :thinking:
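The constraints listed above could be condensed into a small validation helper; this is an illustrative sketch (the function names are hypothetical, not libs' actual API):

```cpp
#include <cstdint>

// Is v a power of 2? (clears the lowest set bit and checks nothing remains)
bool is_power_of_2(uint64_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical check of the constraints discussed above:
//   - min 2 * PAGE_SIZE
//   - power of 2
//   - multiple of PAGE_SIZE (redundant when page_size is itself a power
//     of 2 and dim >= page_size, but kept here for clarity)
bool is_valid_dim(uint64_t dim, uint64_t page_size) {
    return dim >= 2 * page_size
        && is_power_of_2(dim)
        && dim % page_size == 0;
}
```

For example, the 8 MB default passes with a 4 KiB page size, but would be rejected if PAGE_SIZE were a (hypothetical) 256 MB, matching the observation above.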

cc @incertum @Andreagit97

leogr avatar Sep 15 '22 08:09 leogr

Huge pages are an optimization, we only need to worry about the smallest page size per architecture :)

gnosek avatar Sep 15 '22 11:09 gnosek

Huge pages are an optimization, we only need to worry about the smallest page size per architecture :)

@gnosek Why only the smallest?

leogr avatar Sep 15 '22 15:09 leogr

Because that's the mmap granularity (if you mmap a multiple of a huge page, you can get one big page instead of a bunch of smaller ones but having huge pages does not prevent you from using smaller ones).

gnosek avatar Sep 15 '22 15:09 gnosek

Because that's the mmap granularity (if you mmap a multiple of a huge page, you can get one big page instead of a bunch of smaller ones but having huge pages does not prevent you from using smaller ones).

@gnosek

Ok, but on some archs, one can choose among several page sizes even when not using huge pages. For example :point_down: https://github.com/torvalds/linux/blob/master/arch/ia64/include/asm/page.h#L30-L40

I'd say we have to worry about CONFIG_IA64_PAGE_SIZE_64KB (the biggest one in the ia64 set).

Am I wrong? :thinking:

leogr avatar Sep 15 '22 15:09 leogr

Technically yes, though I don't think we should worry too much about ia64 in the first place :D

gnosek avatar Sep 15 '22 15:09 gnosek

Technically yes, though I don't think we should worry too much about ia64 in the first place :D

I'd worry more about exotic archs. For example, I'd use as a reference the biggest page size I found in the kernel sources: https://github.com/torvalds/linux/blob/master/arch/hexagon/include/asm/page.h#L35-L38 :smile_cat:

If we follow this strategy, our smallest allowed buffer size should be 4 * 2^18 bytes = 1 MB ( > hexagon PAGE_SIZE * 2)

leogr avatar Sep 15 '22 15:09 leogr

Well, on hexagon the minimum buffer size is still two pages, not our fault the pages are huge 🤷

gnosek avatar Sep 15 '22 16:09 gnosek

Nice, appreciate the additional insights @gnosek @leogr - learned new stuff :)

Realistically, most end users probably only need to bump the buffer size up, not down. Therefore, I've anchored the list differently now, see below. Do you still think that using 2^12, hence 4096 bytes (the most typical page size), as the base unit for the pre-calculated options is ok?

And in a nutshell, it's likely gonna be about changing 8388608 to 16777216 and you'll be fine, especially now with the simple-consumer / syscalls-of-interest pushdown filter. It's unlikely you'll have to go above 33554432 atm - at least for most end users and the typical cloud-native use case.

2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824

incertum avatar Sep 15 '22 16:09 incertum

This is what I propose, let's see if it sounds good to you :)

The Falco UX will provide an array of byte dimensions; let's see a realistic example:

Numbers in () are the indexes

[ *(0), 4 MB(1), 8 MB(2), 16 MB(3), 32 MB(4), 64 MB(5), 128 MB(6), 256 MB(7)]

Please note: I would leave the first slot empty (*) to reserve index 0 for a sort of dynamic configuration in the near future (maybe we could use it for other reasons, but just to have a fallback if we need something peculiar)

So as you can see, the array has 8 elements; the first is empty, so the first available index for users will be 1, and it will correspond to a 4 MB buffer_bytes_dim in this example.

To be honest, this is more than an example: I would use exactly this array for a first implementation (or maybe something really similar). You may ask: why these dimensions?

Different systems support different page sizes and our validation constraints are:

  • buffer_bytes_dim > 2*PAGE_SIZE
  • buffer_bytes_dim is a power of 2.
  • buffer_bytes_dim is a multiple of the page size (if the page size is less than our buffer_bytes_dim this is always true because the page size is always a power of 2).

So even considering a really strange case in which the system page size is 1 MB, we are still compliant with all our checks. I don't know what performance such a system would have with a 4 MB buffer and a 1 MB page, but the user could always choose a greater index like 5 (64 MB) ....

So the maximum page size that allows using all the indexes from 1 to 7 is 1 MB. This is because if we consider the next power of 2 as the page size (so 2 MB), the first check is no longer valid (buffer_bytes_dim > 2*PAGE_SIZE): 4 MB > 2*2 MB is no longer true.

Obviously, in this case the user could use the next index (so 2 -> 8 MB) and all the checks would be satisfied again :) But yes, I think this is just a corner case...
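The index-based mapping proposed above could be sketched like this (a hypothetical illustration; preset_to_bytes is not an actual Falco function):

```cpp
#include <cstdint>

constexpr uint64_t MiB = 1024 * 1024;

// Hypothetical index -> buffer_bytes_dim mapping for the proposal above:
// index 0 is reserved for a future dynamic configuration, index 1 maps to
// 4 MiB, and each subsequent index doubles the size up to 256 MiB at 7.
uint64_t preset_to_bytes(unsigned index) {
    if (index == 0 || index > 7) {
        return 0;  // reserved slot or out of range
    }
    return (4 * MiB) << (index - 1);  // 4, 8, 16, ..., 256 MiB
}
```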

Maybe we can also use a vector like this:

[ *(0), 2 MB(1), 4 MB(2), 8 MB(3), 16 MB(4), 32 MB(5), 64 MB(6), 128 MB(7), 256 MB(8)]

In this case, the last fully supported page dimension would be 512 KB (not a big issue in my opinion, since that is already a huge page size)

What do you think about this proposal? @leogr @jasondellaluce @incertum

Andreagit97 avatar Sep 15 '22 16:09 Andreagit97

Realistically, most end users probably only need to bump the buffer size up, not down. Therefore, I've anchored the list differently now, see below. Do you still think that using 2^12, hence 4096 bytes (the most typical page size), as the base unit for the pre-calculated options is ok?

@incertum I think we are proposing almost the same thing :) I've just added some details on how to do it

Andreagit97 avatar Sep 15 '22 16:09 Andreagit97

@Andreagit97 yeah, either way will be fine. I think most end users want to spend like 30 seconds on it and just select the next number if the default doesn't work. All explanations can be captured in the falco.yaml comments above the new param, and end users can read them if they choose to do so.

And yup, more than 256 MB as a per-CPU size already seems huge ...

Ok, I don't wanna be a scientist about it either, so I'm gonna stop here -> for me, anything along those lines works. Again, really great work!!!!

incertum avatar Sep 15 '22 16:09 incertum

Thinking about it again, we can also use an array like this:

[ *(0), 1 MB(1), 2 MB(2), 4 MB(3), 8 MB(4), 16 MB(5), 32 MB(6), 64 MB(7), 128 MB(8), 256 MB(9), 512 MB(10)]

With this one, we cover almost all the possible scenarios; if, for example, you have a page size of 1 MB, you can start using the indexes from (3) (so 4 MB)

Andreagit97 avatar Sep 15 '22 17:09 Andreagit97

Andrea's idea and explanation are awesome.

🤩

I believe there exist edge cases where a smaller dim makes sense (I'd call it: drop earlier, but drop less on average). But I don't believe less than 2 MB can be useful.

2 MB (or 4 MB) to 256 MB (or 512 MB) is a good range IMO.

👍

leogr avatar Sep 15 '22 17:09 leogr

[ *(0), 2 MB(1), 4 MB(2), 8 MB(3), 16 MB(4), 32 MB(5), 64 MB(6), 128 MB(7), 256 MB(8)]

I really like the idea (both simple to explain and to implement!) and agree that this one could be the "final" vector. Great job Andrea!

FedeDP avatar Sep 16 '22 07:09 FedeDP

Agreed!

jasondellaluce avatar Sep 16 '22 09:09 jasondellaluce

I'd go for

[ *(0), 1 MB(1), 2 MB(2), 4 MB(3), 8 MB(4), 16 MB(5), 32 MB(6), 64 MB(7), 128 MB(8), 256 MB(9), 512 MB(10)]

as proposed by Andrea here https://github.com/falcosecurity/falco/issues/2208#issuecomment-1248393510

In the end, we are just increasing the degree of freedom. We just have to document it well in the command-line help so that users are aware that extreme values are potentially dangerous.

leogr avatar Sep 16 '22 09:09 leogr

https://github.com/falcosecurity/falco/issues/2208#issuecomment-1248393510

Moreover, let's think of scenarios in which we want to trace just a few syscalls (7/8 syscalls); in this case, 1 MB seems a reasonable value

Andreagit97 avatar Sep 16 '22 09:09 Andreagit97

/assign @Andreagit97

Andreagit97 avatar Sep 16 '22 09:09 Andreagit97