ebpf Expose kernel feature test API

For https://github.com/cilium/cilium/issues/11014, we're looking for a safe and fast way to make decisions in the Cilium code around (BPF-related) kernel features. Currently, we rely on bpftool for some things, some Cilium-specific code for others, but we'd prefer if this would be abstracted and unified to make things clearer. We've informally discussed this topic before, but I'm looking to open the conversation here so we can gather some more data to make an informed decision about the way forward.

Some implementation/design ideas below:

Errors

@lmb suggested we don't explicitly expose a 'features' API, but that the caller rely on errors being returned from particular library features. For example, a (hypothetical) Map.IterateBatch() would return a NotSupportedError when the kernel doesn't support map batch operations.

Package Variables

Exporting current have... variables (e.g. ebpf.HaveNestedMaps). This would hugely inflate the package's API surface; not ideal.

Features Struct

ebpf.Features() returning a struct containing bools:

type KernelFeatures {
  NestedMaps bool
  MapMutabilityModifiers bool
  etc ...
}

The struct can easily be appended to as new features are added, and will not cause breakage. Tests can be deprecated over time if they no longer prove useful.

Doesn't inflate the package API surface.

This option would allow Cilium to include a quick overview of supported kernel features in debug/status output. I remember someone internally voicing a need for this, though not sure who did.

Another conversation topic is whether or not we want to continue maintaining these feature tests going forward. What is the maintenance burden of the tests currently in the library? Once written, do they need to be extended or modified?

Mar 02 '21 11:03 ti-mo

Thanks for putting this together. In my mind your proposals fall into two categories:

Minimal (aka ErrNotSupported): provides a smooth user experience with minimal API surface. You start out writing your code in a naive way and abort on ErrNotSupported. You figure out a way to work around the issue and use errors.Is(err, ErrNotSupported) to switch to that other code path. However I suspect there are cases where this doesn't work well. Do you know of any in the Cilium code base?

Batteries included (aka package vars, features struct): probably even nicer for the user, since based on these tests we could return ErrNotSupported as well. This is a lot more uncertain for me: which feature tests do we accept, which ones do we reject? Do we have to maintain them forever? I don't want the library to become a dumping ground of feature tests that nobody uses anymore.

I think there is a third option: better tools to write feature tests, like exposing internal.FeatureTest. I'm not exactly sure what the value add would be here though, since the FeatureTest code is so small that copying it is easy. Is there some integration that is only possible if FeatureTest is in the library? E.g. I could imagine a function DetectAllFeatures() (map[string]bool, error) which would include features declared in the library and in user code as well. What are the other problems you are hitting?

TL;DR: It would be super useful to know which problems you are trying to solve :)

Mar 02 '21 17:03 lmb

Hi, I am thinking about making a GSoC proposal for this issue and I wanted to join the conversation here so that I am able to make my proposal as clear as possible. The original project idea is to add more feature probes to bpftool. Instead of doing that my proposal would be about implementing the already existing and newly wanted probes in cilium/ebpf.

A possible use case for cilium would be removing the need to shell out to bpftool for the feature probing like cilium does here. With this use case in mind, @ti-mo's idea of the features struct and a ebpf.Features() function would sort of be a drop in replacement for what cilium currently does with bpftool -j feature probe.

I don't have an answer regarding the topic whether this sort of probing should all go into this library and if main contributors want to take on the burden of maintaining them once they are in the library. But coming from the issues regarding an extension to bpftool I see why it would be useful for cilium in places like I linked above.

Mar 31 '21 12:03 rgo3

Hey @rgo3, thanks for jumping in! Also hoping @qmonnet could provide us some perspective, as he wrote most of what's currently in bpftool.

Maintenance burden

Something I'd like to zoom in on first is maintenance burden. Let's assume everyone agrees that having less code to maintain is better. :smile: Looking at the commit logs of feature.c, it doesn't look too high-churn as most of these tests look like they're write-once, with the occasional bugfix or extending lists of symbols etc.

Value to the ecosystem

As we (Cilium) work on implementing more and more of our userland components in pure Go, and with libbpf nearing its 1.0 release (hopefully stabilizing its scope somewhat), it might start making sense to move feature probing logic to Go and expose them to the ecosystem at large. A single Go implementation that contains a useful set of probes other libraries/tools can import without friction should pay off in the long term in the form of contributions as users find gaps. The Go+eBPF userbase is growing at a steady pace, and to keep this momentum going, removing barriers to adoption (like having to build and ship bpftool in container images) is crucial.

Goals

From my perspective, the following criteria should be satisfied:

it should be possible to consume without adopting other parts ot the library
it should be easy to extend (e.g. the machinery for caching test results should be easy to understand and use)
proposing changes should be smooth (we already have a leg up here as it doesn't live in the kernel tree)
it should be fast like bpftool (sudo bpftool -j feature probe 0.04s user 0.03s system 101% cpu 0.067 total). This will be hard to beat, but shouldn't take orders of magnitude more wall time to complete.

Use cases

@lmb to give a concrete example of a use case for such an API: https://github.com/cilium/cilium/blob/3c590925362bbf863e66b20e501eeaf08c0fda84/pkg/bpf/map.go#L154-L174. Any form of higher-level logic around which kernel features are supported will require the caller to create an abstraction over the check. If we wouldn't have bpftool here, we'd have to write a function that creates an LRU hash map that returns a bool. (or maybe this info is obtainable using other means, don't know) Repeat for n more kernel features that require some kind of initialization steps in userspace depending on what the kernel supports.

Another use case that's been suggested internally is providing a dump of enabled kernel features in bug reports, which afaik we currently don't do. (granted, we could use bpftool for this) This will help our team to investigate issues on our users' environments we don't have access to.

As an extra data point: something else I've noticed bpftool does is parsing the running kernel config as a fast-path for some checks. If the kernel config is available, userspace doesn't need to guess/probe. For example, without CO-RE, and on kernels without BTF available, prebuilt kprobe programs rely on certain kconfig options to be set (mostly due to ifdefs in struct declarations changing the offsets), and having a canonical way to query a running kernel configuration would allow the program to bail out in a much friendlier way (as opposed to a verifier error). This should not be part of the initial scope of this implementation, though. (I wrote a naive config parser and writer a while ago for use in the conntracct build system, but never used it for runtime checks)

Apr 02 '21 10:04 ti-mo

Another use case that's been suggested internally is providing a dump of enabled kernel features in bug reports, which afaik we currently don't do.

We do. Our bugtool report includes bpf_features.h generated directly from bpftool. Our datapath also uses bpf_features.h, so we'd have to find a replacement if we stop using bpftool.

As an extra data point: something else I've noticed bpftool does is parsing the running kernel config as a fast-path for some checks.

In Cilium, we would prefer to probe rather than to rely on the kernel config (cf. https://github.com/cilium/cilium/issues/14314). Unfortunately, we don't currently have an exact equivalent probe for each CONFIG_XXX we use.

Apr 02 '21 11:04 pchaigno

Agreed on what was said, I also believe that the interest for Cilium is to have the probing done in Go if possible. I don't have much to add, if anything I can confirm that maintenance for the probes in bpftool has been very low so far.

As a side note, having probes in the Go lib does not prevent us to add new ones to bpftool too, and I'd like to extend the list in the future (when I have some time TM) or to help someone contributing on that. For the scope of the GSoC, probably best to stay focus on just the Go approach though.

Apr 02 '21 12:04 qmonnet

GSoC projects should be around 175-hours long (e.g., about 4-5 weeks full-time) excluding scoping (now) and the initial familiarization with the projects (now and coming months). I don't think it's realistic to completely replace bpftool by a cilium/ebpf implementation in that time frame, so such a project application should define exactly what will be implemented in the scope of GSoC. I'd strongly recommend to discuss this here or on Slack before the application deadline.

Apr 02 '21 14:04 pchaigno

I don't think it's realistic to completely replace bpftool by a cilium/ebpf implementation in that time frame

Thanks for bringing this up @pchaigno, I am currently a bit "overwhelmed" with how to make a good proposal while there are lots of uncertain parts (at least for a newcomer like me there is a lot going on here :D). I also don't think that it is a realistic goal for me to completely replace every occurence of bpftool in cilium/cilium if I get accepted for GSoC, but I would keep this in the back of my head and mention this as a long-term goal on my proposal so that the intention of the API is clear.

@ti-mo's message(s) above already give me some great points to mention in a proposal. What if I limit the scope of my GSoC proposal to being able to replace the bpftool -j feature probe call like it is used here: https://github.com/cilium/cilium/blob/3c590925362bbf863e66b20e501eeaf08c0fda84/pkg/datapath/linux/probes/probes.go#L170 (IIRC a proposal should include concrete code examples of what the proposal will enable, so this Probe() function snippet could be such an example)? I am aware that this approach kind of excludes all the newly wanted probes mentioned on the original GSoC idea, but I think there would be some other benefits coming from that:

like Timo mentioned once this API exists it is possibly easier to extend/change it than bpftool
I could use more time to investigate how we can change some of the probes that rely on the kernel config (you guys probably already know, I admittedly don't yet)
if my suggested scope is implemented faster than expected/planned I would happily add more wanted probes

Unless there are any "strong no's" regarding this approach for a proposal I would go ahead and start mine the way I just described and sharing the draft with you guys next week, if I understand correctly it's encouraged to get it reviewed before handing in the final version.

Apr 02 '21 16:04 rgo3

Indeed, replacing the whole of bpftool in a few weeks is not realistic. I don't think that was ever on the table. :sweat_smile: The Cilium code linked above has undergone a few minor changes in the past month: https://github.com/cilium/cilium/blob/820d200ff392c2a42ac44d04aad69efbc7408025/pkg/datapath/linux/probes/probes.go.

Note: bpftool -j feature probe executes and dumps all results, so it cannot be partially swapped out, there must be a replacement for all specific values Cilium depends on. For example, for struct MapType, I can only find accesses to HaveLruHashMapType and HaveSockhashMapType in Cilium code. Worst case, an initial implementation that only probes available map types and only limits itself to those two types could be acceptable for use within Cilium.

So, as a high-level scope, what about:

Coming up with an API design that fits the criteria outlined above.
Getting the necessary infrastructure/plumbing in place to efficiently execute+cache tests. I see this as the biggest potential time sink, as it might require refactoring depending on which API design we go for.
Migrating the feature tests the lib uses internally to use the new structure.

Optionally, as a bonus, implement the minimum required feature tests (e.g. those 2 map types) for use in Cilium, and migrate a piece of the Cilium code base to use these tests.

Getting just the work on the lib done in under 5 weeks would be quite impressive, so I'm thinking about if/where we could cut some corners.

@rgo3 What you've described sounds like a very sensible approach! Let us know if you need some more input for putting together the proposal.

Apr 02 '21 17:04 ti-mo

I also don't think that it is a realistic goal for me to completely replace every occurence of bpftool in cilium/cilium if I get accepted for GSoC, but I would keep this in the back of my head and mention this as a long-term goal on my proposal so that the intention of the API is clear.

Makes sense :+1:

What if I limit the scope of my GSoC proposal to being able to replace the bpftool -j feature probe call like it is used here: https://github.com/cilium/cilium/blob/3c590925362bbf863e66b20e501eeaf08c0fda84/pkg/datapath/linux/probes/probes.go#L170

Sounds good to me.

a proposal should include concrete code examples of what the proposal will enable, so this Probe() function snippet could be such an example

That's correct and this would be a good concrete code example IMO.

I am aware that this approach kind of excludes all the newly wanted probes mentioned on the original GSoC idea

I think that's fine. Honestly, I'm starting to think we should even have two separate GSoC project ideas: (1) short-term extension of bpftool with new wanted feature probes and (2) longer-term replacement of bpftool feature probe by cilium/ebpf. That said, I'm not sure we have any other students interested in (1) or (2) at the moment.

if my suggested scope is implemented faster than expected/planned I would happily add more wanted probes

IMO applications with a smaller scope are often stronger applications because the applicant is more likely to succeed.

if I understand correctly it's encouraged to get it reviewed before handing in the final version.

Definitely!

Apr 02 '21 20:04 pchaigno

Sorry for the late reply. I would follow Timo's proposal:

So, as a high-level scope, what about:

* Coming up with an API design that fits the criteria outlined above.

* Getting the necessary infrastructure/plumbing in place to efficiently execute+cache tests. I see this as the biggest potential time sink, as it might require refactoring depending on which API design we go for.

* Migrating the feature tests the lib uses internally to use the new structure.

As he points out, migrating bpftool feature probe is a bit of an all or nothing proposition.

I'd like to propose one addition to the high level goals: "Feature tests can be written in other packages, and migrated into cilium/ebpf with ease". This implies that feature tests become an exported API, with the obvious benefit that there is less pressure to add "exotic" feature tests to the library. The downside is that it might require us to expose more raw syscalls from internal/. Any thoughts?

Finally, some opinions of mine on implementation details:

Feature tests really have three states, not two: supported, not supported, inconclusive.
Feature tests should happen on demand, and their result (if conclusive) should be cached. This is because the typical application will only use a small subset of feature tests over its lifetime. This makes the struct Features design that Timo proposed difficult.
Feature tests never change the process environment, so callers are responsible for raising or lowering capabilities like CAP_NET_ADMIN, CAP_BPF. This adds complexity: a feature test may fail because we don't have the necessary permissions (see also inconclusive).
Feature tests target mainline Linux, not distribution kernels like CentOS. Distributions play games with kernels and their versions. So far this hasn't been a problem, but you never know what distributions are up to. More of a documentation issue really.
Parsing the kernel config should be out of scope, as Paul mentions. Related to the previous point as well.

Apr 07 '21 11:04 lmb

Currently I have been kinda building my proposal on the Features struct idea, but I see how this is difficult if individual probes should happen on demand and we don't necessarily want to probe every feature in one go like bpftool does. I'll try to incorporate the comments from Lorenz into my proposal and will then share the draft. Thanks for all the help :)

Apr 07 '21 19:04 rgo3

I am still sort of struggling with good and clear use-cases to mention in a proposal for GSoC. As stated above I started out with a Feature struct and this would have had a easy to understand use-case (including code example) for cilium: replacing bpftool -j feature probe. Lorenz made some good points that I can mention for the implementation details on the GSoC proposal, but due to the fact that a feature struct is not ideal, it is hard for me to give another good example now as designing API and result caching machinery is a big part of the project I am proposing.

Apr 09 '21 13:04 rgo3

Sorry for the late reply, this week has been very busy for me. Have you seen this piece of code?

https://github.com/cilium/ebpf/blob/3f35578dd203efbd7c1298d12bf82fc236880f72/internal/feature.go#L34-L48

At it's core this abstraction is sound:

it memoizes / caches the result of a single function call
and it returns a "rich" error that includes a feature name and a mainline kernel version

Based on this I would create a "functional" API like so:

func HaveMap(MapType) error
func HaveProgram(ProgramType) error
func ProgramHasHelper(ProgramType, asm.Function) error

which would probably live in a separate package features or something. I'm not sure ProgramHasHelper as I outlined it here is a good idea, but it's something we can consider. It gives you an idea of the complexity involved as well: we might have to have custom code for each helper check that we want to support (actually, if it comes to that we probably shouldn't have that function).

There are problems with the abstraction:

Writing a feature tests is still a bit confusing. First time contributors tend to not understand the difference between a conclusive and inconclusive test. It would be nice to improve this.
It doesn't allow retrieving a list of "all supported features", since there is no (global) list of all features.
It's not clear how to write feature tests that have parameters as outlined above.
It's not clear how EPERM from the kernel should be treated. EPERM implies that the feature is supported, but you don't have access to it in your current context. What does the feature test indicate?

Apr 15 '21 16:04 lmb

Hey @lmb,

I actually used that piece of code and based my GSoC proposal partly on it, suggesting "similar" functions to test features like you are suggesting here:

func ProbeMapType(type MapType, ...) error {
    //Probe availability by creating a dummy map with the given MapType
    // ...
}

Additionally I implied that a "all supported features" function would be based on those basic abstraction to test features.

Some of the other problems you mentioned I left open intentionally to not clutter my proposal with a bunch of stuff we are /I was unsure off. Glad my final proposal version somewhat aligns with the additional remarks you made after I handed it in :laughing:.

Apr 19 '21 18:04 rgo3

Hey @lmb,

I want to discuss your suggestions of having the features API in a separate pkg features. I currently started a first basic implementation of probing for map types in this draft PR: #321. It currently lives in pkg ebpf and uses an unexported function: bpfMapCreate() as well as an unexported type: bpfMapCreateAttr. Would it make sense to export the function and type while putting them in internal and refactoring the code that uses the unexported versions?

The other option I see, as bpfMapCreate() is not that much code and only uses functions from internal, would be duplicating that function (and the attribute struct) to the new pkg I am creating. Although if this is the way we go with I could see how I would have to copy a bunch of the functions and types that already exists in syscalls.go once we extend the features API with ProgTypes etc. in the future.

This is not a blocker for me currently, but @ti-mo and I agreed that it makes sense to start talking about where the API will live early.

Jun 14 '21 13:06 rgo3

I want to discuss your suggestions of having the features API in a separate pkg features. I currently started a first basic implementation of probing for map types in this draft PR: #321. It currently lives in pkg ebpf and uses an unexported function: bpfMapCreate() as well as an unexported type: bpfMapCreateAttr. Would it make sense to export the function and type while putting them in internal and refactoring the code that uses the unexported versions?

Yes, moving unexported types / functions to internal is always an option. Unless there is an import cycle I don't see a downside, do you?

Jun 14 '21 15:06 lmb

Ok. Then I'll try and start moving those to internal. :+1:

Jun 15 '21 09:06 rgo3

I have a couple of open questions/status updates I'd like to share:

Currently I am working on implementing the probes for the last missing MapTypes: SkStorage, StructOpts, RingBuf, InodeStorage, TaskStorage. The last 4 currently don't exist in the lib but are added in #318.
- The probe for RingBuf works and I tested on Linux 5.10
- The probe for StructOps is not working, and I'm currently unable to figure out why. @ti-mo already suggested not diving too deep into this one.
- The *Storage MapTypes need a btf blob when creating them. At least thats what bpftool does. Maybe I can also just use an invalid Fd here as we do with nested maps and check for EBADF? I'll have to try that, but if I have to create a btf blob I don't know how to do that.
There is already some discussion around the error interface that gets returned on my draft PR. While this is just for EPERM we generally need policy decisions around which kernel errors will be mapped to which library error (supported, notSupported, inconclusive) and if the cache can get flushed due to e.g. capability changes of the calling process.
Lastly (looping in @qmonnet here), I came up with the question if the flags field in the BPFMapCreateAttr should be configurable via a function argument. This is definitely related to the fact that in my original proposal we planned using the new features API to replace current feature tests in the library like e.g. haveMmapableMaps(). In order to to this, the flag field definitely would have to be configurable by the user (in this case the lib itself). (As a side note: if the new features shall be used in the ebpf package of the lib we'll have to rethink were the code lives because currently this would create an import cycle. But I think I can address this once the above points are done :heavy_check_mark:)

I just want to hear some opinions/thoughts on the points I mentioned to have a better feel for what I should address next and how.

Jun 22 '21 14:06 rgo3

Hi Robin,

I'll have to try that, but if I have to create a btf blob I don't know how to do that.

Yeah the library currently doesn't have support for that. Unless there is a very compelling use case I wouldn't want to add it either. Does the BTF have to be generated on the fly or could pre-create it and include the raw bytes?

we generally need policy decisions around which kernel errors will be mapped to which library error (supported, notSupported, inconclusive) and if the cache can get flushed.

This boils down to: is it better to have a false positive or a false negative? This needs looking at code in the wild and working through the implications.

Re cache flushing: either we can get away with never invalidating the cache or we provide the user with an explicit function that invalidates the cache. Anything else is madness.

Lastly (looping in @qmonnet here), I came up with the question if the flags field in the BPFMapCreateAttr should be configurable via a function argument.

What do you mean by configurable by the user? You have to be able to pass the flags when performing a feature test?

Jun 30 '21 10:06 lmb

Hi,

Does the BTF have to be generated on the fly or could pre-create it and include the raw bytes?

For now I actually got away with not creating a btf blob at all, using an invalid FD for it and checking for EBADF. Is that an acceptable solution when probing features?

This needs looking at code in the wild and working through the implications.

I'm fairly new to this wilderness, what kind of code should I be looking at? Until now I pretty much just browsed bpftool and there it's just two failure states:

Supported if feature probe goes through
Not supported if there is any kind of error from the probe If you don't have the permissions there won't even be any probing at all.

I like the idea of offering a function to flush the cache (or specific entries?), those could be used by calling programs that change permissions during their lifetime. I'll add that to the draft PR, if we decide we don't like it we can take it out again.

What do you mean by configurable by the user? You have to be able to pass the flags when performing a feature test?

What I meant was extending the current function signature by something like this:

func ProbeMapType(mt ebpf.MapType, flags uint32) error

Without this its going to be impossible to replace feature tests like the one that was added in #329 because the default probe I have in my draft PR will create Array maps without any flags.

Jun 30 '21 13:06 rgo3

Besides the open points (error interface, "configurable" flags") discussed in the messages above, what is needed to turn my draft PR into a "non-draft" PR which we eventually can merge?

Jul 01 '21 16:07 rgo3

For now I actually got away with not creating a btf blob at all, using an invalid FD for it and checking for EBADF. Is that an acceptable solution when probing features?

We have that in other places, so I'd say yes until proven otherwise.

I'm fairly new to this wilderness, what kind of code should I be looking at?

I would start with the code in the cilium codebase which would call these functions. What does it do? How does the outcome of the BPF_F_MMAPABLE / BPF_F_INNER_MAP tests influence the code?

I like the idea of offering a function to flush the cache (or specific entries?), those could be used by calling programs that change permissions during their lifetime. I'll add that to the draft PR, if we decide we don't like it we can take it out again.

Flushing specific entries means being able to identify them somehow, so I would skip that. It's OK to hold off on flushing until someone needs it, imo.

Without this its going to be impossible to replace feature tests like the one that was added in #329 because the default probe I have in my draft PR will create Array maps without any flags.

I wouldn't add a flags parameter: it leads to a combinatorial explosion of possible outcomes. Having flags in general also won't be enough: sometimes passing a specific flag will require other things to be set up / more parameters to be passed. Which means you need another parameter added to the function, etc. I would add another function CanMmapMaps() or whatever you want to call it (aside: does this differ between map types? what we have in master always uses Array)

what is needed to turn my draft PR into a "non-draft" PR which we eventually can merge?

Has your GSoC mentor done a review?

Jul 01 '21 17:07 lmb

After offline discussion:

Cache flusing should be omitted for now, until there is a clearer use case for it. I think we can ignore the cap-drop TOCTOU catch for now, as this security model is becoming much less common with the adoption of containerization and fine-grained capabilities in the kernel.
Kernel support for certain map flags (e.g. freezing) should probably be probed separately (e.g. a HaveMapFlags() API), as maintaining probes for all permutations of flags x map types doesn't feel realistic, and not very future-proof. This should probably be discussed separately, as this approach has an accuracy penalty. This kind of API can answer with certainty that the kernel supports a given flag (e.g. on a simple array), but not if a certain flag works for a particular map type.

As for the convention around error returns from HaveMapType, a suggestion:

(With the idea that unexpected errors during probing should be unwrappable by the caller.)

Supported: nil
Not supported: ErrNotSupported
~Inconclusive: ErrInconclusive?~ [1]
~.. add more sentinels as necessary~
Everything unexpected (e.g. EINVAL where we expect EBADF): %w

[1] 'inconclusive' might be a bituseless without context, so we might as well wrap all errors encountered, so the user can determine their own course of action. Whether a probe was inconclusive or not is irrelevant. We're only interested in features available given the current process' capabilities, not whether the kernel as a whole supports a particular feature.

Jul 08 '21 13:07 ti-mo

This was added in #673, #321 and others. Closing.

Nov 04 '22 12:11 ti-mo