ebpf icon indicating copy to clipboard operation
ebpf copied to clipboard

Enable cleanup of tracefs entries (kprobe_events)

Open ti-mo opened this issue 4 years ago • 6 comments

As raised by @etherandrius: https://cilium.slack.com/archives/CKC55JL8L/p1618310807065500.

Currently, the group prefix for entries in kprobe_events is hardcoded to ebpf_, resulting in identifiers like ebpf_aabbccdd. This might prove inflexible in some environments where programs are frequently SIGKILLed and stale entries are left behind in <tracefs>/kprobe_events.

The main use case would be enabling cleanup: allowing the caller to evict entries from kprobe_events as they see fit, without having to resort to removing all entries starting with ebpf_, possibly disrupting other kprobe tools written using this library.

@lmb suggested using filepath.Base(os.Args[0]) + os.Getpid() for the group name, which is sufficiently identifiable, and the PID could eventually be correlated using logs etc.

A few caveats:

  • Group names allow up to 63 characters, so the name of the binary would need to be truncated for safety.
  • PIDs (integers) have a variable-length string representation, so they would need to be padded to a static length.
  • The library currently makes the assumption that creating multiple identical probes will succeed because the group identifier is random on each attempt. When using the (static) PID instead, repeated kprobe creations will use the same group and name, preventing multiple attachments to the same symbol from the same program. We're no longer tracking Trace Events as separate entities that can have multiple perf events created from them, so this means we'll always need a random component in the group name, and PID won't do.

While there are drawbacks to all approaches (maybe someone should just backport kprobe PMU to 4.x kernels instead... :sweat_smile:), which one should we pick?

  • Kprobe options: link.Kprobe("printk", prog, &LinkOptions{TraceFSPrefix: "foo"})
  • Package-level prefix override: link.TraceFSPrefix(prefix string) (this will mess with tests)
  • No override, but always use the truncated, sanitized name of the current binary instead of ebpf. This will bite in various other ways, e.g. some users naming their binaries app when dropping them in containers.
  • Any other ideas?

Additionally, should a cleanup method be provided? Something like link.DeleteTraceFSPrefix("foo") that removes all probes containing the given prefix? This breaks our abstraction, but we need to provide this if we ever plan on changing the group/prefix format.

ti-mo avatar Apr 13 '21 15:04 ti-mo

Hey there 👋🏻

Not sure how much help this is, but a few things to keep in mind:

  • if the process that loaded the eBPF programs gets killed, it won't be able to properly clean up kprobe_events, and the entries might eventually add up to 65k (kernel limit -> see here). There should probably be a mechanism to clean up existing entries of a predefined pattern to prevent this from happening. Cleaning 65k entries at once takes a while and is definitely best avoided.

  • (If the PID is chosen as the random part of the name) PIDs may be reused, if the random component is deleted, the library should allow entries to be reused (skip the write in kprobe_events, and jump directly to the perf_event_open syscall with the existing kprobe ID).

  • if a custom prefix is provided, it should be cleaned up to avoid invalid characters (a simple [a-zA-Z0-9] regex should be enough).

Let me know if I can do anything else to help !

Gui774ume avatar Apr 13 '21 15:04 Gui774ume

@Gui774ume Thanks for the feedback! Some follow-up questions:

Cleaning 65k entries at once takes a while and is definitely best avoided.

What's the alternative to 65k entries at once? :sweat_smile: The way I see it, a cleanup operation should be executed once at startup, not after being OOMkilled for a few hours.

ti-mo avatar Apr 13 '21 16:04 ti-mo

The way I see it, a cleanup operation should be executed once at startup, not after being OOMkilled for a few hours.

Yup that's it. Before adding anything to kprobe_events we should check if existing entries should be deleted first. The way we currently do it is by making sure that the pid associated to the entry is still active. If not, we delete the entry. Once this clean up is done, our lib moves on to registering new entries 😃

Gui774ume avatar Apr 13 '21 16:04 Gui774ume

This might help: https://github.com/DataDog/ebpf/blob/master/manager/manager.go#L1458

(everything we talked about should also apply to uprobe_events btw)

Gui774ume avatar Apr 13 '21 16:04 Gui774ume

by making sure that the pid associated to the entry is still active

Note, this was recently discovered to be an issue when PID namespaces are in-use. This is because the PID as part of the event name is from the context of whoever created that event, which may be in a different PID namespace from the current process using this library.

brycekahle avatar Dec 08 '21 23:12 brycekahle

Thanks for the update. I actually think that this is a pretty tricky problem to fix, so I'll remove the good first issue label.

lmb avatar Dec 09 '21 09:12 lmb

There is a workaround now, so I think we can close this issue.

lmb avatar Nov 01 '23 15:11 lmb