EOPNOTSUPP error at bpf_perf_event_output function
I'm trying to write a C-based eBPF program.
I attached a kprobe to the bio_endio function and tried to pass my structure to user space, following the samples/bpf/trace_output example.
But bpf_perf_event_output returns -EOPNOTSUPP.
bpf_perf_event_output(ctx, &result_map, 0, &result, sizeof(result));
The result structure consists of five u64 fields, five u32 fields, and one char[16].
struct bpf_map_def SEC("maps") result_map = {
    .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size = sizeof(int),
    .value_size = sizeof(u32),
    .max_entries = 2,
};
If you know the solution, please help me.
The probable cause is this check in the Linux kernel code:
static __always_inline u64
__bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
                        u64 flags, struct perf_sample_data *sd)
{
    .........
    if (unlikely(event->oncpu != cpu))
        return -EOPNOTSUPP;
    return perf_event_output(event, sd, regs);
}
According to the code, this case (event->oncpu != cpu) is marked as unlikely.
Can you please post a snippet to reproduce this? Please also post your kernel version.
Alternatively, as per kernel commit a43eec304:
User space needs to perf_event_open() it (either for one or all cpus) and store FD into perf_event_array (similar to bpf_perf_event_read() helper) before eBPF program can send data into it.
Do you call perf_event_open in user space?
A missing call is the most likely cause.
How is it done?
As in hello_perf_output.py, where perf_submit ultimately calls bpf_perf_event_output.
The perf buffer is opened and polled, i.e.:
b["events"].open_perf_buffer(print_event)
........
b.perf_buffer_poll()
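For a plain C loader (without BCC), the step described in that commit message could look roughly like the sketch below. It only shows how a perf event suitable for bpf_perf_event_output might be opened for one CPU. The helper names init_bpf_output_attr and open_bpf_output_event are made up for illustration, and the sample_period and wakeup_events values are assumptions rather than requirements. Storing the returned FD in the perf event array (for example with libbpf's bpf_map_update_elem, using the CPU number as the key) is left out so the snippet has no libbpf dependency.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill in the attributes of a software perf event
 * of the BPF output kind, which is what bpf_perf_event_output writes to. */
void init_bpf_output_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_BPF_OUTPUT;
    attr->sample_type = PERF_SAMPLE_RAW;
    attr->sample_period = 1;  /* assumption: emit every sample */
    attr->wakeup_events = 1;  /* assumption: wake the reader on each event */
}

/* Hypothetical helper: open the event for all tasks (pid = -1) on one CPU.
 * Returns the perf event FD, or -1 with errno set on failure. */
int open_bpf_output_event(int cpu)
{
    struct perf_event_attr attr;

    init_bpf_output_attr(&attr);
    return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}
```

A loader would typically call open_bpf_output_event once per CPU it wants to receive events from, and store each FD at the map index that the eBPF program will use for that CPU.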
Thanks for the answer.
can you please post any snippet to reproduce this?
Can you refer to trace_output_kern.c and trace_output_user.c in linux/samples/bpf at kernel v5.2? (I'm using kernel v5.2.)
My code is almost the same as the example. Only the structure transferred to user space and the traced function differ. (The trace_output example traces sys_write; in my case, I trace bio_endio.)
The trace_output example seems to have the same problem as my program, so you may be able to reproduce it with that example. From what I checked, bpf_perf_event_output in the trace_output example also returns EOPNOTSUPP (or sometimes ENOENT, ENOSPC...; 0 is never returned).
@geonheec I tried trace_output and got the following output:
$ sudo ./samples/bpf/trace_output
recv 343066 events per sec
100018+0 records in
100017+0 records out
51208704 bytes (51 MB, 49 MiB) copied, 0.289763 s, 177 MB/s
A small clarification: in the sample code, the return value of bpf_perf_event_output is not checked.
Can you attach a diff or the C code, as I am unable to reproduce the problem? I just want to check whether it is related to a specific kernel version. I am on 5.5.13.
trace_output_kern.c:
(-) SEC("kprobe/sys_write")
(+) SEC("kprobe/bio_endio")
(+) int res;
(-) bpf_perf_event_output(ctx, &my_map, 0, &data, sizeof(data));
(+) res = bpf_perf_event_output(ctx, &my_map, 0, &data, sizeof(data));
(+) char msg[] = "res: %d\n";
(+) bpf_trace_printk(msg, sizeof(msg), res);
trace_output_user.c is unchanged.
I ran the program and checked the res value as follows:
sudo su
cd /sys/kernel/debug/tracing
cat trace_pipe
With this method you should see "res: -95" on the terminal.
+) I checked the original trace_output example again, and it seems to work well. But when I change "sys_write" to "bio_endio", the return value becomes EOPNOTSUPP. Does bpf_perf_event_output not work well in interrupt context?
So I think the problem is this: when libbpf sets up perf event buffers for BPF perf event maps, it assumes each index in the map corresponds to a specific CPU, i.e. index 0 is for CPU 0, index 1 is for CPU 1, etc. This effectively means you can only use bpf_perf_event_output with BPF_F_CURRENT_CPU.
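To make the index/CPU relationship concrete, here is a small user-space model of the lookup that __bpf_perf_event_output performs. It is not kernel code: the sim_ names are invented for this sketch, the slot_cpu array stands in for the perf event array (each entry is the CPU the stored event was opened on), and the -E2BIG and -ENOENT branches reflect my reading of kernel/trace/bpf_trace.c rather than anything quoted in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the kernel's BPF_F_INDEX_MASK / BPF_F_CURRENT_CPU values. */
#define SIM_F_INDEX_MASK  0xffffffffULL
#define SIM_F_CURRENT_CPU SIM_F_INDEX_MASK
#define SIM_NO_EVENT      (-1)  /* hypothetical marker: no FD stored in this slot */

/* Model of the checks done before an event is written.
 * slot_cpu[i] is the CPU the event stored at map index i was opened on. */
int sim_perf_event_output(const int *slot_cpu, unsigned int max_entries,
                          unsigned int running_cpu, uint64_t flags)
{
    uint64_t index = flags & SIM_F_INDEX_MASK;

    if (index == SIM_F_CURRENT_CPU)
        index = running_cpu;            /* pick the slot of the current CPU */
    if (index >= max_entries)
        return -E2BIG;                  /* map too small for this CPU */
    if (slot_cpu[index] == SIM_NO_EVENT)
        return -ENOENT;                 /* user space stored no FD here */
    if ((unsigned int)slot_cpu[index] != running_cpu)
        return -EOPNOTSUPP;             /* event belongs to another CPU */
    return 0;
}
```

With a single event opened on CPU 0 and flags set to 0, the model succeeds only when the probe runs on CPU 0 and returns -EOPNOTSUPP on any other CPU, which would match a kprobe on bio_endio firing on whichever CPU handles the completion. Under that reading, the likely fix is to pass BPF_F_CURRENT_CPU as the flags argument and to make max_entries at least the number of CPUs, with one perf event opened per CPU in user space.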