Assemble a dummy BTF blob for probing StructOps maps
With https://github.com/cilium/ebpf/pull/321, an API for probing available map types in the kernel was added. However, a StructOps map requires a valid BTF blob to be specified in order to make creation work.
@qmonnet was able to get this (partially?) working in bpftool: https://github.com/cilium/ebpf/pull/321#discussion_r662944737.
This issue is for implementing the equivalent using a pre-baked (or assembled at runtime) BTF blob to be able to probe this map type successfully.
cc @rgo3
Works completely for bpftool, but I haven't submitted a patch upstream yet.
Full working patch:
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index ecaae2927ab8..629d39c98f10 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -203,6 +203,22 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
 	__u32 btf_key_type_id = 0, btf_value_type_id = 0;
 	struct bpf_create_map_attr attr = {};
 	int fd = -1, btf_fd = -1, fd_inner;
+	int btf_vmlinux_value_type_id = 0;
+	struct btf *btf_vmlinux;
+
+	/* [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED */
+	__u8 const btf_data[] = {
+		0x9f, 0xeb, 0x01, 0x00, 0x18, 0x00, 0x00, 0x00,
+		0x00, 0x00, 0x00, 0x00, 0x30, 0x00, 0x00, 0x00,
+		0x30, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00,
+		0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,
+		0x04, 0x00, 0x00, 0x00, 0x20, 0x00, 0x00, 0x01,
+		0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x0d,
+		0x00, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00,
+		0x01, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00,
+		0x00, 0x00, 0x00, 0x0c, 0x02, 0x00, 0x00, 0x00,
+		0x00, 0x69, 0x6e, 0x74, 0x00, 0x78, 0x00, 0x61,
+		0x00 };
 
 	key_size	= sizeof(__u32);
 	value_size	= sizeof(__u32);
@@ -245,6 +261,17 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
 		value_size = 0;
 		max_entries = 4096;
 		break;
+	case BPF_MAP_TYPE_STRUCT_OPS:
+		btf_fd = bpf_load_btf(btf_data, sizeof(btf_data), NULL, 0, false);
+		if (btf_fd < 0)
+			return false;
+		value_size = 256;
+		btf_vmlinux = libbpf_find_kernel_btf();
+		if (libbpf_get_error(btf_vmlinux))
+			return false;
+		btf_vmlinux_value_type_id = btf__find_by_name_kind(btf_vmlinux,
+		     "bpf_struct_ops_tcp_congestion_ops", BTF_KIND_STRUCT);
+		break;
 	case BPF_MAP_TYPE_UNSPEC:
 	case BPF_MAP_TYPE_HASH:
 	case BPF_MAP_TYPE_ARRAY:
@@ -264,7 +291,6 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
 	case BPF_MAP_TYPE_XSKMAP:
 	case BPF_MAP_TYPE_SOCKHASH:
 	case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
-	case BPF_MAP_TYPE_STRUCT_OPS:
 	default:
 		break;
 	}
@@ -292,6 +318,7 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
 		attr.max_entries = max_entries;
 		attr.map_flags = map_flags;
 		attr.map_ifindex = ifindex;
+		attr.btf_vmlinux_value_type_id = btf_vmlinux_value_type_id;
 		if (btf_fd >= 0) {
 			attr.btf_fd = btf_fd;
 			attr.btf_key_type_id = btf_key_type_id;
(I got the BTF blob by running strace on the load of a BTF object created from int foo = 0; or something like that. Maybe I can find a nicer way to present it in the code before submitting, but that's a detail.)
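For anyone wanting to double-check what the blob in the patch contains, here is a minimal sketch that walks its type section using only the C standard library. The names btf_blob, le32 and btf_list_kinds are made up for this example. The header layout (24 bytes) and the kind numbers are taken from include/uapi/linux/btf.h. The sketch only understands the INT, FUNC and FUNC_PROTO kinds and rejects everything else.

```c
#include <stddef.h>
#include <stdint.h>

/* BTF kind numbers as defined in include/uapi/linux/btf.h. */
enum { KIND_INT = 1, KIND_FUNC = 12, KIND_FUNC_PROTO = 13 };

/* The blob from the patch above, copied byte for byte. */
static const uint8_t btf_blob[] = {
	0x9f, 0xeb, 0x01, 0x00, 0x18, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x30, 0x00, 0x00, 0x00,
	0x30, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00,
	0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,
	0x04, 0x00, 0x00, 0x00, 0x20, 0x00, 0x00, 0x01,
	0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x0d,
	0x00, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00,
	0x01, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x0c, 0x02, 0x00, 0x00, 0x00,
	0x00, 0x69, 0x6e, 0x74, 0x00, 0x78, 0x00, 0x61,
	0x00 };

/* Read a little-endian u32 regardless of host byte order. */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Walk the type section of a little-endian BTF blob and store the kind of
 * each type in kinds[]. Returns the number of types found, or -1 if the
 * input is malformed, too long for kinds[], or uses a kind this sketch
 * does not handle.
 */
static int btf_list_kinds(const uint8_t *data, size_t len,
			  uint8_t *kinds, int max)
{
	size_t pos, end;
	int n = 0;

	/* struct btf_header is 24 bytes and starts with magic 0xeb9f. */
	if (len < 24 || data[0] != 0x9f || data[1] != 0xeb)
		return -1;

	/* type_off is relative to the end of the header (hdr_len). */
	pos = (size_t)le32(data + 4) + le32(data + 8);
	end = pos + le32(data + 12);
	if (end > len)
		return -1;

	while (pos < end) {
		uint32_t info, kind, vlen;
		size_t extra;

		/* Every type starts with a 12-byte struct btf_type. */
		if (end - pos < 12 || n >= max)
			return -1;
		info = le32(data + pos + 4);
		kind = (info >> 24) & 0x1f;
		vlen = info & 0xffff;

		switch (kind) {
		case KIND_INT:		/* followed by one u32 of encoding data */
			extra = 4;
			break;
		case KIND_FUNC:		/* no trailing data */
			extra = 0;
			break;
		case KIND_FUNC_PROTO:	/* followed by vlen 8-byte params */
			extra = 8 * (size_t)vlen;
			break;
		default:
			return -1;
		}
		if (end - pos - 12 < extra)
			return -1;
		kinds[n++] = (uint8_t)kind;
		pos += 12 + extra;
	}
	return n;
}
```

Calling btf_list_kinds(btf_blob, sizeof(btf_blob), kinds, 8) on this blob reports three types: an INT, a FUNC_PROTO and a FUNC. So the blob seems to describe a small function taking an int rather than only the single INT mentioned in the code comment.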
@qmonnet Thank you for the example! Looks like this does indeed rely on being able to obtain the vmlinux BTF, which requires sysfs to be mounted and accessible at /sys. Be aware (also for bpftool) that this is not guaranteed when running containerized apps, though Docker seems to mount sysfs by default. Things might be different on other container runtimes and schedulers.
Just echoing here the discussion(s) we had in this PR:
- https://github.com/cilium/ebpf/pull/321#discussion_r662944737
- https://github.com/cilium/ebpf/pull/321#discussion_r662954128
For now, it seems unlikely we'll be able to create a (mock) StructOps map, so best to conclude that StructOps maps are not supported from the perspective of the current process if the process can't obtain a copy of the vmlinux BTF blob.
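As a rough illustration of that conclusion, a probe could bail out early when the vmlinux BTF is not readable. This is only a sketch: btf_file_present and struct_ops_probe_possible are hypothetical helpers, and the path /sys/kernel/btf/vmlinux assumes sysfs is mounted at /sys on a kernel that exposes its BTF there.

```c
#include <stdio.h>

/*
 * Returns 1 if the file at path can be opened and starts with the BTF
 * magic 0xeb9f (in either byte order), 0 otherwise.
 */
static int btf_file_present(const char *path)
{
	unsigned char magic[2];
	FILE *f = fopen(path, "rb");
	int ok;

	if (!f)
		return 0;
	ok = fread(magic, 1, sizeof(magic), f) == sizeof(magic) &&
	     ((magic[0] == 0x9f && magic[1] == 0xeb) ||
	      (magic[0] == 0xeb && magic[1] == 0x9f));
	fclose(f);
	return ok;
}

/*
 * Hypothetical policy for this sketch: without a readable vmlinux BTF,
 * report StructOps maps as unsupported for the current process.
 */
static int struct_ops_probe_possible(void)
{
	return btf_file_present("/sys/kernel/btf/vmlinux");
}
```

A result of 0 here only means the current process cannot see the vmlinux BTF, for example inside a container without sysfs. It says nothing about whether the kernel itself supports StructOps maps.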
(continuation of https://github.com/cilium/ebpf/pull/321#discussion_r670461168)
> So yes, the vmlinux BTF is loaded into the kernel somehow?
Yes, (all?) kernel BTF seems to be preloaded as far as I can see:
~ strace bpftool btf dump id 1
...
bpf(BPF_BTF_GET_FD_BY_ID, {btf_id=1}, 120)
...
[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
... etc.
The next fd's seem to be BTFs for various other subsystems.
This makes me doubt whether sysfs is really needed to obtain the vmlinux BTF blob, or maybe BPF_BTF_GET_FD_BY_ID is a more recent addition.
It still requires a way to find the id for kernel BTF, though :thinking:.
vmlinux seems to be fixed at 1, but we should make sure there's a const that can be depended on.
> Then once we have this fd we could assign it to attr->btf_fd, this won't be valid and the map won't be created,
I don't see why that would be invalid. :sweat_smile: If we can obtain the vmlinux BTF reliably using a syscall and parse its graph for bpf_struct_ops_tcp_congestion_ops, we have our probe.
> I don't see why that would be invalid. :sweat_smile:
Because the kernel explicitly checks that this BTF object is not kernel BTF in the case of struct_ops maps, see map_create() in kernel/bpf/syscall.c:
	if ( [...] || attr->btf_vmlinux_value_type_id) {
		struct btf *btf;
		btf = btf_get_by_fd(attr->btf_fd);
		[...]
		if (btf_is_kernel(btf)) {
			btf_put(btf);
			err = -EACCES;
			goto free_map;
		}
Agreed on the other points.
A random thing to keep in mind:
> bpf(BPF_BTF_GET_FD_BY_ID, {btf_id=1}, 120)
That syscall requires CAP_SYS_ADMIN, so it won't work for feature probes I'd say.
But then most of the probes will require some level of privilege anyway?
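If a probe wanted to know up front whether it holds CAP_SYS_ADMIN, one option is to read the effective capability mask from /proc/self/status. The sketch below assumes procfs is mounted and that the CapEff line falls within the first 4 KiB of the file. cap_eff_has and self_has_cap are illustrative names, and bit 21 is the value of CAP_SYS_ADMIN in include/uapi/linux/capability.h.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP_SYS_ADMIN_BIT 21	/* CAP_SYS_ADMIN in linux/capability.h */

/*
 * Look for a "CapEff:" line in text formatted like /proc/<pid>/status.
 * Returns 1 if capability bit cap is set, 0 if it is clear, and -1 if
 * no CapEff line is present.
 */
static int cap_eff_has(const char *status, int cap)
{
	const char *p = strstr(status, "CapEff:");
	unsigned long long mask;

	if (!p)
		return -1;
	/* strtoull skips the tab that separates the key from the hex mask. */
	mask = strtoull(p + strlen("CapEff:"), NULL, 16);
	return (int)((mask >> cap) & 1);
}

/* Same check for the calling process; -1 if /proc is unavailable. */
static int self_has_cap(int cap)
{
	char buf[4096];
	size_t n;
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return cap_eff_has(buf, cap);
}
```

A caller could skip the BPF_BTF_GET_FD_BY_ID route when self_has_cap(CAP_SYS_ADMIN_BIT) is not 1, instead of attempting the syscall and interpreting the error.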
This will become possible to implement after https://github.com/cilium/ebpf/pull/641 has been merged.
Closing this as we no longer really need this for probing (see https://github.com/cilium/ebpf/pull/746) and https://github.com/cilium/ebpf/pull/641 will be pushed over the line at some point.
Reopened as we'll still need to gain the ability to craft a valid StructOps program at some point for probing helper type availability.
There's currently no strong driver for probing helpers in tracing, struct_ops, ext and lsm programs. Implementing these is not trivial, as programs need to be loaded with certain attach targets, etc.
To revisit later if there is a need.