coroot-node-agent

Support for 4.x kernels has been dropped?

Open · FutureMatt opened this issue 1 year ago

I can't see anything obvious in the changelogs, but it looks like at some point after 1.18.9 support for Linux 4.x kernels was dropped. We currently run some clusters with a mix of 4.19.0-19 and 5.10.0-29 kernels, and on the clusters with 4.x kernels the node agent now fails to deploy with the following log output:

I0705 09:18:07.156568   85825 net.go:20] whitelisted public IPs: [0.0.0.0/0]
I0705 09:18:07.156905   85825 net.go:32] ephemeral-port-range: 32768-60999
I0705 09:18:07.164387   85825 cilium.go:30] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct4_global: no such file or directory
I0705 09:18:07.164448   85825 cilium.go:36] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct6_global: no such file or directory
I0705 09:18:07.164460   85825 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v2: no such file or directory
I0705 09:18:07.164472   85825 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v3: no such file or directory
I0705 09:18:07.164483   85825 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v2: no such file or directory
I0705 09:18:07.164491   85825 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v3: no such file or directory
I0705 09:18:07.167570   85825 main.go:111] agent version: 1.20.3
I0705 09:18:07.167635   85825 main.go:117] hostname: xxxxxxxxx-worker-1
I0705 09:18:07.167644   85825 main.go:118] kernel version: 4.19.0-18-amd64
I0705 09:18:07.169872   85825 main.go:75] machine-id:  xxxxxxxxxxxxxxxxx
I0705 09:18:07.169971   85825 tracing.go:37] OpenTelemetry traces collector endpoint: http://coroot:8080/v1/traces
I0705 09:18:07.170090   85825 otel.go:29] OpenTelemetry logs collector endpoint: http://coroot:8080/v1/logs
I0705 09:18:07.170401   85825 metadata.go:67] cloud provider:
I0705 09:18:07.170419   85825 collector.go:157] instance metadata: <nil>
I0705 09:18:07.170670   85825 profiling.go:52] profiles endpoint: http://coroot:8080/v1/profiles
E0705 09:18:07.198354   85825 profiling.go:100] load bpf objects: field DisassociateCtty: program disassociate_ctty: apply CO-RE relocations: load kernel spec: no BTF found for kernel version 4.19.0-18-amd64: not supported
I0705 09:18:10.202542   85825 containerd.go:38] using /run/containerd/containerd.sock
W0705 09:18:10.202604   85825 registry.go:85] stat /proc/1/root/var/run/crio/crio.sock: no such file or directory
E0705 09:18:10.234982   85825 tracer.go:191] load program: argument list too long:
F0705 09:18:10.235037   85825 main.go:149] failed to load collection: program sys_enter_sendmmsg: load program: argument list too long
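The profiler errors in this log ("no BTF found for kernel version") appear to be non-fatal, since the agent keeps starting afterwards; they mean the kernel's BTF type information, which CO-RE relocations need, was not found. A quick node check is sketched here in C, assuming the standard /sys/kernel/btf/vmlinux location that kernels built with CONFIG_DEBUG_INFO_BTF (5.4 and later) expose. Loaders can also look in other places, so a missing file is a strong hint rather than proof; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Where kernels built with CONFIG_DEBUG_INFO_BTF (5.4+) expose their BTF. */
#define KERNEL_BTF_PATH "/sys/kernel/btf/vmlinux"

/* Hypothetical helper: true if a BTF file exists and is readable at `path`. */
static bool has_kernel_btf(const char *path) {
    assert(path != NULL);
    return access(path, R_OK) == 0;
}

/* Hypothetical helper: prints a one-line summary for the default location. */
static void report_kernel_btf(void) {
    if (has_kernel_btf(KERNEL_BTF_PATH)) {
        printf("kernel BTF found at %s\n", KERNEL_BTF_PATH);
    } else {
        printf("no kernel BTF at %s; CO-RE programs may fail to load\n",
               KERNEL_BTF_PATH);
    }
}
```

Called from a small main() on one of the 4.19 nodes, report_kernel_btf() would most likely take the "no kernel BTF" branch, because /sys/kernel/btf/vmlinux did not exist before 5.4.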

FutureMatt, Jul 05 '24

It wasn't intentional. We added an eBPF program with more instructions than the others, and kernel 4.19 has a lower limit on the number of instructions in an eBPF program.
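For matching this to the logs: "argument list too long" is just the C library's text for errno E2BIG. On 4.19, the kernel rejects a program with more than BPF_MAXINSNS (4096) instructions with E2BIG, and the verifier also returns E2BIG when it exceeds its complexity limit. A minimal C sketch of turning that errno into a more helpful message; bpf_load_hint and print_load_error are hypothetical names, not part of coroot or any BPF library.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: explains an errno from a failed eBPF program load.
 * Only E2BIG gets a specific hint here. */
static const char *bpf_load_hint(int err) {
    assert(err > 0);
    if (err == E2BIG) {
        /* Before 5.2: more than 4096 instructions in the program, or the
         * verifier hit its complexity limit. */
        return "program exceeds this kernel's eBPF size/complexity limits";
    }
    return "see the verifier log for details";
}

/* Prints the raw strerror() text followed by the hint. */
static void print_load_error(const char *prog, int err) {
    fprintf(stderr, "load %s: %s (%s)\n", prog, strerror(err), bpf_load_hint(err));
}
```

With glibc, print_load_error("sys_enter_sendmmsg", E2BIG) prints the same "Argument list too long" text seen in the log, followed by the hint.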

def, Jul 05 '24

Are there plans to support 4.x kernels again, or should the minimum requirements listed in the README be updated?

It uses eBPF to track container related events such as TCP connects, so the minimum supported Linux kernel version is 4.16.

FutureMatt, Jul 17 '24

It seems to be caused by this code, which unrolls the loop into two copies of a very long instruction sequence.

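SEC("tracepoint/syscalls/sys_enter_sendmmsg")
int sys_enter_sendmmsg(struct trace_event_raw_sys_enter_rw__stub* ctx) {
    __u64 offset = 0;
    // #pragma unroll emits the loop body twice (i = 0 and i = 1); if
    // trace_enter_write() is inlined, its instructions are duplicated too.
    #pragma unroll
    for (int i = 0; i <= 1; i++) {
        // ctx->size carries the third sendmmsg() argument (vlen):
        // stop early if fewer messages were passed.
        if (i >= ctx->size) {
            break;
        }
        // Copy the i-th struct mmsghdr from the user-space array.
        struct mmsghdr h = {};
        if (bpf_probe_read(&h, sizeof(h), (void *)(ctx->buf + offset))) {
            return 0;
        }
        offset += sizeof(h);
        // Hand the message's iovec array (msg_iov, msg_iovlen) to trace_enter_write().
        trace_enter_write(ctx, ctx->fd, 0, (char*)h.msg_hdr.msg_iov, 0, h.msg_hdr.msg_iovlen);
    }
    return 0;
}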
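As a back-of-the-envelope check of why unrolling hurts on 4.19: assuming the loop body (including anything inlined into it) is duplicated per iteration, the loaded program is roughly a fixed part plus one body copy per unrolled iteration, and before 5.2 the total must stay within 4096 instructions. This C sketch does that arithmetic; the sizes in the usage note are made-up placeholders, not measurements of coroot's program, and the helper names are hypothetical.

```c
#include <assert.h>

/* Pre-5.2 per-program instruction cap (the kernel's BPF_MAXINSNS). */
#define PRE_5_2_MAX_INSNS 4096u

/* Rough program size when the loop body is unrolled `iterations` times:
 * fixed part plus one copy of the body per iteration. */
static unsigned unrolled_size(unsigned fixed, unsigned body, unsigned iterations) {
    return fixed + body * iterations;
}

/* Largest number of unrolled iterations that keeps the size <= cap. */
static unsigned max_iterations(unsigned cap, unsigned fixed, unsigned body) {
    assert(body > 0);
    if (fixed >= cap) {
        return 0;
    }
    return (cap - fixed) / body;
}
```

With fixed = 100 and body = 2500 (made up), unrolled_size(100, 2500, 2) is 5100, over the cap, while max_iterations(PRE_5_2_MAX_INSNS, 100, 2500) returns 1. Real sizes would have to come from the compiled object, for example via llvm-objdump -d.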

guolifu, Aug 01 '24

Quoting @def: "It wasn't intentional. We added an eBPF program with more instructions than the others. Kernel 4.19 has a lower limit for the number of instructions in eBPF programs."

The Coroot docs site (https://docs.coroot.com/installation/requirements) still lists "minimum supported Linux kernel version is 4.16" as an installation requirement. Does this problem only apply to the 4.19 kernel?

nashtsai, Feb 18 '25

The issue is still present in the latest version:

I0411 04:43:43.554720   78820 net.go:24] whitelisted public IPs: [0.0.0.0/0]
I0411 04:43:43.554812   78820 net.go:36] ephemeral-port-range: 32768-60999
I0411 04:43:43.560298   78820 cilium.go:30] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct4_global: no such file or directory
I0411 04:43:43.560338   78820 cilium.go:36] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct6_global: no such file or directory
I0411 04:43:43.560354   78820 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v2: no such file or directory
I0411 04:43:43.560367   78820 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v3: no such file or directory
I0411 04:43:43.560382   78820 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v2: no such file or directory
I0411 04:43:43.560392   78820 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v3: no such file or directory
I0411 04:43:43.563772   78820 main.go:108] agent version: 1.23.12
I0411 04:43:43.563852   78820 main.go:114] hostname: localhost.localdomain
I0411 04:43:43.563861   78820 main.go:115] kernel version: 4.19.90-2107.6.0.0192.8.oe1.bclinux.aarch64
I0411 04:43:43.566753   78820 main.go:72] machine-id:  7fb0d43edf154cd3a974330c98b8cacb
I0411 04:43:43.566822   78820 tracing.go:40] OpenTelemetry traces collector endpoint:
I0411 04:43:43.566926   78820 otel.go:30] OpenTelemetry logs collector endpoint:
I0411 04:43:43.567135   78820 metadata.go:74] cloud provider:
I0411 04:43:43.567149   78820 collector.go:157] instance metadata: <nil>
I0411 04:43:43.567325   78820 profiling.go:55] profiles endpoint:
E0411 04:43:43.589643   78820 profiling.go:103] load bpf objects: field DisassociateCtty: program disassociate_ctty: map .rodata: map create: read- and write-only maps not supported (requires >= v5.2)
I0411 04:43:43.589882   78820 cgroup_linux.go:51] cgroup v2 root is /host/sys/fs/cgroup
W0411 04:43:47.594161   78820 registry.go:93] couldn't connect to containerd through the following UNIX sockets [/var/snap/microk8s/common/run/containerd.sock,/run/k0s/containerd.sock,/run/k3s/containerd/containerd.sock,/run/containerd/containerd.sock]: failed to dial "/proc/1/root/run/containerd/containerd.sock": context deadline exceeded
I0411 04:43:47.594208   78820 crio.go:58] cri-o socket:
I0411 04:43:47.595520   78820 tracer.go:96] L7 tracing is disabled
E0411 04:43:48.200530   78820 tracer.go:213] load program: argument list too long:
F0411 04:43:48.200595   78820 main.go:146] failed to load collection: program sys_enter_sendmmsg: load program: argument list too long
[sw@localhost docker_compose]$ uname -a
Linux localhost.localdomain 4.19.90-2107.6.0.0192.8.oe1.bclinux.aarch64 #1 SMP Tue Mar 21 09:23:05 CST 2023 aarch64 aarch64 aarch64 GNU/Linux
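Besides the fatal sys_enter_sendmmsg error, this log states a version requirement outright: the profiler's .rodata map needs read- and write-only maps, which the loader says require v5.2 or newer. A minimal C sketch for screening a node reads the release string via uname(2), parses major.minor, and compares it with 5.2; the helper names are hypothetical, and the 5.2 threshold comes from the error message rather than from coroot's docs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Parses "MAJOR.MINOR..." from a kernel release string such as
 * "4.19.90-2107.6.0.0192.8.oe1.bclinux.aarch64". Returns false on failure. */
static bool parse_release(const char *release, int *major, int *minor) {
    assert(release != NULL && major != NULL && minor != NULL);
    return sscanf(release, "%d.%d", major, minor) == 2;
}

/* True if major.minor >= req_major.req_minor. */
static bool kernel_at_least(int major, int minor, int req_major, int req_minor) {
    return major > req_major || (major == req_major && minor >= req_minor);
}

/* Checks the running kernel against the 5.2 requirement from the log. */
static bool running_kernel_supports_rodata_maps(void) {
    struct utsname u;
    int major = 0, minor = 0;
    if (uname(&u) != 0 || !parse_release(u.release, &major, &minor)) {
        return false;
    }
    return kernel_at_least(major, minor, 5, 2);
}
```

Fed the release string from the uname -a output above, parse_release yields 4 and 19, and kernel_at_least(4, 19, 5, 2) is false.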

donge, Apr 11 '25

Given that Linux kernel 4.19 is now in super long-term support and no longer gets new features, I don't think support for 4.x kernels will be coming back. It'd be great if @def could confirm this, though, and update the minimum requirements.

We have long since moved on to 5.x and 6.x kernels without issues.

FutureMatt, Apr 11 '25

@FutureMatt, we'd love to fix that! But on the first attempt, we couldn’t find an easy way to reduce the number of instructions. So you're right, we should update the requirements in the docs. Would you be open to contributing to this? :) https://github.com/coroot/coroot/blob/main/docs/docs/installation/requirements.md

def, Apr 11 '25

Sure, I'm happy to, but can you confirm what the minimum kernel is now? Is it just 5.x, or are the requirements more specific than that?

FutureMatt, Apr 11 '25

According to Cilium's docs:

The maximum instruction limit per program is restricted to 4096 BPF instructions, which, by design, means that any program will terminate quickly. For kernel newer than 5.1 this limit was lifted to 1 million BPF instructions.
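The quoted limits can be expressed as a small lookup, which answers the minimum-kernel question for the instruction limit alone (the 4.19 log earlier in the thread also reports a separate 5.2 requirement for read-only maps). A C sketch, assuming "newer than 5.1" means 5.2 and later, and that the 1 million cap applies only to privileged loaders (CAP_SYS_ADMIN, or CAP_BPF on newer kernels) while unprivileged loaders keep 4096; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define BPF_MAXINSNS_LEGACY 4096u    /* per-program cap before 5.2 */
#define BPF_MAXINSNS_LIFTED 1000000u /* privileged cap from 5.2 on */

/* Per-program instruction cap for kernel major.minor, per the quoted docs.
 * `privileged` models a loader with CAP_SYS_ADMIN. */
static unsigned max_prog_insns(int major, int minor, bool privileged) {
    assert(major >= 0 && minor >= 0);
    bool lifted = major > 5 || (major == 5 && minor >= 2);
    return (lifted && privileged) ? BPF_MAXINSNS_LIFTED : BPF_MAXINSNS_LEGACY;
}

/* True if a program of insn_cnt instructions is within the cap. */
static bool fits_insn_limit(unsigned insn_cnt, int major, int minor, bool privileged) {
    return insn_cnt <= max_prog_insns(major, minor, privileged);
}
```

For the nodes in this thread, max_prog_insns(4, 19, true) is 4096 and max_prog_insns(5, 10, true) is 1000000.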

def, Apr 11 '25