opentelemetry-network
Failed to compile eBPF code for the Linux distro 'debian' running kernel version 6.5.0-1018-aws.
What happened?
Description
When installing opentelemetry-ebpf, the kernel collector pod, although running, emits the following error:
2024-04-25 17:47:50.398732+00:00 debug [p:28721 t:28721] TCPChannel::connect: Connecting to intake @ opentelemetry-ebpf-reducer:7000
In file included from ../../../src/collector/kernel/bpf_src/render_bpf.c:39:
In file included from include/net/tcp.h:35:
In file included from include/net/sock_reuseport.h:5:
In file included from include/linux/filter.h:9:
include/linux/bpf.h:321:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_rb_root'
return sizeof(struct bpf_rb_root);
^ ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:321:24: note: forward declaration of 'struct bpf_rb_root'
return sizeof(struct bpf_rb_root);
^
include/linux/bpf.h:323:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_rb_node'
return sizeof(struct bpf_rb_node);
Note that the following command was run before installation:
sudo apt-get install --yes linux-headers-$(uname -r)
Kernel version: Linux show-no-config-i-05bbcdabc7509e781 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Steps to Reproduce
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update open-telemetry
helm install my-opentelemetry-ebpf -f ./otel-ebpf-values.yaml open-telemetry/opentelemetry-ebpf
Check the logs of the kernel collector pod.
Expected Result
transmission of metrics
Actual Result
Errors in data collection.
eBPF Collector version
latest
Environment information
Environment
Kernel version: Linux show-no-config-i-05bbcdabc7509e781 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
eBPF Collector configuration
# Default values for opentelemetry-ebpf.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
nameOverride: ""
fullnameOverride: ""
clusterName: "demohebnpm"
image:
tag: ""
registry: otel
pullPolicy: IfNotPresent
imagePullSecrets: []
resources: {}
# OTLP gRPC endpoint to send the collected metrics
endpoint:
address: "0.0.0.0"
port: 4317
log:
console: true
# possible values: { error | warning | info | debug | trace }
level: debug
debug:
enabled: true
storeMinidump: false
sendUnplannedExitMetric: false
kernelCollector:
enabled: true
serviceAccount:
create: true
name: ""
image:
registry: ""
tag: ""
name: opentelemetry-ebpf-kernel-collector
nodeSelector: {}
disableHttpMetrics: false
tolerations:
- operator: "Exists"
effect: "NoExecute"
- operator: "Exists"
effect: "NoSchedule"
affinity: {}
resources: {}
# uncomment the line below to disable automatic kernel headers fetching
fetchKernelHeaders: true
# uncomment to enable enrichment using Docker metadata
useDockerMetadata: true
# uncomment to enable enrichment using Nomad metadata (https://www.nomadproject.io/)
collectNomadMetadata: true
cloudCollector:
enabled: false
image:
registry: ""
tag: ""
name: opentelemetry-ebpf-cloud-collector
serviceAccount:
create: true
name: ""
annotations: {}
## eks.amazonaws.com/role-arn: "role-arn-name"
tolerations: []
affinity: {}
k8sCollector:
enabled: true
serviceAccount:
create: true
name: ""
relay:
image:
registry: ""
tag: ""
name: opentelemetry-ebpf-k8s-relay
watcher:
image:
registry: ""
tag: ""
name: opentelemetry-ebpf-k8s-watcher
tolerations: []
affinity: {}
reducer:
image:
registry: ""
tag: ""
name: opentelemetry-ebpf-reducer
extraArgs: {}
ingestShards: 1
matchingShards: 1
aggregationShards: 1
disableInternalMetrics: true
disableMetrics: []
### to disable an entire metric category: ###
# - tcp.all
# - udp.all
# - dns.all
# - http.all
### to disable an individual metric: ###
### tcp ###
# - tcp.bytes
# - tcp.rtt.num_measurements
# - tcp.active
# - tcp.rtt.average
# - tcp.packets
# - tcp.retrans
# - tcp.syn_timeouts
# - tcp.new_sockets
# - tcp.resets
### udp ###
# - udp.bytes
# - udp.packets
# - udp.active
# - udp.drops
### dns ###
# - dns.client.duration.average
# - dns.server.duration.average
# - dns.active_sockets
# - dns.responses
# - dns.timeouts
### http ##
# - http.client.duration.average
# - http.server.duration.average
# - http.active_sockets
# - http.status_code
### ebpf_net ##
# - ebpf_net.span_utilization_fraction
# - ebpf_net.pipeline_metric_bytes_discarded
# - ebpf_net.codetiming_min_ns
# - ebpf_net.entrypoint_info
# - ebpf_net.otlp_grpc.requests_sent
# - ebpf_net.connections
# - ebpf_net.rpc_queue_elem_utilization_fraction
# - ebpf_net.disconnects
# - ebpf_net.codetiming_avg_ns
# - ebpf_net.client_handle_pool
# - ebpf_net.otlp_grpc.successful_requests
# - ebpf_net.span_utilization
# - ebpf_net.up
# - ebpf_net.rpc_queue_buf_utilization_fraction
# - ebpf_net.collector_log_count
# - ebpf_net.time_since_last_message_ns
# - ebpf_net.bpf_log
# - ebpf_net.codetiming_count
# - ebpf_net.message
# - ebpf_net.otlp_grpc.bytes_sent
# - ebpf_net.pipeline_message_error
# - ebpf_net.pipeline_metric_bytes_written
# - ebpf_net.codetiming_max_ns
# - ebpf_net.codetiming_sum_ns
# - ebpf_net.otlp_grpc.failed_requests
# - ebpf_net.rpc_queue_buf_utilization
### to enable all metrics (including metrics turned off by default): ###
# - none
enableMetrics: []
### Disable metrics flag is evaluated first and only then enable metric flag is evaluated. ###
### to enable an entire metric category: ###
# - tcp.all
# - udp.all
# - dns.all
# - http.all
# - ebpf_net.all
### to enable an individual metric: ###
### tcp ###
# - tcp.bytes
# - tcp.rtt.num_measurements
# - tcp.active
# - tcp.rtt.average
# - tcp.packets
# - tcp.retrans
# - tcp.syn_timeouts
# - tcp.new_sockets
# - tcp.resets
### udp ###
# - udp.bytes
# - udp.packets
# - udp.active
# - udp.drops
### dns ###
# - dns.client.duration.average
# - dns.server.duration.average
# - dns.active_sockets
# - dns.responses
# - dns.timeouts
### http ###
# - http.client.duration.average
# - http.server.duration.average
# - http.active_sockets
# - http.status_code
### ebpf_net ###
# - ebpf_net.span_utilization_fraction
# - ebpf_net.pipeline_metric_bytes_discarded
# - ebpf_net.codetiming_min_ns
# - ebpf_net.entrypoint_info
# - ebpf_net.otlp_grpc.requests_sent
# - ebpf_net.connections
# - ebpf_net.rpc_queue_elem_utilization_fraction
# - ebpf_net.disconnects
# - ebpf_net.codetiming_avg_ns
# - ebpf_net.client_handle_pool
# - ebpf_net.otlp_grpc.successful_requests
# - ebpf_net.span_utilization
# - ebpf_net.rpc_queue_elem_utilization_fraction
# - ebpf_net.disconnects
# - ebpf_net.codetiming_avg_ns
# - ebpf_net.client_handle_pool
# - ebpf_net.otlp_grpc.successful_requests
# - ebpf_net.span_utilization
# - ebpf_net.up
# - ebpf_net.rpc_queue_buf_utilization_fraction
# - ebpf_net.collector_log_count
# - ebpf_net.time_since_last_message_ns
# - ebpf_net.bpf_log
# - ebpf_net.codetiming_count
# - ebpf_net.message
# - ebpf_net.otlp_grpc.bytes_sent
# - ebpf_net.pipeline_message_error
# - ebpf_net.pipeline_metric_bytes_written
# - ebpf_net.codetiming_max_ns
# - ebpf_net.span_utilization_max
# - ebpf_net.client_handle_pool_fraction
# - ebpf_net.span_utilization_fraction
# - ebpf_net.rpc_latency_ns
# - ebpf_net.agg_root_truncation
# - ebpf_net.clock_offset_ns
# - ebpf_net.otlp_grpc.metrics_sent
# - ebpf_net.otlp_grpc.unknown_response_tags
# - ebpf_net.collector_health
# - ebpf_net.codetiming_sum_ns
# - ebpf_net.otlp_grpc.failed_requests
# - ebpf_net.rpc_queue_buf_utilization
resources: {}
nodeSelector: {}
tolerations: []
affinity: {}
service:
type: ClusterIP
ports:
telemetry:
enabled: true
servicePort: 7000
containerPort: 7000
targetPort: 7000
protocol: TCP
appProtocol: http
stats:
enabled: true
servicePort: 7001
containerPort: 7001
targetPort: 7001
protocol: TCP
appProtocol: http
rbac:
create: true
Log output
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/srv
SHLVL=0
SSL_CERT_DIR=/etc/ssl/certs
_=/usr/bin/env
===========================================================
resolving kernel headers...
cleaning up stale kprobes...
launching kernel collector...
+ exec /srv/kernel-collector --host-distro debian --kernel-headers-source pre_installed --config-file=/etc/network-explorer/config.yaml --force-docker-metadata --log-console --debug
2024-04-25 17:47:41.000682+00:00 debug [p:28721 t:28721] setting up breakpad...
2024-04-25 17:47:41.000794+00:00 debug [p:28721 t:28721] setting up breakpad...
2024-04-25 17:47:41.000909+00:00 info [p:28721 t:28721] Starting Kernel Collector version 0.10.0 (release)
2024-04-25 17:47:41.000921+00:00 info [p:28721 t:28721] Kernel Collector agent ID is FAIDIN4D2V25Q0YAXWQK8F1QSLM5FG688BQB
2024-04-25 17:47:41.000925+00:00 info [p:28721 t:28721] Running on:
sysname: Linux
nodename: show-no-config-i-05bbcdabc7509e781
release: 6.5.0-1018-aws
version: #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024
machine: x86_64
2024-04-25 17:47:41.000947+00:00 info [p:28721 t:28721] HTTP Metrics: Enabled
2024-04-25 17:47:41.000949+00:00 info [p:28721 t:28721] Socket stats interval in seconds: 10
2024-04-25 17:47:41.000950+00:00 info [p:28721 t:28721] Userland TCP: Disabled
2024-04-25 17:47:41.007377+00:00 debug [p:28721 t:28721] Unable to fetch AWS metadata: no metadata returned by AWS
2024-04-25 17:47:41.019944+00:00 debug [p:28721 t:28721] Unable to fetch GCP metadata: error while fetching Google Cloud Platform instance metadata: Could not resolve host: metadata.google.internal
2024-04-25 17:47:41.019960+00:00 debug [p:28721 t:28721] Unable to fetch Nomad metadata - environment variables not found
2024-04-25 17:47:41.019970+00:00 info [p:28721 t:28721] Kernel Collector version 0.10.0 (release) started on host show-no-config-i-05bbcdabc7509e781
2024-04-25 17:47:41.020086+00:00 info [p:28721 t:28721] Node label has been set in config: 'environment':'demohebnpm'
2024-04-25 17:47:41.047126+00:00 debug [p:28721 t:28721] intake record file: ``
2024-04-25 17:47:41.047191+00:00 debug [p:28721 t:28721] starting event loop...
2024-04-25 17:47:50.398714+00:00 info [p:28721 t:28721] connecting to opentelemetry-ebpf-reducer:7000 (binary)...
2024-04-25 17:47:50.398732+00:00 debug [p:28721 t:28721] TCPChannel::connect: Connecting to intake @ opentelemetry-ebpf-reducer:7000
In file included from ../../../src/collector/kernel/bpf_src/render_bpf.c:39:
In file included from include/net/tcp.h:35:
In file included from include/net/sock_reuseport.h:5:
In file included from include/linux/filter.h:9:
include/linux/bpf.h:321:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_rb_root'
return sizeof(struct bpf_rb_root);
^ ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:321:24: note: forward declaration of 'struct bpf_rb_root'
return sizeof(struct bpf_rb_root);
^
include/linux/bpf.h:323:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_rb_node'
return sizeof(struct bpf_rb_node);
^ ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:323:24: note: forward declaration of 'struct bpf_rb_node'
return sizeof(struct bpf_rb_node);
^
include/linux/bpf.h:325:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_refcount'
return sizeof(struct bpf_refcount);
^ ~~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:325:24: note: forward declaration of 'struct bpf_refcount'
return sizeof(struct bpf_refcount);
^
include/linux/bpf.h:347:10: error: invalid application of '__alignof' to an incomplete type 'struct bpf_rb_root'
return __alignof__(struct bpf_rb_root);
^ ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:347:29: note: forward declaration of 'struct bpf_rb_root'
return __alignof__(struct bpf_rb_root);
^
include/linux/bpf.h:349:10: error: invalid application of '__alignof' to an incomplete type 'struct bpf_rb_node'
return __alignof__(struct bpf_rb_node);
^ ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:349:29: note: forward declaration of 'struct bpf_rb_node'
return __alignof__(struct bpf_rb_node);
^
include/linux/bpf.h:351:10: error: invalid application of '__alignof' to an incomplete type 'struct bpf_refcount'
return __alignof__(struct bpf_refcount);
^ ~~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:351:29: note: forward declaration of 'struct bpf_refcount'
return __alignof__(struct bpf_refcount);
^
../../../src/collector/kernel/bpf_src/tcp-processor/bpf_tcp_send_recv.h:184:53: error: no member named 'iov' in 'struct iov_iter'
bpf_probe_read(&iov, sizeof(iov), &(msg->msg_iter.iov));
~~~~~~~~~~~~~ ^
../../../src/collector/kernel/bpf_src/tcp-processor/bpf_tcp_send_recv.h:393:53: error: no member named 'iov' in 'struct iov_iter'
bpf_probe_read(&iov, sizeof(iov), &(msg->msg_iter.iov));
~~~~~~~~~~~~~ ^
8 errors generated.
2024-04-25 17:47:56.205695+00:00 error [p:28721 t:28721] Cannot initialize BPF program, res=-1
Failed to compile eBPF code for the Linux distro 'debian' running kernel version 6.5.0-1018-aws.
troubleshoot item bpf_compilation_failed (os=Linux,flavor=debian,headers_src=pre_installed,kernel=6.5.0-1018-aws): ProbeHandler couldn't load BPFModule: Success
This usually means that kernel headers weren't installed correctly.
Please reach out to support and include this log in its entirety so we can diagnose and fix
the problem.
In the meantime, please install kernel headers manually on each host before running
the Kernel Collector.
To manually install kernel headers, follow the instructions below:
- for Debian/Ubuntu based distros, run:
sudo apt-get install --yes "linux-headers-`uname -r`"
- for RedHat based distros like CentOS and Amazon Linux, run:
sudo yum install -y "kernel-devel-`uname -r`"
Additional context
No response
The first set of errors (include/linux/bpf.h), at first glance, could be due to some internal inconsistency in the kernel headers. For example take the first error:
- The error location seems to be here.
- the first #include in that file is to uapi bpf.h
- uapi bpf.h has a (non-forward) definition
so there should be a full definition -- curious.
@ccoqueiro would the package repository used to install the packages contain recent versions of the headers? Is the kernel on that machine a recent release in the distro?
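For readers unfamiliar with the diagnostic, here is a minimal sketch of what the compiler is complaining about, independent of the kernel headers. The names `demo_rb_root`, `demo_passthrough`, `demo_size`, and `demo_align` are made up for illustration. It only demonstrates the general C rule (sizeof and __alignof__ need a complete type) and does not explain why the definition is not visible in this particular build.

```c
#include <stddef.h>

/* Forward declaration only: the type is "incomplete" at this point. */
struct demo_rb_root;

/* Pointers to an incomplete type are fine; the compiler needs no layout. */
static const struct demo_rb_root *demo_passthrough(const struct demo_rb_root *p)
{
  return p;
}

/* Uncommenting the function below, while only the forward declaration is
 * visible, produces the same class of error as in the log
 * (invalid application of 'sizeof' to an incomplete type):
 *
 * static size_t demo_broken_size(void) { return sizeof(struct demo_rb_root); }
 */

/* Full definition: from here on the type is complete. */
struct demo_rb_root {
  unsigned long opaque[2];
};

/* sizeof and __alignof__ compile now because the layout is known. */
static size_t demo_size(void) { return sizeof(struct demo_rb_root); }
static size_t demo_align(void) { return __alignof__(struct demo_rb_root); }
```

In the log above, the compiler evidently saw only a forward declaration of struct bpf_rb_root, struct bpf_rb_node, and struct bpf_refcount at the point where include/linux/bpf.h asked for their size and alignment.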
The two errors in bpf_tcp_send_recv.h:
- v6.5 definition of msghdr
- msg->msg_iter is a struct iov_iter, defined here
- Some digging in git history shows commit de4f5fed3f231:
    - const struct iovec *iov;
    + /* use iter_iov() to get the current vec */
    + const struct iovec *__iov;
- Seems to have originated in v6.4:
    $ git describe --contains de4f5fed3f231
    v6.4-rc1~214^2~10
So we'd want to figure out what iter_iov() does and handle the modified structure with an #if LINUX_VERSION_CODE < KERNEL_VERSION(6, 4, 0)
(edit: the < case would contain the old code, and the #else the new)
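A rough sketch of the version guard described above, written as a userspace mock so it builds without kernel headers. Everything prefixed `mock_` and the macro `MSG_ITER_IOV_FIELD` are hypothetical names introduced here, and the fallback definitions of `KERNEL_VERSION` and `LINUX_VERSION_CODE` are stand-ins for what <linux/version.h> provides in the real build. It models only the member rename shown in the diff, not whatever else iter_iov() may do.

```c
#include <stddef.h>

/* In the collector these come from <linux/version.h>; they are defined here
 * only so the sketch builds in userspace. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(6, 5, 0) /* assumed target for the demo */
#endif

/* One place that knows the member name on each side of the rename. */
#if LINUX_VERSION_CODE < KERNEL_VERSION(6, 4, 0)
#define MSG_ITER_IOV_FIELD iov
#else
#define MSG_ITER_IOV_FIELD __iov
#endif

struct mock_iovec {
  void *iov_base;
  size_t iov_len;
};

/* Simplified stand-in for struct iov_iter; only the renamed member is modeled. */
struct mock_iov_iter {
  const struct mock_iovec *MSG_ITER_IOV_FIELD;
};

/* Accessor that compiles against either layout. */
static const struct mock_iovec *mock_current_iov(const struct mock_iov_iter *iter)
{
  return iter->MSG_ITER_IOV_FIELD;
}

/* Round-trips a length through the accessor to show that it resolves. */
static size_t mock_first_len(size_t len)
{
  struct mock_iovec vec = { NULL, len };
  struct mock_iov_iter iter = { &vec };
  return mock_current_iov(&iter)->iov_len;
}
```

With a guard like this, the two failing bpf_probe_read calls in bpf_tcp_send_recv.h could presumably read from &(msg->msg_iter.MSG_ITER_IOV_FIELD). Whether reading __iov directly matches what iter_iov() returns on 6.4 and later still needs to be checked against the kernel source.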
Hello @yonch, yes, as far as I understand. I'm using the opentelemetry-ebpf chart: https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-ebpf
@ccoqueiro I'm wondering if the header package might somehow be old/broken. Is one of these true in your case:
- The package repo for the distro (used by apt) is not standard
- The kernel header package was installed a long time ago and not updated
- The machine is running a bleeding edge kernel for the distro (so the header packaging might be work-in-progress)
If the answer is no, a couple of things to try:
- updating the packages on the system with apt-get upgrade, to see if that fixes the headers
- running on a machine that does not have headers (e.g., without first running sudo apt-get install --yes linux-headers-$(uname -r)), so letting the network collector fetch its own headers
Note that these will probably only fix the first set of errors. The second set requires modifications in the eBPF code. Are you in a position to pursue those, or should we search for community contributors?
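As a quick sanity check on the host, the sketch below prints the running kernel release and whether a header tree exists for it. It assumes the conventional Debian/Ubuntu location /lib/modules/<release>/build, which the linux-headers packages normally provide; the kernel collector may look in other places, so treat a "missing" result as a hint rather than proof. The function names are made up for this example.

```c
#include <stdio.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Writes "/lib/modules/<release>/build" into buf.
 * Returns 0 on success, -1 if the result would not fit. */
static int headers_build_path(const char *release, char *buf, size_t len)
{
  int n = snprintf(buf, len, "/lib/modules/%s/build", release);
  return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns 1 if the path exists (following symlinks), 0 otherwise. */
static int path_exists(const char *path)
{
  return access(path, F_OK) == 0;
}

/* Prints the running kernel release and whether its header tree is present.
 * Returns 1 if present, 0 if missing, -1 on error. */
static int report_kernel_headers(void)
{
  struct utsname u;
  char path[512];

  if (uname(&u) != 0) {
    perror("uname");
    return -1;
  }
  if (headers_build_path(u.release, path, sizeof(path)) != 0)
    return -1;

  int present = path_exists(path);
  printf("kernel release: %s\n", u.release);
  printf("%s: %s\n", path, present ? "present" : "missing");
  return present;
}
```

Calling report_kernel_headers() from a small main() on the node (not inside the collector container) shows whether the installed headers match the release reported by uname.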
Hello @yonch
Answering questions:
- "The package repo for the distro (used by apt) is not standard": The distro I used is an Ubuntu 22.04 provided by AWS; as far as I understand, it's standard.
- "The kernel header package was installed a long time ago and not updated": The kernel header package was not installed beforehand; I installed it as a prerequisite for installing otel ebpf.
- "The machine is running a bleeding edge kernel for the distro (so the header packaging might be work-in-progress)": I can't answer this question. How could we check this?
On the things to try:
- "updating the packages on the system apt-get upgrade, see if that fixes the headers": Done, but it did not fix the headers.
- "running on a machine that does not have headers (e.g., without first running sudo apt-get install --yes linux-headers-$(uname -r)), so letting the network collector fetch its own headers": I ran this command, installing the header package before installing otel ebpf, but it didn't help; it kept giving the same error.
- "The second set requires modifications in the eBPF code. Are you in a position to pursue those, or should we search for community contributors?": To be quite honest with you, I have no idea how I would do this.
Got it @ccoqueiro, I marked this with "help wanted" and will direct contributors here if asked. I'm sorry I don't have anything more immediate for you. If you find anyone who would like to tackle this, I'm happy to work with them!