eBPF: PID matching fails in nested namespace deployment
Describe the bug
In my development environment I am not able to receive eBPF profiles, and the few I do receive only cover a small subset of the expected workload, with metadata that does not match the stack traces seen.
This is because I run the kind development environment in nested containers (illustrated below):
- [L1] A VM providing Linux and the Docker containers
- [L2] This VM runs a single container per Kubernetes node, with containerd inside
- [L3] This containerd instance then runs further nested containers, as the Kubernetes workload requires
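The nesting can be seen with commands like these (a sketch; kind-control-plane is whatever `kind get nodes` reports for your cluster):

# [L1] the VM sees the kind "nodes" as plain Docker containers
$ docker ps --format '{{.Names}}'
kind-control-plane

# [L2] inside a node container, containerd runs the nested workload containers
$ docker exec kind-control-plane crictl ps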
Now when I install the Grafana Agent via DaemonSet (running in [L3]), it mounts the host /proc of machine [L2], i.e. the PID namespace of [L2].
When the pyroscope.ebpf component then collects profiles, it actually sees the PID namespace of [L1], so the PIDs it reports cannot be matched against the /proc metadata.
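As far as I understand, eBPF helpers such as bpf_get_current_pid_tgid() report PIDs as seen from the initial ([L1]) namespace, while the agent resolves target metadata through the mounted /proc of [L2]. The translation between the levels is visible from the [L1] VM via the NSpid field of /proc (a sketch; 12345 is a placeholder PID of some workload process, and the output values are illustrative):

# NSpid lists the process's PID in each namespace, outermost first
$ grep NSpid /proc/12345/status
NSpid:  12345   231     17
# PID in [L1] (what eBPF reports), in [L2] (what the agent's /proc shows), in [L3]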
To Reproduce
Steps to reproduce the behavior:
# Create a kind cluster:
$ kind create cluster
# Configure the Grafana Agent for eBPF
$ cat > values-ebpf.yaml <<"EOF"
agent:
  agent:
    mode: 'flow'
    configMap:
      name: pyroscope-agent
      create: true
      content: |
        discovery.kubernetes "local_pods" {
          selectors {
            field = "spec.nodeName=" + env("HOSTNAME")
            role = "pod"
          }
          role = "pod"
        }
        discovery.process "all" {
          join = discovery.kubernetes.local_pods.targets
          refresh_interval = "60s"
          discover_config {
            cwd = true
            exe = true
            commandline = true
            username = false
            uid = true
            container_id = true
          }
        }
        discovery.relabel "all" {
          targets = discovery.process.all.targets
          // get basename into service_name
          rule {
            source_labels = ["__meta_process_exe"]
            action = "replace"
            regex = ".*/(.*)$"
            target_label = "service_name"
            replacement = "$1"
          }
        }
        pyroscope.ebpf "instance" {
          forward_to = [pyroscope.write.endpoint.receiver]
          targets = discovery.relabel.all.output
        }
        pyroscope.write "endpoint" {
          endpoint {
            url = "http://pyroscope:4040"
          }
        }
    securityContext:
      runAsGroup: 0
      runAsUser: 0
      privileged: true
  controller:
    type: "daemonset"
    hostPID: true
  image:
    repository: grafana/agent
    tag: "main-6eb1889"
EOF
# Deploy pyroscope / agent via helm
$ helm upgrade --install --version v1.4.0 --values values-ebpf.yaml pyroscope grafana/pyroscope
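After deployment, the mismatch can be demonstrated by comparing the PID namespace behind the agent's mounted /proc with the one of the [L1] VM (a sketch; the daemonset name is taken from my values above):

# PID namespace backing the agent's /proc ([L2], via hostPID: true)
$ kubectl exec daemonset/pyroscope-agent -- readlink /proc/1/ns/pid

# PID namespace of the [L1] VM, via a host-PID helper container
$ docker run --rm --pid=host alpine readlink /proc/1/ns/pid

If the two inode numbers differ, the root-namespace PIDs that eBPF reports cannot be matched against the agent's /proc.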
Expected behavior
Profiling information is collected by eBPF according to the PID namespace of [L2] (the namespace the Grafana Agent runs in).
Environment
- Kind on Darwin/arm64
- Helm Chart v1.4.0
Additional Context
https://stackoverflow.com/questions/48401989/how-can-i-determine-which-namespaces-a-pid-is-in-from-kernel-space
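The user-space counterpart of the linked answer: the namespaces a process belongs to can be inspected through /proc, e.g. with lsns from util-linux (sketch; 12345 is a placeholder PID):

# list all namespaces PID 12345 belongs to, including its pid namespace
$ lsns -p 12345
# or compare the ns/pid symlink inodes of two processes directly
$ readlink /proc/12345/ns/pid /proc/self/ns/pid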
Debug info
All targets are going in as expected.
Logs
ts=2024-02-14T11:24:55.99393905Z level=info "boringcrypto enabled"=false
ts=2024-02-14T11:24:56.01403553Z level=info msg="starting complete graph evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270
ts=2024-02-14T11:24:56.014288156Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=pyroscope.write.endpoint duration=219.126µs
ts=2024-02-14T11:24:56.014304822Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=otel duration=1.833µs
ts=2024-02-14T11:24:56.014467614Z level=info msg="Using pod service account via in-cluster config" component=discovery.kubernetes.local_pods
ts=2024-02-14T11:24:56.014807324Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=discovery.kubernetes.local_pods duration=489.46µs
ts=2024-02-14T11:24:56.014795199Z level=info msg="running usage stats reporter"
ts=2024-02-14T11:24:56.014842657Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=discovery.process.all duration=22.333µs
ts=2024-02-14T11:24:56.014917366Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=discovery.relabel.all duration=66µs
ts=2024-02-14T11:24:56.015216158Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=pyroscope.ebpf.instance duration=290.126µs
ts=2024-02-14T11:24:56.0152447Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=tracing duration=6.792µs
ts=2024-02-14T11:24:56.015260492Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=logging duration=11.584µs
ts=2024-02-14T11:24:56.015273867Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=labelstore duration=5.042µs
ts=2024-02-14T11:24:56.015328451Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=remotecfg duration=44.126µs
ts=2024-02-14T11:24:56.015341951Z level=info msg="applying non-TLS config to HTTP server" service=http
ts=2024-02-14T11:24:56.015344576Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=http duration=7.292µs
ts=2024-02-14T11:24:56.015380367Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=ui duration=32.584µs
ts=2024-02-14T11:24:56.015390617Z level=info msg="finished node evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 node_id=cluster duration=3.292µs
ts=2024-02-14T11:24:56.015394242Z level=info msg="finished complete graph evaluation" controller_id="" trace_id=5c4ab5999cda10805ef4dfd85e94c270 duration=1.446838ms
ts=2024-02-14T11:24:56.015555618Z level=info msg="scheduling loaded components and services"
ts=2024-02-14T11:24:56.01567341Z level=info msg="finished node evaluation" controller_id="" node_id=pyroscope.ebpf.instance duration=72.584µs
ts=2024-02-14T11:24:56.01573716Z level=info msg="finished node evaluation" controller_id="" node_id=pyroscope.ebpf.instance duration=51.959µs
ts=2024-02-14T11:24:56.016234453Z level=info msg="now listening for http traffic" service=http addr=0.0.0.0:80
ts=2024-02-14T11:24:56.020310008Z level=info msg="finished node evaluation" controller_id="" node_id=discovery.relabel.all duration=157.375µs
ts=2024-02-14T11:24:56.043247746Z level=info msg="starting cluster node" peers="" advertise_addr=10.244.0.11:80
ts=2024-02-14T11:24:56.04346858Z level=info msg="peers changed" new_peers=pyroscope-agent-hjwr8
ts=2024-02-14T11:24:56.1087922Z level=info msg="finished node evaluation" controller_id="" node_id=pyroscope.ebpf.instance duration=88.461693ms
ts=2024-02-14T11:25:01.016362672Z level=info msg="finished node evaluation" controller_id="" node_id=discovery.process.all duration=105.875µs
ts=2024-02-14T11:25:01.017189508Z level=info msg="finished node evaluation" controller_id="" node_id=discovery.relabel.all duration=648.252µs
ts=2024-02-14T11:25:01.017363758Z level=info msg="finished node evaluation" controller_id="" node_id=pyroscope.ebpf.instance duration=93.667µs
ts=2024-02-14T11:25:06.016382515Z level=info msg="finished node evaluation" controller_id="" node_id=discovery.process.all duration=169.959µs
ts=2024-02-14T11:25:06.017415435Z level=info msg="finished node evaluation" controller_id="" node_id=discovery.relabel.all duration=644.71µs
ts=2024-02-14T11:25:06.017628685Z level=info msg="finished node evaluation" controller_id="" node_id=pyroscope.ebpf.instance duration=177.834µs
ts=2024-02-14T11:25:56.034811418Z level=info msg="finished node evaluation" controller_id="" node_id=discovery.relabel.all duration=1.905547ms
ts=2024-02-14T11:25:56.035494295Z level=info msg="finished node evaluation" controller_id="" node_id=pyroscope.ebpf.instance duration=502.335µs
ts=2024-02-14T11:25:56.045259784Z level=info msg="rejoining peers" peers=10-244-0-11.pyroscope-agent-cluster.default.svc.cluster.local.:80
Thanks for the detailed report. These two may be related/duplicates: https://github.com/grafana/pyroscope/issues/1994
Last time I tried to use kind, it did not support --pid=host (the option was ignored).
I will try to look into it again.
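If it helps, a quick way to check whether kind passed the host PID namespace through to a node container (a sketch; empty output means the node got its own PID namespace):

$ docker inspect -f '{{.HostConfig.PidMode}}' kind-control-plane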