fix: add containerd support
This is a very simple fix/workaround for allowing profiling on clusters that use containerd as the container runtime.
The user has to provide the containerd runtime path via the existing "docker-path" flag.
There are of course more direct approaches, but this is the one that requires the least change to the current codebase. (It also means the name "docker-path" becomes a bit of a misnomer.)
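For example, on a containerd cluster an invocation would look something like this (the exact path depends on the cluster, but /run/containerd is the usual containerd state directory):
kubectl flame mypod -t 1m --lang java --docker-path /run/containerd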
Tested this new agent image and confirmed it works on both dockerd and containerd clusters.
Let me know what you think; if necessary, I can also adapt the implementation a bit.
closes #69
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
Great work so far! (I'm not a maintainer of this project – this might not help with getting this PR merged.)
I'm not really into Go, so this might be totally unrelated, but I'm getting an InvalidImageName error when I try your changes.
Here's what I see on a JVM-based pod.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning InspectFailed 4s (x7 over 75s) kubelet Failed to apply default image tag "verizondigital/kubectl-flame:-jvm": couldn't parse image reference "verizondigital/kubectl-flame:-jvm": invalid reference format
Warning Failed 4s (x7 over 75s) kubelet Error: InvalidImageName
The problem seems to be related to the leading - in the image tag.
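Presumably the default image name is assembled from a build-time version string that ends up empty when the plugin is built locally; hypothetically, something along these lines would produce exactly that reference (the variable names and format string are my guesses, not the project's actual code):
image := fmt.Sprintf("verizondigital/kubectl-flame:%s-jvm", version)
// with version == "", this yields "verizondigital/kubectl-flame:-jvm"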
@sbaier1 Do you have an idea what's causing this problem?
I could work around this problem by providing the image path:
kubectl flame mypod -t 1m --lang java --image verizondigital/kubectl-flame:v0.2.4-jvm --docker-path /run/containerd
I guess that this problem is not related to this PR.
Now, the log of the pod tells me the following:
{"type":"progress","data":{"time":"2022-09-01T19:10:00.633813759Z","stage":"started"}}
{"type":"error","data":{"reason":"open /var/lib/docker/image/overlay2/layerdb/mounts/containerd://60f8811d44987c163e0392b3bef870b2652b63d3874c5d0f7b3e0f75779d012d/mount-id: no such file or directory"}}
And the pod immediately fails.
@sbaier1 Any ideas?
This is fixed by my proposed change to filesystem.go.
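Roughly, the problem is that the container ID reported by the kubelet keeps its runtime prefix ("containerd://..."), and the Docker-only path construction uses it verbatim, which produces the broken mounts path above. At minimum the prefix has to be recognized and the containerd-specific location used instead; a simplified sketch of just the prefix handling (using the standard strings package, not the exact diff from filesystem.go):
// strip the runtime scheme ("docker://", "containerd://", ...) from the
// container ID before using it to build any on-disk path
func stripRuntimePrefix(containerID string) string {
    if idx := strings.Index(containerID, "://"); idx != -1 {
        return containerID[idx+len("://"):]
    }
    return containerID
}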
Great to see this issue is being addressed. Is there any reason the pull request hasn't been merged yet?
It seems like this repo currently has no maintainers, unfortunately, so no one who can actually merge the PR is reviewing it, and it can't be merged.
I'd be happy to jump back into it if someone maintaining the project would respond; right now it seems like this project is just doomed overall.
Very spooky to find an abandoned github repo on Halloween.
This is sad and unfortunate news but thanks for the quick reply.
A very spooky Halloween indeed :ghost:
We just upgraded our Kubernetes clusters to use the containerd runtime instead of Docker. Would love to see this PR merged to get support for this, but the project being dead is unfortunate...
The maintainer in the readme is @edeNFed
@sbaier1 Well tagging the maintainer worked! But it looks like they also merged it in and the pipeline failed to generate a release :(
And my suggestions simply got ignored. :confused:
@QuinnBast which language are you trying to profile? The pipeline did manage to push the container images for JVM, JVM Alpine, BPF and Python. For example, the jvm image was pushed so you could use it with the command
kubectl flame mypod -t 1m --lang java --image verizondigital/kubectl-flame:v0.2.5-jvm --docker-path /run/containerd
(Note the tag, particularly the 5 in v0.2.5.) If you're lucky, you won't need pvorb's suggested code changes.
This works because the code changes in this merged PR exclusively deal with these container images.
@pvorb It's a shame your suggestions got ignored, but if it's any consolation, they were very helpful to me. I used them in my own fork of the repo, created my own image, and finally got the agent pod running. (Too bad the resulting flame graph was empty.)
Thanks @benjaminxie! That command worked great. However, when I run it, the command gets stuck:
$ kubectl flame myPod -t 1m --lang java --image myRegistry/library/verizondigital/kubectl-flame:v0.2.5-jvm --docker-path /run/containerd
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ...
The profiler pod does start, and I shelled into it and found the flame graph at /tmp/flamegraph.svg. However, after copying the file to my local machine, it appears that the flame graph is empty.
Yes, exactly. @QuinnBast I've been struggling with problems like these as well, but haven't been able to solve them so far.
@QuinnBast @pvorb Someone else ran into this issue a while back (no data #73). Let's continue the conversation there. I think I may have some relevant observations.
Having an issue with the updated image (kubectl-flame:v0.2.5-jvm) for containerd support. While it stopped the outright failure, I am now seeing the flame pod fail with an exit code of 255. It looks like the changes @pvorb requested to address the following are included:
mountId, err := ioutil.ReadFile(fileName)
if err != nil {
    return "", err
}
But still seeing a 255 exit code (see below):
[XXXXXXXXX ~]$ kubectl describe pod kubectl-flame-2e6ee77f-a3d8-4a47-a8f7-4ecd8668abf6-c42wn -n digital
Name: kubectl-flame-2e6ee77f-a3d8-4a47-a8f7-4ecd8668abf6-c42wn
Namespace: digital
Priority: 0
Node: aks-genericdev2-29844296-vmss00000s/10.124.178.217
Start Time: Mon, 12 Dec 2022 15:25:36 -0500
Labels: controller-uid=78458314-1f8c-4fb1-8377-873d7c188388
job-name=kubectl-flame-2e6ee77f-a3d8-4a47-a8f7-4ecd8668abf6
kubectl-flame/id=2e6ee77f-a3d8-4a47-a8f7-4ecd8668abf6
Annotations: sidecar.istio.io/inject: false
Status: Failed
IP: 10.124.178.227
IPs:
IP: 10.124.178.227
Controlled By: Job/kubectl-flame-2e6ee77f-a3d8-4a47-a8f7-4ecd8668abf6
Containers:
kubectl-flame:
Container ID: containerd://0e85d4002079c1b3e16fde732d8fb161f9bf1b99ba67dcb1c64843531173401c
Image: verizondigital/kubectl-flame:v0.2.5-jvm
Image ID: docker.io/verizondigital/kubectl-flame@sha256:aa4eb0f6fc0bae768d1c558bca27fd645e8f08e89e91b4c19d891562935bdbfd
Port:
Normal Pulling 22s kubelet Pulling image "verizondigital/kubectl-flame:v0.2.5-jvm"
Normal Pulled 22s kubelet Successfully pulled image "verizondigital/kubectl-flame:v0.2.5-jvm" in 107.829551ms
Normal Created 22s kubelet Created container kubectl-flame
Normal Started 22s kubelet Started container kubectl-flame
[XXXXXXXXXX ~]$ kubectl logs kubectl-flame-2e6ee77f-a3d8-4a47-a8f7-4ecd8668abf6-c42wn -n digital
{"type":"progress","data":{"time":"2022-12-12T20:25:37.220351389Z","stage":"started"}}
{"type":"error","data":{"reason":"exit status 255"}}
Has anyone resolved this or seen the same?
Running in AKS, with the following runtimes:
System Info:
  Machine ID: 88e8a329cbd84a22b389208937c90476
  System UUID: 3da54c3e-3793-42e5-a949-caef8484533e
  Boot ID: 6bb95a01-f8c7-4c6b-a66e-70ff03cb2c8d
  Kernel Version: 5.4.0-1094-azure
  OS Image: Ubuntu 18.04.6 LTS
  Operating System: linux
  Architecture: amd64
  Container Runtime Version: containerd://1.6.4+azure-4
  Kubelet Version: v1.23.12
  Kube-Proxy Version: v1.23.12
Can this be leveraged to work with CRI-O?