Launching profiler sometimes fails: timed out waiting for the condition
The total time for a complete profiling run seems unreasonably long [0]. Sometimes it works and outputs the SVG result successfully, but sometimes it fails while launching the profiler with the following error (two separate issues?):
❯ time kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l java CONTAINER-NAME
Verifying target pod ... ✔
Launching profiler ... ❌
timed out waiting for the condition
kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l 0.29s user 0.15s system 0% cpu 5:02.61 total
Whether profiling ends up succeeding or failing, I can see a timeout warning in the events of the flame job description, as shown below, and the job pod is not created:
Name:           kubectl-flame-2fde7132-33d5-4344-9305-9dd427128f7f
Namespace:      default
Selector:       controller-uid=026b12f1-a439-4acb-9650-1756a37d4435
Labels:         kubectl-flame/id=2fde7132-33d5-4344-9305-9dd427128f7f
Annotations:    sidecar.istio.io/inject: false
Parallelism:    1
Completions:    1
Pods Statuses:  0 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       controller-uid=026b12f1-a439-4acb-9650-1756a37d4435
                job-name=kubectl-flame-2fde7132-33d5-4344-9305-9dd427128f7f
                kubectl-flame/id=2fde7132-33d5-4344-9305-9dd427128f7f
  Annotations:  sidecar.istio.io/inject: false
  Containers:
   kubectl-flame:
    Image:      verizondigital/kubectl-flame:v0.1.5-jvm
    Port:       <none>
    Host Port:  <none>
    Command:
      /app/agent
    Args:
      2fde7132-33d5-4344-9305-9dd427128f7f
      13efad1b-6569-47db-9e19-be1b22e6e288
      CONTAINER-NAME
      docker://c7e037b70817e3e8fa8a49ce90c1e2d427aa05fd867f4ff8989916366b1b5580
      1m0s
      java
    Environment:  <none>
    Mounts:
      /var/lib/docker from target-filesystem (rw)
  Volumes:
   target-filesystem:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker
    HostPathType:
Events:
  Type     Reason        Age   From            Message
  ----     ------        ----  ----            -------
  Warning  FailedCreate  12s   job-controller  Error creating: Timeout: request did not complete within requested timeout
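For anyone trying to reproduce this, the description and events above can be collected with standard kubectl commands while the CLI is stuck at "Launching profiler". A minimal sketch, assuming the job lives in the default namespace as shown above; the function name flame_job_events is made up for illustration and is not part of kubectl-flame:

```shell
# Hypothetical helper (not part of kubectl-flame): print a flame job's
# description and the events recorded against it.
#   $1 = job name, e.g. kubectl-flame-<id>
#   $2 = namespace (assumption: defaults to "default", as in the output above)
flame_job_events() {
  kubectl describe job "$1" -n "${2:-default}"
  # Events are selected by the object they refer to.
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.kind=Job,involvedObject.name=$1"
}
```

Calling it as flame_job_events kubectl-flame-2fde7132-33d5-4344-9305-9dd427128f7f should show the same FailedCreate warning if the job controller is still unable to create the pod.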
kubectl-flame version:
Version: v0.1.5, Commit: 5cb73b3
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T21:51:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
[0] The long duration of a successful profiling run (the total time is longer than the failed run above, but it ends in success):
time kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l java CONTAINER-NAME
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... ✔
FlameGraph saved to: /tmp/flamegraph.svg 🔥
kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l 0.29s user 0.17s system 0% cpu 5:45.96 total
Any idea? Thanks.
Hi @zhiyanliu, sorry for the late response. I haven't seen anything like that before, but from googling your error it looks like something in the Kubernetes cluster is preventing the job from launching. Do you have enough CPU / memory available on the targeted node (the node where the application runs)? Thanks.
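One way to check, as a rough sketch: look up the node the target pod runs on and inspect its allocated resources. The helper name pod_node_resources is hypothetical, the namespace is assumed to be default unless given, and kubectl top assumes metrics-server is installed in the cluster:

```shell
# Hypothetical helper: show capacity, requests and live usage for the node
# that hosts the target pod.
#   $1 = pod name, $2 = namespace (assumption: defaults to "default")
pod_node_resources() {
  local node
  # Resolve the node the pod is scheduled on.
  node="$(kubectl get pod "$1" -n "${2:-default}" -o jsonpath='{.spec.nodeName}')"
  # The "Allocated resources" section lists CPU/memory requests and limits.
  kubectl describe node "$node"
  # Live usage; only works if metrics-server is available (assumption).
  kubectl top node "$node"
}
```

For example, pod_node_resources POD-ID prints the node description followed by its current CPU and memory usage. If requests are already close to the allocatable amounts, the profiler job may not be schedulable on that node.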