Launching profiler sometimes fails: timed out waiting for the condition

Open · zhiyanliu opened this issue 5 years ago · 1 comment

A complete profiling run takes an unreasonably long time [0]. Sometimes it works and writes the SVG result successfully, but other times it fails while launching the profiler with the following error (maybe two separate issues?):

❯ time kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l java CONTAINER-NAME
Verifying target pod ... ✔
Launching profiler ... ❌
timed out waiting for the condition
kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l   0.29s user 0.15s system 0% cpu 5:02.61 total
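The error text "timed out waiting for the condition" matches ErrWaitTimeout in Kubernetes' k8s.io/apimachinery/pkg/util/wait package, which is returned when a polled condition never becomes true before its deadline. The minimal Python sketch below illustrates that polling pattern, assuming a hypothetical pod_is_running check; it is not kubectl-flame's actual code. It shows why the CLI can only report a generic timeout: the poller learns that the condition never held, not why.

```python
import time
from typing import Callable


class WaitTimeout(Exception):
    """Raised when the condition never becomes true before the deadline."""


def poll_until(condition: Callable[[], bool], interval: float, timeout: float) -> None:
    """Check `condition` immediately, then every `interval` seconds, until it is True.

    Similar in spirit to apimachinery's wait.PollImmediate: the caller only
    sees a generic timeout, not the underlying cause (e.g. a FailedCreate event).
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        # Give up if the next check would land past the deadline.
        if time.monotonic() + interval > deadline:
            raise WaitTimeout("timed out waiting for the condition")
        time.sleep(interval)


# Hypothetical condition: the profiler pod never appears, as in the failed run.
def pod_is_running() -> bool:
    return False


try:
    poll_until(pod_is_running, interval=0.01, timeout=0.05)
except WaitTimeout as err:
    print(err)
```

With the condition stuck at False, the only output is the generic timeout message; the reason the pod never appeared has to be looked up separately, for example in the job's events.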

At that point, the events in the flame job's description show the timeout warning below, whether profiling eventually succeeds or fails, and the job's pod is not created:

Name:           kubectl-flame-2fde7132-33d5-4344-9305-9dd427128f7f
Namespace:      default
Selector:       controller-uid=026b12f1-a439-4acb-9650-1756a37d4435
Labels:         kubectl-flame/id=2fde7132-33d5-4344-9305-9dd427128f7f
Annotations:    sidecar.istio.io/inject: false
Parallelism:    1
Completions:    1
Pods Statuses:  0 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       controller-uid=026b12f1-a439-4acb-9650-1756a37d4435
                job-name=kubectl-flame-2fde7132-33d5-4344-9305-9dd427128f7f
                kubectl-flame/id=2fde7132-33d5-4344-9305-9dd427128f7f
  Annotations:  sidecar.istio.io/inject: false
  Containers:
   kubectl-flame:
    Image:      verizondigital/kubectl-flame:v0.1.5-jvm
    Port:       <none>
    Host Port:  <none>
    Command:
      /app/agent
    Args:
      2fde7132-33d5-4344-9305-9dd427128f7f
      13efad1b-6569-47db-9e19-be1b22e6e288
      CONTAINER-NAME
      docker://c7e037b70817e3e8fa8a49ce90c1e2d427aa05fd867f4ff8989916366b1b5580
      1m0s
      java
    Environment:  <none>
    Mounts:
      /var/lib/docker from target-filesystem (rw)
  Volumes:
   target-filesystem:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker
    HostPathType:
Events:
  Type     Reason        Age   From            Message
  ----     ------        ----  ----            -------
  Warning  FailedCreate  12s   job-controller  Error creating: Timeout: request did not complete within requested timeout
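The events above show the underlying failure: the job controller could not create the pod (FailedCreate) because the API request timed out. To check for this repeatedly, the same events can be pulled as JSON (for example with kubectl get events -o json) and filtered in a script. The sketch below assumes the standard core/v1 Event fields (type, reason, message, involvedObject.kind, involvedObject.name); the sample data is illustrative and simply mirrors the event shown above.

```python
import json
from typing import Any


def failed_create_events(events_json: str, job_name: str) -> list[str]:
    """Return messages of Warning/FailedCreate events for the given Job.

    `events_json` is assumed to be `kubectl get events -o json` output,
    i.e. a List object whose `items` are core/v1 Event objects.
    """
    doc: dict[str, Any] = json.loads(events_json)
    messages = []
    for event in doc.get("items", []):
        involved = event.get("involvedObject", {})
        if (
            event.get("type") == "Warning"
            and event.get("reason") == "FailedCreate"
            and involved.get("kind") == "Job"
            and involved.get("name") == job_name
        ):
            messages.append(event.get("message", ""))
    return messages


# Illustrative sample shaped like the event in the describe output above.
JOB = "kubectl-flame-2fde7132-33d5-4344-9305-9dd427128f7f"
sample = json.dumps({
    "items": [
        {
            "type": "Warning",
            "reason": "FailedCreate",
            "message": "Error creating: Timeout: request did not complete within requested timeout",
            "involvedObject": {"kind": "Job", "name": JOB},
        }
    ]
})
print(failed_create_events(sample, JOB))
```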

kubectl-flame version: Version: v0.1.5, Commit: 5cb73b3

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T21:51:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

[0] Timing of a successful profiling run (its total time is even longer than the failed run above, but it ends in success):

time kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l java CONTAINER-NAME
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... ✔
FlameGraph saved to: /tmp/flamegraph.svg 🔥
kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l   0.29s user 0.17s system 0% cpu 5:45.96 total
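The two time summaries can be compared against the requested -t 1m profiling window. This small sketch assumes the zsh-style "M:SS.ss total" elapsed field seen above (it handles only that shape), converts the wall-clock time to seconds, and subtracts the one-minute profiling duration to show how much of a run fell outside profiling itself.

```python
import re


def elapsed_seconds(time_line: str) -> float:
    """Parse the 'M:SS.ss total' elapsed field from a zsh `time` summary line."""
    match = re.search(r"(\d+):(\d+(?:\.\d+)?) total", time_line)
    if match is None:
        raise ValueError(f"no elapsed time found in: {time_line!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


PROFILE_SECONDS = 60  # from the -t 1m flag

failed = "kubectl flame ... 0.29s user 0.15s system 0% cpu 5:02.61 total"
succeeded = "kubectl flame ... 0.29s user 0.17s system 0% cpu 5:45.96 total"

# The failed run never reached profiling, so all of its time went to launching.
print(f"failed run: {elapsed_seconds(failed):.2f} s")
# Time the successful run spent outside the one-minute profiling window.
print(f"successful run overhead: {elapsed_seconds(succeeded) - PROFILE_SECONDS:.2f} s")
```

For these two lines, roughly 302 s (failed) and 286 s (successful) of wall time fell outside the requested one-minute profiling window.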

Any idea? Thanks.

Oct 04 '20 03:10 zhiyanliu

Hi @zhiyanliu, sorry for the late response. I haven't seen anything like that before, but from googling your error it looks like something in the Kubernetes cluster is preventing the job from launching. Do you have enough CPU/memory available on the target node (the node where the application runs)? Thanks.
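One way to answer the CPU/memory question is to compare the node's allocatable resources with what is already requested on it; kubectl describe node lists both. The minimal sketch below assumes plain Kubernetes quantity strings such as 500m, 2, 1536Mi or 2Gi. It handles only those common suffixes, not exponent forms like 1e3. The node figures in the example are hypothetical, not taken from this cluster.

```python
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (a subset only).
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity like '500m' or '2Gi' to a plain Decimal (cores or bytes)."""
    # Try two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def headroom(allocatable: str, requested: str) -> Decimal:
    """Resources still free for new pods' requests: allocatable minus current requests."""
    return parse_quantity(allocatable) - parse_quantity(requested)


# Hypothetical node figures, as they might appear in `kubectl describe node`.
print("CPU cores left:", headroom("1930m", "1850m"))
print("memory left (MiB):", headroom("7Gi", "6656Mi") / 2**20)
```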

Dec 16 '20 16:12 edeNFed