When trying Go auto-instrumentation I get "process not found yet"

Open msherif1234 opened this issue 1 year ago • 6 comments

Describe the bug

Not sure how to make my Go app visible to the instrumentation pod.

Environment

Running on an OCP (OpenShift) cluster.

To Reproduce

Steps to reproduce the behavior:

  1. Install cert-manager: kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml
  2. Deploy the OpenTelemetry operator: kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
  3. Create the OpenTelemetry collector object:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: demo
  namespace: default
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s

    exporters:
      # NOTE: Prior to v0.86.0 use logging instead of debug.
      debug:

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [debug]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [debug]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [debug]
  mode: daemonset
  4. Create the Instrumentation object:
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: demo-instrumentation
spec:
  exporter:
    endpoint: http://demo-collector:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"
EOF
  5. Use the PR https://github.com/netobserv/network-observability-operator/pull/500 to patch the netobserv operator and enable auto-instrumentation. For now, OTEL_EXPORTER_OTLP_ENDPOINT has to be set manually to match the OpenTelemetry collector service IP. Then build with make image-build, push with make image-push, and deploy the operator with USER=username VERSION="main-amd64" make deploy

  6. Create the netobserv flow collector: oc create -f config/samples/flows_v1beta2_flowcollector.yaml

  7. The netobserv agent pods should now be running with two containers, the new one being the instrumentation sidecar:

oc get pods -n netobserv-privileged
NAME                         READY   STATUS    RESTARTS   AGE
netobserv-ebpf-agent-2msml   2/2     Running   0          24m
netobserv-ebpf-agent-7grl5   2/2     Running   0          24m
netobserv-ebpf-agent-8pgwj   2/2     Running   0          24m
netobserv-ebpf-agent-n9s6q   2/2     Running   0          24m
netobserv-ebpf-agent-trq4b   2/2     Running   0          24m
netobserv-ebpf-agent-whqxs   2/2     Running   0          24m

Expected behavior

I expected the instrumentation container to find the app binary and start emitting some form of metrics, but I am getting:

{"level":"info","ts":1700489757.6685278,"logger":"Instrumentation.Analyzer","caller":"process/discover.go:73","msg":"process not found yet, trying again soon","exe_path":"/netobserv-ebpf-agent"}

Additional context

I followed the instructions here: https://opentelemetry.io/docs/kubernetes/operator/automatic/

msherif1234 avatar Nov 20 '23 14:11 msherif1234

Can you double-check if the Go instrumentation and application containers share the process namespace?

Reference:

  • https://github.com/open-telemetry/opentelemetry-go-instrumentation#instrument-an-application-in-kubernetes
  • https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/

pellared avatar Nov 20 '23 15:11 pellared

Thanks @pellared, that was it. Can you please share a way to see the instrumentation data? I tried the following, where 172.30.140.13 is the clusterIP of the collector service:
[root@ci-ln-8hfrsd2-72292-c84sf-worker-a-g7qpd /]# grpcurl -plaintext 172.30.140.13:4317 list
Failed to list services: server does not support the reflection API

this is what I set

Name:  "OTEL_EXPORTER_OTLP_ENDPOINT",
Value: "172.30.140.13:4317",

this is what I see in the container logs

{"level":"info","ts":1700514825.3263397,"logger":"Instrumentation.Controller","caller":"opentelemetry/controller.go:54","msg":"got event","attrs":[{"Key":"net.peer.port","Value":{"Type":"STRING","Value":"2055"}},{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":"/pbflow.Collector/Send"}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":"10.0.128.4"}}]}
2023/11/20 21:13:45 traces export: Post "https://localhost:4318/v1/traces": dial tcp [::1]:4318: connect: connection refused

where 10.0.128.4 is the podIP

msherif1234 avatar Nov 20 '23 20:11 msherif1234
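
One thing that may be worth checking in the comment above: the export log shows the default https://localhost:4318 rather than the configured address, so the OTEL_EXPORTER_OTLP_ENDPOINT value apparently did not take effect. The value set there has no scheme, while the Instrumentation object earlier uses http://demo-collector:4317. The sketch below only demonstrates how Go's net/url treats the two forms. Whether the auto-instrumentation exporter parses the variable this way is an assumption, not something the thread confirms, and checkEndpoint is a hypothetical helper. Port 4317 is also the conventional OTLP/gRPC port, whereas the log shows an HTTP POST, which by default targets 4318.

```go
package main

import (
	"fmt"
	"net/url"
)

// checkEndpoint returns an error unless raw parses as an http or https URL
// that includes a host.
func checkEndpoint(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("cannot parse %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("%q has scheme %q, want http or https", raw, u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("%q has no host", raw)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"172.30.140.13:4317",        // value from the comment above
		"http://172.30.140.13:4317", // same address with an explicit scheme
	} {
		if err := checkEndpoint(raw); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", raw)
	}
}
```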

Hi everyone! I am having the same issue when instrumenting Go using the operator:

{"level":"info","ts":1705690630.9377563,"logger":"Instrumentation.Analyzer","caller":"process/discover.go:73","msg":"process not found yet, trying again soon","exe_path":"/app"}

I am using the following autoinstrumentation library:

ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.10.1-alpha

I can confirm the pods have the config:

shareProcessNamespace: true

I can also confirm that the container gets injected with the following attribute:

securityContext:
  privileged: true
  runAsUser: 0

Am I missing something? Thanks in advance!

lel-war avatar Jan 22 '24 14:01 lel-war

@lel-war Are you using OTEL_GO_AUTO_TARGET_EXE or instrumentation.opentelemetry.io/otel-go-auto-target-exe? Is the full path of your Go executable /app (as passed to the instrumentation, according to the log you attached)?

RonFed avatar Jan 22 '24 17:01 RonFed

Hi @RonFed thanks for the quick response. To answer your question I am using the following annotation:

instrumentation.opentelemetry.io/otel-go-auto-target-exe: /app

The value "/app" is just a stand-in for the real application path; in reality it looks more like /home/user/app. So to answer your question, yes!

lel-war avatar Jan 22 '24 17:01 lel-war

Hello there!

Out of curiosity, did you find any solution?

I'm having the same issue.

I created a debug/ephemeral container in order to verify the path of the executable and it seems to be the correct one.

  • ShareProcessNamespace is enabled
  • Security Context capabilities adds SYS_PTRACE
  • Security Context privileged is true

Am I missing something? Do you have any idea?

Morsicus avatar Jul 02 '24 21:07 Morsicus