
Cannot see OpenTelemetry traces

agronholm opened this issue 2 years ago • 16 comments

What happened:

I upgraded to ingress-nginx 1.7.0 (Helm chart 4.6.0) and, to the best of my knowledge, enabled the OpenTelemetry configuration. These are the custom values I supplied:

controller:
  opentelemetry:
    enabled: true

  config:
    otlp-collector-host: $HOST_IP
    enable-opentelemetry: "true"
    enable-brotli: "true"
  extraEnvs:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP

Firstly, without that controller.opentelemetry.enabled: true option, the controller would fail to load the otel dynamic library. I didn't initially see this documented anywhere, but a kind soul on the OTEL feature ticket told me to add it.
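(Editor's note: a sketch of why both settings appear to be needed — controller.opentelemetry.enabled adds an init container that copies the otel shared library into an emptyDir volume shared with the controller (visible as the opentelemetry init container and the modules volume in the pod description further down), while the ConfigMap key turns the feature on in nginx. The comments below are an editor's interpretation, not taken from the original report:)

```yaml
controller:
  opentelemetry:
    enabled: true                  # ships the otel .so module via an init container
  config:
    enable-opentelemetry: "true"   # tells nginx to actually load and use the module
```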

What you expected to happen:

I expected to start seeing traces from the nginx service on my traces feed. I have both Datadog and SignOz hooked up to the OTEL collector (which runs as a daemonset), but I haven't seen any nginx traces anywhere yet. No errors either.
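(Editor's note: when the collector runs as a daemonset addressed via the node IP, the collector must expose an OTLP gRPC receiver on a host port. A minimal receiver fragment for the OpenTelemetry Collector config is shown below as an illustration only — 4317 is the conventional OTLP/gRPC port, and the daemonset pod spec must additionally declare it as a hostPort for $HOST_IP:4317 to be reachable; none of this is taken from the reporter's setup:)

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # must also be exposed as a hostPort on the daemonset pod
```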

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

nginx-ingress-nginx-controller-56675749b6-pwfn5:/etc/nginx$ nginx-ingress-controller --version
bash: nginx-ingress-controller: command not found

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.7", GitCommit:"723bcdb232300aaf5e147ff19b4df7ec8a20278d", GitTreeState:"archive", BuildDate:"2023-03-02T00:00:00Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"34f89fd3fb1a106e1b23d3454b2f2cbf305602a1", GitTreeState:"clean", BuildDate:"2023-02-10T14:48:59Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: Azure AKS

  • OS (e.g. from /etc/os-release):

  • Kernel (e.g. uname -a):

  • Install tools: AKS managed

    • Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc.
  • Basic cluster related info:

    • kubectl version
    • kubectl get nodes -o wide
  • How was the ingress-nginx-controller installed:

    • If helm was used then please show output of helm ls -A | grep -i ingress
nginx             	nginx         	12      	2023-03-27 16:29:03.278334939 +0300 EEST	deployed	ingress-nginx-4.6.0           	1.7.0 
  • If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
USER-SUPPLIED VALUES:
controller:
  config:
    enable-brotli: "true"
    enable-opentelemetry: "true"
    otlp-collector-host: $HOST_IP
  extraEnvs:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  opentelemetry:
    enabled: true
  service:
    externalTrafficPolicy: Local
    loadBalancerIP: <redacted>
  • If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used

  • if you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances

  • Current State of the controller:

    • kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.7.0
              helm.sh/chart=ingress-nginx-4.6.0
Annotations:  meta.helm.sh/release-name: nginx
              meta.helm.sh/release-namespace: nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>
  • kubectl -n <ingresscontrollernamespace> get all -A -o wide

Why do you need all resources from all namespaces?

  • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
Name:             nginx-ingress-nginx-controller-56675749b6-pwfn5
Namespace:        nginx
Priority:         0
Service Account:  nginx-ingress-nginx
Node:             aks-agentpool-27862475-vmss000001/10.240.0.115
Start Time:       Mon, 27 Mar 2023 16:29:11 +0300
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=nginx
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.7.0
                  helm.sh/chart=ingress-nginx-4.6.0
                  pod-template-hash=56675749b6
Annotations:      <none>
Status:           Running
IP:               10.240.0.138
IPs:
  IP:           10.240.0.138
Controlled By:  ReplicaSet/nginx-ingress-nginx-controller-56675749b6
Init Containers:
  opentelemetry:
    Container ID:  containerd://f8d399eb9b7dc97fcec3394ea4c7aea41b19e1346d41cb928cf2769cac153e34
    Image:         registry.k8s.io/ingress-nginx/opentelemetry:v20230312-helm-chart-4.5.2-28-g66a760794@sha256:40f766ac4a9832f36f217bb0e98d44c8d38faeccbfe861fbc1a76af7e9ab257f
    Image ID:      registry.k8s.io/ingress-nginx/opentelemetry@sha256:40f766ac4a9832f36f217bb0e98d44c8d38faeccbfe861fbc1a76af7e9ab257f
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      /usr/local/bin/init_module.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 27 Mar 2023 16:29:13 +0300
      Finished:     Mon, 27 Mar 2023 16:29:13 +0300
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /modules_mount from modules (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pnmmk (ro)
Containers:
  controller:
    Container ID:  containerd://9223ed6ebf327ec74ffd86ab6940b7981b67bb235ef5168daad5c056a276a85b
    Image:         registry.k8s.io/ingress-nginx/controller:v1.7.0@sha256:7612338342a1e7b8090bef78f2a04fffcadd548ccaabe8a47bf7758ff549a5f7
    Image ID:      registry.k8s.io/ingress-nginx/controller@sha256:7612338342a1e7b8090bef78f2a04fffcadd548ccaabe8a47bf7758ff549a5f7
    Ports:         80/TCP, 443/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/nginx-ingress-nginx-controller
      --election-id=nginx-ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/nginx-ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Mon, 27 Mar 2023 16:29:22 +0300
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-nginx-controller-56675749b6-pwfn5 (v1:metadata.name)
      POD_NAMESPACE:  nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
      HOST_IP:         (v1:status.hostIP)
    Mounts:
      /modules_mount from modules (rw)
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pnmmk (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  modules:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nginx-ingress-nginx-admission
    Optional:    false
  kube-api-access-pnmmk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From                      Message
  ----    ------     ----  ----                      -------
  Normal  Scheduled  18m   default-scheduler         Successfully assigned nginx/nginx-ingress-nginx-controller-56675749b6-pwfn5 to aks-agentpool-27862475-vmss000001
  Normal  Pulling    18m   kubelet                   Pulling image "registry.k8s.io/ingress-nginx/opentelemetry:v20230312-helm-chart-4.5.2-28-g66a760794@sha256:40f766ac4a9832f36f217bb0e98d44c8d38faeccbfe861fbc1a76af7e9ab257f"
  Normal  Pulled     17m   kubelet                   Successfully pulled image "registry.k8s.io/ingress-nginx/opentelemetry:v20230312-helm-chart-4.5.2-28-g66a760794@sha256:40f766ac4a9832f36f217bb0e98d44c8d38faeccbfe861fbc1a76af7e9ab257f" in 1.340993991s
  Normal  Created    17m   kubelet                   Created container opentelemetry
  Normal  Started    17m   kubelet                   Started container opentelemetry
  Normal  Pulling    17m   kubelet                   Pulling image "registry.k8s.io/ingress-nginx/controller:v1.7.0@sha256:7612338342a1e7b8090bef78f2a04fffcadd548ccaabe8a47bf7758ff549a5f7"
  Normal  Pulled     17m   kubelet                   Successfully pulled image "registry.k8s.io/ingress-nginx/controller:v1.7.0@sha256:7612338342a1e7b8090bef78f2a04fffcadd548ccaabe8a47bf7758ff549a5f7" in 6.859694032s
  Normal  Created    17m   kubelet                   Created container controller
  Normal  Started    17m   kubelet                   Started container controller
  Normal  RELOAD     17m   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Name:                     nginx-ingress-nginx-controller
Namespace:                nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.7.0
                          helm.sh/chart=ingress-nginx-4.6.0
Annotations:              meta.helm.sh/release-name: nginx
                          meta.helm.sh/release-namespace: nginx
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.0.59.87
IPs:                      10.0.59.87
IP:                       20.50.181.64
LoadBalancer Ingress:     20.50.181.64
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  30879/TCP
Endpoints:                10.240.0.138:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32505/TCP
Endpoints:                10.240.0.138:443
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     30068
Events:                   <none>
  • Current state of ingress object, if applicable:

    • kubectl -n <appnamespace> get all,ing -o wide
    • kubectl -n <appnamespace> describe ing <ingressname>
    • If applicable, then your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag
  • Others:

    • Any other related information, like:
      • copy/paste of the snippet (if applicable)
      • kubectl describe ... of any custom configmap(s) created and in use
      • Any other related information that may help

How to reproduce this issue:

Anything else we need to know:

agronholm avatar Mar 27 '23 13:03 agronholm

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 27 '23 13:03 k8s-ci-robot

/remove-kind bug
/kind support
cc @esigo

longwuyuan avatar Mar 27 '23 13:03 longwuyuan

https://kubernetes.github.io/ingress-nginx/user-guide/third-party-addons/opentelemetry/

longwuyuan avatar Mar 27 '23 13:03 longwuyuan

Yes, I've read that.

agronholm avatar Mar 27 '23 13:03 agronholm

Yes, I've read that.

Thanks. Just waiting for comments from @esigo (in case that doc too needs updates)

longwuyuan avatar Mar 27 '23 13:03 longwuyuan

@longwuyuan please feel free to /assign OpenTelemetry related issues to me.

esigo avatar Mar 27 '23 19:03 esigo

/assign

esigo avatar Mar 27 '23 19:03 esigo

@agronholm could you please try with otel-sampler-ratio: "1.0"? By default the ratio is 0.01.
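(Editor's sketch of that suggestion in Helm values form, assuming the same controller.config ConfigMap keys used elsewhere in this thread:)

```yaml
controller:
  config:
    otel-sampler-ratio: "1.0"   # sample every request instead of the default 1%
```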

esigo avatar Mar 27 '23 20:03 esigo

Didn't help. Do I need to change the otel-sampler setting too? What does that even do?

agronholm avatar Mar 28 '23 10:03 agronholm

I had a similar issue; for me it looks like it's fixed by adding otel-sampler: AlwaysOn.

Is the default really AlwaysOff??

https://github.com/kubernetes/ingress-nginx/blob/c84ae78bdfabb1f93bf0e87cdff34c75c744b8df/internal/ingress/controller/config/config.go#L598-L600

According to the OpenTelemetry spec, AlwaysOff means "Returns DROP always.", which sounds like the default is to trace nothing.

danopia avatar Apr 01 '23 11:04 danopia

So not only do we have to explicitly enable opentelemetry, but we also need to explicitly turn on the sampler? Why?

agronholm avatar Apr 01 '23 18:04 agronholm

It seems the documentation around this is still lacking, which should be addressed. However, I was able to get this working when I was not able to before (see issue #9508 for details on my experience).

What I did to get this working was update my Helm override values YAML as follows:

## nginx configuration
## Ref: https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/index.md
##

controller:
  extraModules: []
  opentelemetry:
    enabled: true

  config:
    otlp-collector-host: obs-otel-collector.obs   # <--- set this to my otel collector host in my cluster
    enable-opentelemetry: "true"
    otel-sampler: AlwaysOn  # <--- set this to AlwaysOn
    otel-sampler-ratio: "1.0"  # <--- Set this to 1.0 as suggested

  admissionWebhooks:
    enabled: false

js8080 avatar Apr 19 '23 14:04 js8080

I stumbled upon this issue because I am also trying to send OpenTelemetry signals to a host port. It looks like using an environment variable to specify the collector IP address is not supported?

Here is a relevant log from my deployment:

[Error] File: /tmp/build/opentelemetry-cpp-v1.8.1/exporters/otlp/src/otlp_grpc_exporter.cc:67[OTLP TRACE GRPC Exporter] Export() failed with status_code: "UNAVAILABLE" error_message: "DNS resolution failed for $OPENTELEMETRY_COLLECTOR_IP:4317: C-ares status is not ARES_SUCCESS qtype=A name=$OPENTELEMETRY_COLLECTOR_IP is_balancer=0: Domain name not found"

Here $OPENTELEMETRY_COLLECTOR_IP is mapped to status.hostIP similar to the issue description.

@agronholm can you confirm if you also receive these error logs from the controller?

@esigo Is this a pattern you want to support? I would imagine it is a common one, since the OpenTelemetry project recommends running one collector per node, so it feels natural to send signals to the host IP.

andrew-gropyus avatar May 10 '23 12:05 andrew-gropyus

@andrew-gropyus you could use the following env var as an example: OTEL_EXPORTER_OTLP_ENDPOINT="http://jaeger-all:4317". This should already work.

esigo avatar May 21 '23 15:05 esigo

Hey @esigo, this works. Here I use Kubernetes to do the interpolation, in the Helm values.yaml:

controller:
  extraEnvs:
    - name = "OPENTELEMETRY_COLLECTOR_IP"
       valueFrom:
         fieldRef:
           fieldPath: "status.hostIP"
    - name: "OTEL_EXPORTER_OTLP_ENDPOINT"
      value: "http://$(OPENTELEMETRY_COLLECTOR_IP):4317"

andrew-gropyus avatar May 22 '23 07:05 andrew-gropyus

Hey @esigo, this works. Here I use Kubernetes to do the interpolation, in the Helm values.yaml:

controller:
  extraEnvs:
    - name = "OPENTELEMETRY_COLLECTOR_IP"
       valueFrom:
         fieldRef:
           fieldPath: "status.hostIP"
    - name: "OTEL_EXPORTER_OTLP_ENDPOINT"
      value: "http://$(OPENTELEMETRY_COLLECTOR_IP):4317"

thanks a lot, but maybe you mean

controller:
  extraEnvs:
    - name: "OPENTELEMETRY_COLLECTOR_IP"
      valueFrom:
        fieldRef:
          fieldPath: "status.hostIP"
    - name: "OTEL_EXPORTER_OTLP_ENDPOINT"
      value: "http://$(OPENTELEMETRY_COLLECTOR_IP):4317"

EvertonSA avatar Feb 15 '24 21:02 EvertonSA