ingress-nginx
Cannot see OpenTelemetry traces
What happened:
I upgraded to ingress-nginx 1.7.0 and Helm chart 4.6.0 and, to the best of my knowledge, enabled the OpenTelemetry configuration. These are the custom values I entered:
controller:
  opentelemetry:
    enabled: true
  config:
    otlp-collector-host: $HOST_IP
    enable-opentelemetry: "true"
    enable-brotli: "true"
  extraEnvs:
    - name: HOST_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
Firstly, without that controller.opentelemetry.enabled: true option it would fail to load the otel dynamic library. I didn't see this documented anywhere at first; a kind soul on the OTEL feature ticket told me to add it.
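As a sanity check (using the pod and namespace names shown in the describe output further down), the rendered nginx config can be grepped for the otel module and directives; if the module really loaded, something like this should return matches:
kubectl -n nginx exec nginx-ingress-nginx-controller-56675749b6-pwfn5 -- grep -iE 'otel|opentelemetry' /etc/nginx/nginx.conf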
What you expected to happen:
I expected to start seeing traces from the nginx service on my traces feed. I have both Datadog and SignOz hooked up to the OTEL collector (which runs as a daemonset), but I haven't seen any nginx traces anywhere yet. No errors either.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
nginx-ingress-nginx-controller-56675749b6-pwfn5:/etc/nginx$ nginx-ingress-controller --version
bash: nginx-ingress-controller: command not found
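(The binary just isn't on the PATH inside the container; judging by the pod Args further below it lives at /nginx-ingress-controller, so invoking it by its absolute path should print the version:)
kubectl -n nginx exec nginx-ingress-nginx-controller-56675749b6-pwfn5 -- /nginx-ingress-controller --version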
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.7", GitCommit:"723bcdb232300aaf5e147ff19b4df7ec8a20278d", GitTreeState:"archive", BuildDate:"2023-03-02T00:00:00Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"34f89fd3fb1a106e1b23d3454b2f2cbf305602a1", GitTreeState:"clean", BuildDate:"2023-02-10T14:48:59Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: Azure AKS
- OS (e.g. from /etc/os-release): -
- Kernel (e.g. uname -a): -
- Install tools: AKS managed
  - Please mention how/where the cluster was created (kubeadm/kops/minikube/kind etc.)
- Basic cluster related info:
  - kubectl version (see above)
  - kubectl get nodes -o wide: -
How was the ingress-nginx-controller installed:
- If helm was used then please show output of helm ls -A | grep -i ingress:
  nginx   nginx   12   2023-03-27 16:29:03.278334939 +0300 EEST   deployed   ingress-nginx-4.6.0   1.7.0
- If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>:
USER-SUPPLIED VALUES:
controller:
  config:
    enable-brotli: "true"
    enable-opentelemetry: "true"
    otlp-collector-host: $HOST_IP
  extraEnvs:
    - name: HOST_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
  opentelemetry:
    enabled: true
  service:
    externalTrafficPolicy: Local
    loadBalancerIP: <redacted>
- If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used: -
- If you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances: -
Current State of the controller:
kubectl describe ingressclasses
Name: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.7.0
helm.sh/chart=ingress-nginx-4.6.0
Annotations: meta.helm.sh/release-name: nginx
meta.helm.sh/release-namespace: nginx
Controller: k8s.io/ingress-nginx
Events: <none>
kubectl -n <ingresscontrollernamespace> get all -A -o wide
Why do you need all resources from all namespaces?
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
Name: nginx-ingress-nginx-controller-56675749b6-pwfn5
Namespace: nginx
Priority: 0
Service Account: nginx-ingress-nginx
Node: aks-agentpool-27862475-vmss000001/10.240.0.115
Start Time: Mon, 27 Mar 2023 16:29:11 +0300
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.7.0
helm.sh/chart=ingress-nginx-4.6.0
pod-template-hash=56675749b6
Annotations: <none>
Status: Running
IP: 10.240.0.138
IPs:
IP: 10.240.0.138
Controlled By: ReplicaSet/nginx-ingress-nginx-controller-56675749b6
Init Containers:
opentelemetry:
Container ID: containerd://f8d399eb9b7dc97fcec3394ea4c7aea41b19e1346d41cb928cf2769cac153e34
Image: registry.k8s.io/ingress-nginx/opentelemetry:v20230312-helm-chart-4.5.2-28-g66a760794@sha256:40f766ac4a9832f36f217bb0e98d44c8d38faeccbfe861fbc1a76af7e9ab257f
Image ID: registry.k8s.io/ingress-nginx/opentelemetry@sha256:40f766ac4a9832f36f217bb0e98d44c8d38faeccbfe861fbc1a76af7e9ab257f
Port: <none>
Host Port: <none>
Command:
sh
-c
/usr/local/bin/init_module.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 27 Mar 2023 16:29:13 +0300
Finished: Mon, 27 Mar 2023 16:29:13 +0300
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/modules_mount from modules (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pnmmk (ro)
Containers:
controller:
Container ID: containerd://9223ed6ebf327ec74ffd86ab6940b7981b67bb235ef5168daad5c056a276a85b
Image: registry.k8s.io/ingress-nginx/controller:v1.7.0@sha256:7612338342a1e7b8090bef78f2a04fffcadd548ccaabe8a47bf7758ff549a5f7
Image ID: registry.k8s.io/ingress-nginx/controller@sha256:7612338342a1e7b8090bef78f2a04fffcadd548ccaabe8a47bf7758ff549a5f7
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/nginx-ingress-nginx-controller
--election-id=nginx-ingress-nginx-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/nginx-ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
State: Running
Started: Mon, 27 Mar 2023 16:29:22 +0300
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-nginx-controller-56675749b6-pwfn5 (v1:metadata.name)
POD_NAMESPACE: nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
HOST_IP: (v1:status.hostIP)
Mounts:
/modules_mount from modules (rw)
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pnmmk (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
modules:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-nginx-admission
Optional: false
kube-api-access-pnmmk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned nginx/nginx-ingress-nginx-controller-56675749b6-pwfn5 to aks-agentpool-27862475-vmss000001
Normal Pulling 18m kubelet Pulling image "registry.k8s.io/ingress-nginx/opentelemetry:v20230312-helm-chart-4.5.2-28-g66a760794@sha256:40f766ac4a9832f36f217bb0e98d44c8d38faeccbfe861fbc1a76af7e9ab257f"
Normal Pulled 17m kubelet Successfully pulled image "registry.k8s.io/ingress-nginx/opentelemetry:v20230312-helm-chart-4.5.2-28-g66a760794@sha256:40f766ac4a9832f36f217bb0e98d44c8d38faeccbfe861fbc1a76af7e9ab257f" in 1.340993991s
Normal Created 17m kubelet Created container opentelemetry
Normal Started 17m kubelet Started container opentelemetry
Normal Pulling 17m kubelet Pulling image "registry.k8s.io/ingress-nginx/controller:v1.7.0@sha256:7612338342a1e7b8090bef78f2a04fffcadd548ccaabe8a47bf7758ff549a5f7"
Normal Pulled 17m kubelet Successfully pulled image "registry.k8s.io/ingress-nginx/controller:v1.7.0@sha256:7612338342a1e7b8090bef78f2a04fffcadd548ccaabe8a47bf7758ff549a5f7" in 6.859694032s
Normal Created 17m kubelet Created container controller
Normal Started 17m kubelet Started container controller
Normal RELOAD 17m nginx-ingress-controller NGINX reload triggered due to a change in configuration
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Name: nginx-ingress-nginx-controller
Namespace: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.7.0
helm.sh/chart=ingress-nginx-4.6.0
Annotations: meta.helm.sh/release-name: nginx
meta.helm.sh/release-namespace: nginx
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx,app.kubernetes.io/name=ingress-nginx
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.0.59.87
IPs: 10.0.59.87
IP: 20.50.181.64
LoadBalancer Ingress: 20.50.181.64
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 30879/TCP
Endpoints: 10.240.0.138:80
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 32505/TCP
Endpoints: 10.240.0.138:443
Session Affinity: None
External Traffic Policy: Local
HealthCheck NodePort: 30068
Events: <none>
- Current state of ingress object, if applicable:
  - kubectl -n <appnamespace> get all,ing -o wide
  - kubectl -n <appnamespace> describe ing <ingressname>
  - If applicable, then, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag
- Others:
  - Any other related information like;
    - copy/paste of the snippet (if applicable)
    - kubectl describe ... of any custom configmap(s) created and in use
    - Any other related information that may help
How to reproduce this issue:
Anything else we need to know:
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-kind bug
/kind support
cc @esigo
https://kubernetes.github.io/ingress-nginx/user-guide/third-party-addons/opentelemetry/
Yes, I've read that.
Thanks. Just waiting for comments from @esigo (in case that doc too needs updates)
@longwuyuan please feel free to /assign OpenTelemetry related issues to me.
/assign
@agronholm could you please try with otel-sampler-ratio: "1.0"? The default ratio is 0.01.
Didn't help. Do I need to change the otel-sampler setting too? What does that even do?
I had a similar issue; for me it looks like it was fixed by adding otel-sampler: AlwaysOn.
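For reference, in the Helm values these keys go under controller.config (illustrative snippet only; the same keys can also be set directly in the controller ConfigMap):
controller:
  config:
    enable-opentelemetry: "true"
    otel-sampler: AlwaysOn
    otel-sampler-ratio: "1.0"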
Is the default really AlwaysOff??
https://github.com/kubernetes/ingress-nginx/blob/c84ae78bdfabb1f93bf0e87cdff34c75c744b8df/internal/ingress/controller/config/config.go#L598-L600
According to the OpenTelemetry spec, AlwaysOff means "Returns DROP always.", which sounds like the default is to trace nothing.
So not only do we have to explicitly enable opentelemetry, but we also need to explicitly turn on the sampler? Why?
It seems there is still some documentation lacking around this, which should be addressed. However, I was able to get this working when I was not able to before (see issue #9508 for details on my experience).
What I did to get this working was update my Helm override values yaml as follows:
## nginx configuration
## Ref: https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/index.md
##
controller:
  extraModules: []
  opentelemetry:
    enabled: true
  config:
    otlp-collector-host: obs-otel-collector.obs # <--- set this to my otel collector host in my cluster
    enable-opentelemetry: "true"
    otel-sampler: AlwaysOn # <--- set this to AlwaysOn
    otel-sampler-ratio: "1.0" # <--- set this to 1.0 as suggested
  admissionWebhooks:
    enabled: false
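A quick way to confirm the settings actually landed is to grep the controller ConfigMap for the otel keys (the ConfigMap name and namespace below are taken from the original report's controller args; adjust them to your own release):
kubectl -n nginx get configmap nginx-ingress-nginx-controller -o yaml | grep -iE 'otel|otlp|opentelemetry'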
I stumbled upon this issue because I am also trying to send OpenTelemetry signals to a host port. It looks like using an environment variable to specify the collector IP address is not supported?
Here is a relevant log from my deployment:
[Error] File: /tmp/build/opentelemetry-cpp-v1.8.1/exporters/otlp/src/otlp_grpc_exporter.cc:67[OTLP TRACE GRPC Exporter] Export() failed with status_code: "UNAVAILABLE" error_message: "DNS resolution failed for $OPENTELEMETRY_COLLECTOR_IP:4317: C-ares status is not ARES_SUCCESS qtype=A name=$OPENTELEMETRY_COLLECTOR_IP is_balancer=0: Domain name not found"
Here $OPENTELEMETRY_COLLECTOR_IP is mapped to status.hostIP similar to the issue description.
@agronholm can you confirm if you also receive these error logs from the controller?
@esigo Is this a pattern you want to support? I would imagine it is a common one, since the OpenTelemetry project recommends running one collector per node, so it feels natural to send signals to the host IP.
@andrew-gropyus you could use the following env as an example: OTEL_EXPORTER_OTLP_ENDPOINT="http://jaeger-all:4317".
This should already work
Hey @esigo, this works. Here I use Kubernetes to do the interpolation, in the Helm values.yaml:
controller:
  extraEnvs:
    - name = "OPENTELEMETRY_COLLECTOR_IP"
      valueFrom:
        fieldRef:
          fieldPath: "status.hostIP"
    - name: "OTEL_EXPORTER_OTLP_ENDPOINT"
      value: "http://$(OPENTELEMETRY_COLLECTOR_IP):4317"
Thanks a lot, but maybe you mean:
controller:
  extraEnvs:
    - name: "OPENTELEMETRY_COLLECTOR_IP"
      valueFrom:
        fieldRef:
          fieldPath: "status.hostIP"
    - name: "OTEL_EXPORTER_OTLP_ENDPOINT"
      value: "http://$(OPENTELEMETRY_COLLECTOR_IP):4317"