opentelemetry-cpp-contrib
Nginx auth_request does not share trace id of parent request
Describe your environment
I am running in Docker and copying in the OpenTelemetry module from https://github.com/nginxinc/nginx-unsupported-modules, specifically this image, sha256:43f2fbf6bd3ca9186e4712e9ab79d36abfa2a5d32624478cf02f46f657208272. This module is built directly from main and was created at 2022-04-12T02:50:18.92379913Z.
I am in the process of converting from OpenTracing to OpenTelemetry for Nginx. Most of this went fine, but it appears that subrequests from auth_request have their own trace_id. I would expect the trace_id to be the same for the parent and subrequest, with only the span ids differing. Below are the files I am using to reproduce the issue.
Steps to reproduce
Dockerfile:
FROM nginx:1.21.6
RUN rm /etc/nginx/conf.d/default.conf
# Built from the package at https://github.com/nginxinc/nginx-unsupported-modules
COPY --from=ghcr.io/nginxinc/amd64/ngx_otel_module:linux-libc-nginx-1.21.6 /usr/lib/nginx/modules/otel_ngx_module.so /usr/lib/nginx/modules/otel_ngx_module.so
ADD localdev.conf /etc/nginx
ADD otel-config.toml /etc/nginx
CMD ["nginx", "-c", "/etc/nginx/localdev.conf", "-g", "daemon off;"]
localdev.conf:
# Available environment variables from https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md
env OTEL_EXPORTER_OTLP_ENDPOINT;
env OTEL_RESOURCE_ATTRIBUTES;
load_module /usr/lib/nginx/modules/otel_ngx_module.so;
worker_rlimit_nofile 20000; # max usable file descriptors, should be about double the worker_connections and less than the OS `nofile` ulimit
events {
    worker_connections 10000;
}
http {
    include /etc/nginx/mime.types;

    # Configuration options at https://github.com/open-telemetry/opentelemetry-cpp-contrib/tree/main/instrumentation/nginx
    # Other examples at https://github.com/nginxinc/bank-of-sirius/tree/master/src/nginx
    opentelemetry_config /etc/nginx/otel-config.toml;
    opentelemetry on;
    opentelemetry_operation_name "$request_method $uri";
    opentelemetry_trust_incoming_spans off; # Nothing upstream should give us a trace/span
    opentelemetry_attribute http.user_agent $http_user_agent;
    opentelemetry_attribute http.referer $http_referer;
    opentelemetry_attribute net.peer.ip $remote_addr;
    opentelemetry_propagate w3c;

    # Logs all auth_request directives
    log_subrequest on;

    # Prior art for ms precision from https://thatsamguy.com/nginx-iso8601-time-format/
    map $time_iso8601 $time_iso8601_timestamp {
        ~([^+]+) $1;
    }
    map $time_iso8601 $time_iso8601_timezone {
        ~\+([0-9:]+)$ $1;
    }
    map $msec $millisec {
        ~\.([0-9]+)$ $1;
    }

    log_format json_combined escape=json '{ "time_local": "$time_local", '
        '"timestamp": "$time_iso8601_timestamp.$millisec+$time_iso8601_timezone", '
        '"remote_addr": "$remote_addr", '
        '"uri": "$uri", '
        '"request": "$request", '
        '"status": "$status", '
        '"message": "$remote_addr - $request - $status - $request_time - $body_bytes_sent - $http_user_agent", '
        '"forwarded_for": "$http_x_forwarded_for", '
        '"body_bytes_sent": "$body_bytes_sent", '
        '"request_time": "$request_time", '
        '"http_referrer": "$http_referer", '
        '"http_user_agent": "$http_user_agent", '
        '"trace_id": "$opentelemetry_trace_id", '
        '"span_id": "$opentelemetry_span_id" }';

    error_log stderr info;
    access_log /dev/stdout json_combined;

    add_header Server-Timing "traceparent;desc=\"$opentelemetry_context_traceparent\"";

    server {
        set $backend_endpoint http://host.docker.internal:8000;
        set $auth_endpoint http://host.docker.internal:8005;
        set $domain_name *;

        listen 80;

        location /public-gateway-internal-routes/auth/ {
            internal;
            proxy_set_header traceparent $opentelemetry_context_traceparent;
            proxy_redirect off;
            proxy_pass $auth_endpoint/auth/;
        }

        location / {
            auth_request /public-gateway-internal-routes/auth/;
            proxy_set_header traceparent $opentelemetry_context_traceparent;
            proxy_redirect off;
            proxy_pass $backend_endpoint;
        }
    }
}
otel-config.toml:
exporter = "otlp"
processor = "batch"
[processors.batch]
max_queue_size = 2048
schedule_delay_millis = 5000
max_export_batch_size = 512
[service]
name = "gateway" # Opentelemetry resource name
[exporters.otlp]
# Alternatively the OTEL_EXPORTER_OTLP_ENDPOINT environment variable can also be used.
host = "localhost"
port = 4317
Copy these files into a folder and then:
docker build . -t nginxbug && docker run -p 8080:80 nginxbug
Once the server is up, you can curl localhost:8080.
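The add_header Server-Timing directive in localdev.conf also exposes the traceparent to clients, so the trace id is visible in the response headers. A minimal shell sketch of pulling the trace id out of such a header value (the ids below are illustrative samples, not live output):

```shell
# Example Server-Timing value in the shape emitted by the add_header directive above
header='traceparent;desc="00-7a0b6b7d06ea20df87f2bf13b184d949-d74eaee479f75458-01"'

# The desc value is a W3C traceparent: version-traceid-spanid-flags
traceparent=$(echo "$header" | sed -E 's/.*desc="([^"]*)".*/\1/')
trace_id=$(echo "$traceparent" | cut -d- -f2)
echo "$trace_id"   # prints the 32-hex-char trace id
```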
These are examples of the two access log entries that are emitted:
{ "time_local": "12/Apr/2022:06:08:26 +0000", "timestamp": "2022-04-12T06:08:26.336+00:00", "remote_addr": "172.17.0.1", "uri": "/public-gateway-internal-routes/auth/", "request": "GET / HTTP/1.1", "status": "502", "message": "172.17.0.1 - GET / HTTP/1.1 - 502 - 0.000 - 0 - curl/7.79.1", "forwarded_for": "", "body_bytes_sent": "0", "request_time": "0.000", "http_referrer": "", "http_user_agent": "curl/7.79.1", "trace_id": "7a0b6b7d06ea20df87f2bf13b184d949", "span_id": "d74eaee479f75458" }
{ "time_local": "12/Apr/2022:06:08:26 +0000", "timestamp": "2022-04-12T06:08:26.336+00:00", "remote_addr": "172.17.0.1", "uri": "/", "request": "GET / HTTP/1.1", "status": "500", "message": "172.17.0.1 - GET / HTTP/1.1 - 500 - 0.000 - 177 - curl/7.79.1", "forwarded_for": "", "body_bytes_sent": "177", "request_time": "0.000", "http_referrer": "", "http_user_agent": "curl/7.79.1", "trace_id": "25a831094334f5e595ce0da24dc1c41f", "span_id": "9db43885b76e7722" }
/public-gateway-internal-routes/auth/ trace id - 7a0b6b7d06ea20df87f2bf13b184d949
/ trace id - 25a831094334f5e595ce0da24dc1c41f
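The mismatch can be checked mechanically; a small shell sketch comparing the trace_id fields of the two log lines above (trimmed to the fields of interest):

```shell
# The two access-log lines above, trimmed down to uri and trace_id
sub='{ "uri": "/public-gateway-internal-routes/auth/", "trace_id": "7a0b6b7d06ea20df87f2bf13b184d949" }'
main='{ "uri": "/", "trace_id": "25a831094334f5e595ce0da24dc1c41f" }'

# A trace id is 32 hex characters; pull it out of each line
extract_trace() { echo "$1" | grep -oE '[0-9a-f]{32}'; }

if [ "$(extract_trace "$sub")" = "$(extract_trace "$main")" ]; then
  echo "same trace"
else
  echo "different traces"   # what actually happens here
fi
```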
What is the expected behavior?
I expected both of these to share the same trace id but with different span ids. This seems like the intended behavior from #94.
What is the actual behavior?
Trace ids are different between the parent and sub request.
Additional context
N/A
@phillipuniverse I came across the same problem, and it seems to be that the auth_request is evaluated before the opentelemetry_propagate.
You can try to run the following example:
- configure the otel to be parent_based
- configure a location on nginx that sets the opentelemetry_propagate, then proxy passes the request to your / location.
You should see on your logs 3 requests with the same trace id.
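One way to check this is to count the distinct trace_id values across the access-log lines. A minimal shell sketch over sample log lines (the ids here are illustrative, not captured output):

```shell
# Three sample JSON access-log lines sharing one (illustrative) trace id,
# as would be expected when propagation works across the subrequests.
printf '%s\n' \
  '{ "uri": "/auth/", "trace_id": "7a0b6b7d06ea20df87f2bf13b184d949" }' \
  '{ "uri": "/", "trace_id": "7a0b6b7d06ea20df87f2bf13b184d949" }' \
  '{ "uri": "/upstream", "trace_id": "7a0b6b7d06ea20df87f2bf13b184d949" }' \
  | grep -oE '[0-9a-f]{32}' | sort -u | wc -l   # 1 distinct trace id when propagation works
```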
@jmadureira this is probably what's going on, I will let you know when I have a chance to test this.
What you're saying is to move the opentelemetry_propagate inside the locations, like:
location /public-gateway-internal-routes/auth/ {
    internal;
    opentelemetry_propagate w3c;
    proxy_set_header traceparent $opentelemetry_context_traceparent;
    proxy_redirect off;
    proxy_pass $auth_endpoint/auth/;
}
location / {
    opentelemetry_propagate w3c;
    auth_request /public-gateway-internal-routes/auth/;
    proxy_set_header traceparent $opentelemetry_context_traceparent;
    proxy_redirect off;
    proxy_pass $backend_endpoint;
}
@jmadureira I still can't quite get this to work, even after moving the opentelemetry_propagate inside a location before the auth_request.
What exactly did you mean by this?
configure the otel to be parent_based
It seems like no matter what, there are different trace_ids output to the logs for the subrequest (from auth_request) and the main request.
@DebajitDas thanks for the clarification in #174 about what that is designed to fix. Is the idea that eventually the nginx plugin goes away in favor of the otel-webserver-module and thus the issue I outlined here becomes moot? Or is the long-term strategy to continue to invest in the main nginx instrumentation?
I'm trying to figure out if I should roll up my sleeves and try to figure out what's going on with these sub requests in the main nginx library or if there is just something obvious I'm missing in my configuration.
In APIcast we are running into the same issue. Internal requests are not added as new spans for the same trace. OTel instrumentation was added in PR #1379, leveraging the nginx instrumentation instead of the otel-webserver-module.
Any progress on this issue?
@eguzki would this help? https://github.com/kubernetes/ingress-nginx/issues/9811#issuecomment-1555976520
It helps to understand and validate our current config, but unfortunately it does not work as expected. The gateway calls the "auth server" in a post_action:
post_action @out_of_band_authrep_action;
....
location @out_of_band_authrep_action {
....
And calling the upstream app is done using openresty's ngx.exec.
It is interesting to note that the "auth" call replaces the parent span, and the upstream span is never created.
Still looking at it.