aws-app-mesh-roadmap
Bug: Missing x-amzn-trace-id in response headers
Summary
Envoy doesn't add the x-amzn-trace-id header to the response if it's missing. As a result, whether the response carries x-amzn-trace-id depends entirely on whether the application container uses the X-Ray SDK or is otherwise configured to propagate the x-amzn-trace-id header from the request to the response.
Steps to Reproduce
- Deploy an ECS Service (let's call it svc-a) with vanilla Nginx (use nginx as image in containerDefinition). Add the Envoy and X-Ray sidecars, and enable the Envoy <=> X-Ray integration with ENABLE_ENVOY_XRAY_TRACING=1.
- Deploy another, identical ECS Service (let's call it svc-b), and point to svc-a in its Virtual Node Backends configuration, to let svc-b's Envoy know there's an integration between the two.
- Use ECS Exec to open a shell in the svc-b Nginx container.
- From inside the svc-b Nginx container, make an HTTP request to svc-a, like this: curl -v http://svc-a.dev.local/foo1234
- The response does not contain the x-amzn-trace-id HTTP header:
  $ curl -v http://svc-a.dev.local/foo1234
  * Trying 10.130.119.69:80...
  * Connected to svc-a.dev.local (10.130.119.69) port 80 (#0)
  > GET /foo1234 HTTP/1.1
  > Host: svc-a.dev.local
  > User-Agent: curl/7.76.1
  > Accept: */*
  >
  * Mark bundle as not supporting multiuse
  < HTTP/1.1 200 OK
  < date: Wed, 02 Mar 2022 10:43:41 GMT
  < server: envoy
  < x-envoy-upstream-service-time: 4
  < transfer-encoding: chunked
- Observe that the x-amzn-trace-id header is missing.
- Navigate to the X-Ray Console in the AWS Console, and check the traces for svc-a.
- Observe that the trace for the /foo1234 request has been created.
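The header check in the steps above can also be scripted. The following is a minimal sketch, assuming the `curl -v` output has been captured as text; the helper names `response_headers` and `has_trace_header` are hypothetical, and the sample is an abbreviated copy of the response shown in the steps.

```python
TRACE_HEADER = "x-amzn-trace-id"


def response_headers(curl_verbose_output: str) -> dict:
    """Collect response headers from `curl -v` output.

    curl prefixes response header lines with '< '; request lines ('> ')
    and informational lines ('* ') are ignored.
    """
    headers = {}
    for raw_line in curl_verbose_output.splitlines():
        line = raw_line.strip()
        if not line.startswith("< "):
            continue
        content = line[2:]
        # Skip the status line (e.g. "HTTP/1.1 200 OK") and blank lines.
        if content.startswith("HTTP/") or ":" not in content:
            continue
        name, _, value = content.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def has_trace_header(curl_verbose_output: str) -> bool:
    """Return True if the response carried the X-Ray trace header."""
    return TRACE_HEADER in response_headers(curl_verbose_output)


# Abbreviated copy of the response from the reproduction steps.
SAMPLE = """\
> GET /foo1234 HTTP/1.1
> Host: svc-a.dev.local
< HTTP/1.1 200 OK
< date: Wed, 02 Mar 2022 10:43:41 GMT
< server: envoy
< x-envoy-upstream-service-time: 4
< transfer-encoding: chunked
"""

if __name__ == "__main__":
    print("trace header present:", has_trace_header(SAMPLE))
```

Header names are lower-cased before comparison because HTTP header names are case-insensitive, so the check also matches `X-Amzn-Trace-Id`.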
Are you currently working around this issue?
We use the X-Ray SDK to instrument our backends. It works well with Python / Flask applications, but somehow fails to work with .NET applications (we're investigating the issue, but that's how we found out). Nginx is just an example, but it shows that things can get difficult when the backend application is a closed-source third-party product and there may be no way to enforce header propagation.
Additional context
Envoy version: v1.20.0.1-prod
Hi @mkielar thanks for raising this bug,
Some clarifying questions:
- Is xray integration enabled in svc-b's envoy sidecar?
- Does the trace seen on X-Ray console say Origin = AWS::AppMesh::Proxy or is it coming from Xray SDK?
- Just making sure that the App Mesh Virtual Node listener protocol is not set as TCP. Ref: docs
Hi @suniltheta,
- Is xray integration enabled in svc-b's envoy sidecar ?
Yes, it is. As a matter of fact, I tested this with the following configurations:
- EC2 (no Envoy) => svc-b (Envoy + X-Ray) running Python App with X-Ray SDK => Got x-amzn-trace-id back
- svc-a (Envoy + X-Ray) => svc-b (Envoy + X-Ray) running Python App with X-Ray SDK => Got x-amzn-trace-id back
- EC2 (no Envoy) => svc-b (Envoy + X-Ray) running .NET App with misconfigured X-Ray SDK => Didn't get x-amzn-trace-id back
- svc-a (Envoy + X-Ray) => svc-b (Envoy + X-Ray) running .NET App with misconfigured X-Ray SDK => Didn't get x-amzn-trace-id back
- EC2 (no Envoy) => svc-b (Envoy + X-Ray) running pure Nginx, no X-Ray SDK => Didn't get x-amzn-trace-id back
- svc-a (Envoy + X-Ray) => svc-b (Envoy + X-Ray) running pure Nginx, no X-Ray SDK => Didn't get x-amzn-trace-id back
I then additionally tested some of the configs, replacing the X-Ray sidecar with an AWS Distro for OpenTelemetry sidecar configured with the X-Ray receiver / exporter. In all of the cases I got the header back only if the X-Ray SDK was properly configured and was adding it to the response from the App Container. If the X-Ray SDK was misconfigured (our .NET apps) or missing (pure Nginx container), then the header was missing from the response.
- Does the trace seen on X-Ray console say Origin = AWS::AppMesh::Proxy or is it coming from Xray SDK ?
It says AWS::AppMesh::Proxy. I cannot post screenshots, because the setup I'm working on is not really svc-a / svc-b but our production system and I'm under NDA, but I can confirm that the visualizations on X-Ray UI show all elements of the call-chain correctly.
- Just making sure that the App Mesh Virtual Node listener protocol is not set as TCP. Ref: docs
It's not. These are all HTTP services and we have built reusable Terraform modules to deploy our ECS Services, so they all (Python / .NET / Nginx) get exactly the same configuration of Virtual Nodes / Services / Routes.
OK, I think I can actually present a screenshot.
This one shows a trace for svc-b (an ECS Fargate Service running pure Nginx, with an Envoy sidecar integrated with an OpenTelemetry sidecar running the X-Ray pipeline), being accessed by curl from an EC2 instance that does not have Envoy installed.
Then, the next one is a trace for svc-b (same config as above) being accessed by curl executed from an Application Container running in ECS Fargate, after connecting to it with ECS Exec. The Application Container is part of an ECS Task which runs Envoy integrated with the X-Ray Agent. The name of the initiating service had to be obfuscated because of the NDA, but otherwise it shows the expected set of components.
I was able to recreate the issue using X-Ray and Jaeger tracing. I believe the same behavior is common to other tracers as well.
Below I used Jaeger as the trace collector for the Zipkin format.
In the call below, the backend is made to include the x-b3-* headers from the request in the response, i.e., the backend is instrumented.
sunnrs@3c22fb1a7644 ~/projects/suniltheta/aws-app-mesh-examples/walkthroughs/howto-k8s-alb main ● curl -v k8s-howtok8s-color-63786f35e6-501468261.us-west-2.elb.amazonaws.com/color
* Trying 54.244.188.207...
* TCP_NODELAY set
* Connected to k8s-howtok8s-color-63786f35e6-501468261.us-west-2.elb.amazonaws.com (54.244.188.207) port 80 (#0)
> GET /color HTTP/1.1
> Host: k8s-howtok8s-color-63786f35e6-501468261.us-west-2.elb.amazonaws.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 04 Mar 2022 16:48:08 GMT
< Transfer-Encoding: chunked
< Connection: keep-alive
< server: envoy
< x-b3-traceid: 3ebe5fd2589f18da
< x-b3-spanid: 318f1d4b6dcb0364
< x-b3-parentspanid: 3ebe5fd2589f18da
< x-b3-sampled: 1
< x-b3-flags: None
< b3: None
< x-envoy-upstream-service-time: 0
<
* Connection #0 to host k8s-howtok8s-color-63786f35e6-501468261.us-west-2.elb.amazonaws.com left intact
None* Closing connection 0
In the call below, the backend is not made to include the x-b3-* headers from the request in the response, i.e., the backend is not instrumented.
sunnrs@3c22fb1a7644 ~/projects/suniltheta/aws-app-mesh-examples/walkthroughs/howto-k8s-alb main ● curl -v k8s-howtok8s-color-63786f35e6-501468261.us-west-2.elb.amazonaws.com/color1
* Trying 54.214.180.88...
* TCP_NODELAY set
* Connected to k8s-howtok8s-color-63786f35e6-501468261.us-west-2.elb.amazonaws.com (54.214.180.88) port 80 (#0)
> GET /color1 HTTP/1.1
> Host: k8s-howtok8s-color-63786f35e6-501468261.us-west-2.elb.amazonaws.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 04 Mar 2022 16:48:10 GMT
< Transfer-Encoding: chunked
< Connection: keep-alive
< server: envoy
< x-envoy-upstream-service-time: 0
<
* Connection #0 to host k8s-howtok8s-color-63786f35e6-501468261.us-west-2.elb.amazonaws.com left intact
None* Closing connection 0
I believe this is not a bug on the App Mesh side, or even in the X-Ray extension.
If we refer to the Envoy code where the headers are injected, it happens only on the request path. On the response path the headers are not injected again. If the application isn't instrumented for tracing, the response will not contain the necessary headers. So the onus is on the application/SDK to pass the header from request to response.
https://github.com/envoyproxy/envoy/blob/main/source/extensions/tracers/xray/tracer.cc#L102
https://github.com/envoyproxy/envoy/blob/main/source/extensions/tracers/zipkin/zipkin_tracer_impl.cc#L42
Checking the Envoy debug logs for the logged x-amzn-trace-id header and correlating it with the request definitely defeats the purpose of having tracing enabled. But there is not much we can do :(
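To illustrate what "passing the header from request to response" could look like when the backend is under your control, here is a minimal sketch of a WSGI middleware that echoes trace headers back. This is not how the X-Ray SDK or Envoy does it; the class name `EchoTraceHeaders`, the header list, and the `call` helper are all hypothetical and exist only for this example. It does not help with closed-source backends such as the plain Nginx case above.

```python
from wsgiref.util import setup_testing_defaults

# Assumption: which headers to echo back. Adjust to the tracer in use.
ECHO_HEADERS = ("X-Amzn-Trace-Id", "X-B3-TraceId", "X-B3-SpanId", "X-B3-Sampled")


def _environ_key(header_name):
    """WSGI exposes request header 'X-Foo-Bar' as environ['HTTP_X_FOO_BAR']."""
    return "HTTP_" + header_name.upper().replace("-", "_")


class EchoTraceHeaders:
    """Hypothetical middleware: copy trace headers from request to response."""

    def __init__(self, app, header_names=ECHO_HEADERS):
        self.app = app
        self.header_names = header_names

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Do not duplicate headers the application already set itself.
            present = {name.lower() for name, _ in headers}
            extra = []
            for name in self.header_names:
                value = environ.get(_environ_key(name))
                if value is not None and name.lower() not in present:
                    extra.append((name, value))
            return start_response(status, list(headers) + extra, exc_info)

        return self.app(environ, patched_start_response)


def plain_app(environ, start_response):
    """Stand-in for an uninstrumented backend."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def call(app, request_headers):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    for name, value in request_headers.items():
        environ[_environ_key(name)] = value
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    request = {"X-Amzn-Trace-Id": "Root=1-abc"}  # made-up value for the demo
    print(call(plain_app, request)[1])
    print(call(EchoTraceHeaders(plain_app), request)[1])
```

Wrapping the application this way relies on the sidecar having already injected the trace header on the request path, which is the behavior described above.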
So, if I understand you correctly, it's Envoy's internal implementation that prevents any tracing plugin from enriching the response, correct? In that case, do you suggest I report this as a feature request for https://github.com/envoyproxy/envoy?
It is true that this has to come from Envoy itself. It has to do with the design of HTTP tracers in Envoy.
@suniltheta Thanks for the explanation and the further research. I have reported this for Envoy to review; hopefully they'll find my arguments convincing. That said, I'm not sure what to do with this ticket. Should we leave it open for you to implement any required improvements to the X-Ray tracer extension once Envoy introduces an API that allows for response-header enrichment?
Hi @mkielar thanks for opening the discussion on the envoy github.
We can decide the outcome of this issue based on what the community decides, depending on whether we move ahead with including the trace headers in the response or not. Meanwhile, I will mark this issue as blocked on envoy fix.