
Propagate trace context in outgoing requests

frzifus opened this issue on May 13, 2025

🚀 Describe the new functionality needed

When tracing is enabled and llama-stack makes a request to an external service, it should propagate the trace context headers (see the OpenTelemetry docs on Context Propagation).

The header entry follows the W3C Trace Context format (version-trace_id-parent_id-trace_flags) and looks like the following:

traceparent: 00-cd7088c08c5a37ba3fc0e27248981a71-0d36bf7af9548756-01

But when llama-stack makes a request to, for example, vLLM, the outgoing request is missing this header:

POST /v1/chat/completions HTTP/1.1
Host: localhost:8000
Accept: application/json
Accept-Encoding: gzip, deflate
Authorization: Bearer fake
Content-Length: 143
Content-Type: application/json
User-Agent: AsyncOpenAI/Python 1.76.1
X-Stainless-Arch: x64
X-Stainless-Async: async:asyncio
X-Stainless-Lang: python
X-Stainless-Os: Linux
X-Stainless-Package-Version: 1.76.1
X-Stainless-Read-Timeout: 600
X-Stainless-Retry-Count: 0
X-Stainless-Runtime: CPython
X-Stainless-Runtime-Version: 3.10.17

{"messages":[{"role":"user","content":[{"type":"text","text":"Berlin is"}]}],"model":"vllm","max_tokens":2048,"stream":false,"temperature":0.0}

💡 Why is this needed? What if we don't build it?

When trace information is not propagated further, end-to-end visibility across the request chain is lost.

This is the outgoing counterpart to accepting trace information from incoming requests (https://github.com/meta-llama/llama-stack/issues/2097).

Other thoughts

More details: https://github.com/meta-llama/llama-stack/issues/2097#issuecomment-2876396321
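
An alternative sketch (assuming the AsyncOpenAI client keeps using httpx under the hood) would be to rely on OpenTelemetry's httpx auto-instrumentation, which injects `traceparent` into every outgoing request without changing individual call sites:

```python
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

# Instrument httpx clients globally; since the OpenAI Python client is built
# on httpx, its outgoing requests to vLLM would then carry the traceparent
# header automatically.
HTTPXClientInstrumentor().instrument()
```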
