Propagate trace context in outgoing requests
🚀 Describe the new functionality needed
When tracing is enabled and llama-stack makes a request to an external service, it should propagate the trace context headers (see the OpenTelemetry docs on Context Propagation).
The header entry looks like the following:

```
traceparent: 00-cd7088c08c5a37ba3fc0e27248981a71-0d36bf7af9548756-01
```
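For context, the `traceparent` value follows the W3C Trace Context format (`version-trace_id-parent_span_id-trace_flags`). A minimal sketch of how OpenTelemetry's W3C propagator emits this header from the currently active span (illustrative only, assumes the `opentelemetry-sdk` package, not llama-stack code):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer("example")

with tracer.start_as_current_span("inference-call"):
    carrier: dict[str, str] = {}
    # inject() writes the "traceparent" (and "tracestate", if set) entries
    # for the currently active span into the carrier dict.
    TraceContextTextMapPropagator().inject(carrier)
    print(carrier)  # e.g. {'traceparent': '00-<trace_id>-<span_id>-01'}
```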
But when llama-stack makes a request to, e.g., vLLM, the outgoing request is missing this header:
```
POST /v1/chat/completions HTTP/1.1
Host: localhost:8000
Accept: application/json
Accept-Encoding: gzip, deflate
Authorization: Bearer fake
Content-Length: 143
Content-Type: application/json
User-Agent: AsyncOpenAI/Python 1.76.1
X-Stainless-Arch: x64
X-Stainless-Async: async:asyncio
X-Stainless-Lang: python
X-Stainless-Os: Linux
X-Stainless-Package-Version: 1.76.1
X-Stainless-Read-Timeout: 600
X-Stainless-Retry-Count: 0
X-Stainless-Runtime: CPython
X-Stainless-Runtime-Version: 3.10.17

{"messages":[{"role":"user","content":[{"type":"text","text":"Berlin is"}]}],"model":"vllm","max_tokens":2048,"stream":false,"temperature":0.0}
```
💡 Why is this needed? What if we don't build it?
When the trace information is not propagated further, end-to-end visibility is lost.
This is the counterpart to accepting trace information from incoming requests (https://github.com/meta-llama/llama-stack/issues/2097).
Other thoughts
More details: https://github.com/meta-llama/llama-stack/issues/2097#issuecomment-2876396321