Summary

This improves the OpenTelemetry implementation such that it correctly creates distributed traces.

This focuses on trace propagation, both to LLM providers (HTTP headers) and MCP servers (request metadata).

This also prefers standard otel environment configuration and defaults instead of hard-coding, allowing drop-in usage with the same ENV as python or other language agents.

Type of Change

[x] Feature
[ ] Bug fix
[x] Refactor / Code quality
[ ] Performance improvement
[ ] Documentation
[x] Tests
[ ] Security fix
[ ] Build / Release
[ ] Other (specify below)

Testing

Start ollama

OLLAMA_CONTEXT_LENGTH=131072 OLLAMA_HOST=0.0.0.0 ollama serve

Start otel-tui

docker run --rm -it --name otel-tui -p 4318:4318 ymtdzzz/otel-tui:latest

Start envoy ai gateway to proxy ollama and kiwi

$ docker run --rm  --name aigw -p 1975:1975 \
-e OPENAI_BASE_URL=http://host.docker.internal:11434/v1 -e OPENAI_API_KEY=unused \
-e OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4318 -e OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf \
-e OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta \
-e OTEL_METRIC_EXPORT_INTERVAL=100 -e OTEL_BSP_SCHEDULE_DELAY=100 \
 envoyproxy/ai-gateway-cli run --mcp-json '{"mcpServers":{"kiwi":{"type":"http","url":"https://mcp.kiwi.com"}}}'

Save this as kiwi_recipe.yaml

version: 1.0.0 title: Flight Search Assistant description: | This recipe helps users find flights from New York to Los Angeles using the MCP gateway. It searches for flights on a specified date and returns the top 3 cheapest options in a structured JSON format. instructions: | You are a helpful assistant that helps users find flights from New York to Los Angeles.

prompt: | Use the search-flight tool to search for flights from New York to Los Angeles on {{flight_date}}. When specifying the departureDate parameter, use the format dd/mm/yyyy. Display the first 3 results with the final_output tool in a specific JSON format. The output is in the following format where price includes the currency symbol:

{
  "contents": [
    {
      "airline": "...",
      "flight_number": "...",
      "price": "..."
    },
    ...
  ]
}

ANY FLIGHT IS FINE. DO NOT CONSIDER ANY USER PREFERENCES.*

extensions:

name: mcp_gateway type: streamable_http uri: http://127.0.0.1:1975/mcp

parameters:

key: flight_date input_type: string requirement: required description: flight date in DD/MM/YYYY format default: "31/12/2025"

settings:

Remove this if using GPT-5 series (e.g., gpt-5, gpt-5-mini, gpt-5-nano)

temperature: 0.0

response: json_schema: type: object additionalProperties: false required: - contents properties: contents: type: array description: "First 3 flight results in a consistent, parseable structure." minItems: 1 maxItems: 3 items: type: object additionalProperties: false required: - airline - flight_number - price properties: airline: type: string description: "Airline name" flight_number: type: string description: "Flight number" price: type: string description: "Price with currency symbol"

Run goose pointing to above!

OPENAI_HOST=http://127.0.0.1:1975 OPENAI_API_KEY=test-key \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf \
OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta \
OTEL_METRIC_EXPORT_INTERVAL=100  OTEL_BSP_SCHEDULE_DELAY=100 \
cargo run --bin goose -- run --provider openai --model qwen3:1.7b --recipe kiwi_recipe.yaml --params flight_date=31/12/2025

Related Issues

#75

Screenshots/Demos (for UX changes)

Oct 13 '25 21:10 codefromthecrypt

On the first pass this seems as a nice touch up. Thank you! I will need to run this through with our internal OTEL client as some changes might cause metric drop ( particularly provider flushing) Will post an official review after that

Oct 14 '25 16:10 Kvadratni

@Kvadratni thanks and feedback welcome. I'm quite new at rust so any suggestions are welcome!

Oct 14 '25 21:10 codefromthecrypt

putting this back in draft because the integration test isn't working. I will be out next week so bear this in mind. happy to get feedback meanwhile if you like, or you can wait until I pull it out of draft again

Oct 15 '25 18:10 codefromthecrypt

Thank you so much for working on this. Just wanted to make sure - have you linked to the proper related issue? Just so it closes or updates what is related <3

Oct 16 '25 19:10 taniandjerry

@taniandjerry thanks for the feedback, so this topic is fairly ancient back to the python exchange days and most recently #203 and the polishing is a follow up to my old PR in python #75. Would it help for me to also make an issue which captures the motivation parts of the PR description then link this back to it?

Oct 28 '25 07:10 codefromthecrypt

This PR is unrelated to hacktoberfest. @taniandjerry

No worries on issue number @codefromthecrypt !!

Oct 28 '25 08:10 blackgirlbytes

cool will retouch this up once https://github.com/block/goose/pull/5165 merges so I make sure I get my code fashion together ;)

Oct 30 '25 03:10 codefromthecrypt

@codefromthecrypt I'd love to have a completed version of this one. Do you have bandwidth to update it?

Nov 03 '25 21:11 alexhancock

sorry yeah.. I got distracted on a side-quest about the claude provider and MCP based approval integration with goose. lemme refocus on this now

Nov 04 '25 02:11 codefromthecrypt

yesterday escaped me.. doing it now

Nov 04 '25 22:11 codefromthecrypt

update: I'm working hard on this, I need to stop for the day. I don't want to raise this again without 100pct verified integration test approach for validating propagtion. This is somewhat new ground as the only thing similar is the MCP replay tests. I prefer to delay another day to do this right than not, so hopefully understandable.

Nov 05 '25 09:11 codefromthecrypt

@codefromthecrypt of course - let us know whenever ready for a review! It's a big feature and will be huge to get it right

Nov 05 '25 23:11 alexhancock

I'm sorry, but I'm unable to complete this PR and have to step back from it. The code affected by instrumentation has diverged from main since I started, and my initial understanding was already limited, making it hard to catch up.

I've struggled to implement robust tests that would prevent future drift. My limited Rust experience is a big factor here—this section involves intricate instrumentation across multiple components, without sufficient end-to-end checks to ensure spans are properly initiated, terminated, and parented. While I've built similar features in Java, Go, and Python (where CLI testing feels more flexible), the setup in this project has proven challenging for me. I've also tried using agents to assist, but they haven't been effective for this level of complexity.

Ultimately, this has consumed more time than anticipated, leading to frustration and delaying other priorities. With upcoming travel, I don't foresee gaining the necessary proficiency in the next few weeks. I believe this would be better handled by someone more familiar with the project's quirks, stronger in Rust, and able to introduce a new fixture for comprehensive end-to-end side-effect testing—such as running the Goose CLI and/or Playwright with OTEL environment variables against mocked OpenAI and MCP services, then verifying both traces and propagated data. I'd be happy to sketch more what that would look like, but I have to call bankruptcy on salvaging this PR.

Nov 06 '25 06:11 codefromthecrypt

PS I had an LLM reverse engineer a guide from my local changes. hope it helps the next person (and doens't have a hallucination I missed!)

Async + OpenTelemetry

Libraries in Use

Async Runtime

tokio - Async runtime
async_stream - Macro for creating streams (try_stream!)
futures - Stream traits and combinators

Tracing/OpenTelemetry

tracing - Structured logging and tracing
tracing-opentelemetry - Bridge between tracing and OpenTelemetry
opentelemetry - Core OpenTelemetry API
opentelemetry-sdk - OpenTelemetry SDK implementation

How OpenTelemetry Span Context Works

Thread-Local Storage Mechanism

OpenTelemetry in Rust uses thread-local storage via tracing's subscriber system to track the "current span":

// tracing internally uses thread_local!
thread_local! {
    static CURRENT_SPAN: RefCell<Option<SpanContext>>;
}

The Critical Problem with Async

Thread-local storage does NOT propagate across .await points.

When an async task yields at .await:

The future may resume on a different thread
Thread-local span context is lost
Child operations have no parent span

The Bridge: tracing-opentelemetry

The tracing-opentelemetry crate bridges Rust's tracing ecosystem with OpenTelemetry:

tracing (Rust ecosystem)
  ↓ tracing-subscriber Layer system
tracing-opentelemetry (bridge crate)
  ├─ Implements tracing::Layer
  ├─ Stores OpenTelemetry context in tracing span
  └─ Provides extension traits to access it
     ↓
OpenTelemetrySpanExt::context() - extract OTel context from tracing span
TraceContextExt::span() - extract OTel span from context
  ↓
opentelemetry-sdk (OTel implementation)
SpanContext { trace_id, span_id, trace_flags }

Key insight: There are two different span types:

tracing::Span - Rust's structured logging/tracing
opentelemetry::trace::Span - OpenTelemetry's distributed tracing

They're connected via the tracing-opentelemetry bridge layer.

Alternative: Task-Local Storage

tokio::task_local! does propagate across .await points within the same task:

task_local! {
    pub static SESSION_ID: Option<String>;
}

Used for session_id propagation (see commit 0515c76e84806e0d328e62fa1f5b5ff90475908d).

Correct Instrumentation Patterns

Pattern 1: Regular Async Functions with #[instrument]

Use #[instrument] macro for regular async functions:

#[instrument(name = "operation", skip(param1, param2), fields(session.id = %session_id))]
async fn operation(param1: String, param2: &Data, session_id: &str) -> Result<Output> {
    // Span is active for entire function execution
    // Propagates correctly across all .await points
    let result = child_operation().await?;
    Ok(result)
}

How it works:

#[instrument] uses tracing::Instrument trait
Creates span when function is called
Span remains active across all .await points
Ends when function completes

When to use:

Standard async functions where the function name matches desired span name
When you need simple span attributes via the fields() parameter
Most common case for instrumentation

Pattern 2: Manual Span Creation

Create spans manually when you need fine-grained control:

fn create_chat_span(model_name: &str) -> tracing::Span {
    tracing::info_span!(
        "chat",  // Span name for tracing
        otel.name = format!("chat {}", model_name),  // OTel display name
        gen_ai.request.model = %model_name,
        gen_ai.system = "openai",
        gen_ai.operation.name = "chat"
    )
}

async fn complete(...) -> Result<Message> {
    let span = create_chat_span(&model_name);
    let _guard = span.enter();

    // All work happens under this span
    let response = self.client.post(url).send().await?;
    Ok(response)
}

When to use instead of #[instrument]:

Need custom OpenTelemetry attributes (like otel.name)
Following OpenTelemetry semantic conventions (e.g., GenAI)
Want different span name than function name
Need to construct span attributes dynamically

Why manual creation:

#[instrument] doesn't support the otel.name attribute
GenAI semantic conventions require specific span naming: "{operation} {model}"
Need precise control over span attributes beyond what #[instrument] provides

Pattern 3: Spawned Tasks

Use .in_current_span() or .instrument(span) for tokio::spawn:

// Variant 1: Capture current span
let handle = tokio::spawn(
    async move {
        // work happens here with current span as parent
    }.in_current_span()
);

// Variant 2: Propagate specific span
let span = tracing::Span::current();
let handle = tokio::spawn(
    async move {
        // work happens here with captured span as parent
    }.instrument(span)
);

How it works:

.in_current_span() captures the current span at call time
.instrument(span) uses a specific span you provide
The captured span becomes active in the spawned task
All operations in spawned task nest under captured span

When to use each:

.in_current_span(): When you want the current span
.instrument(span): When you need a specific span (more flexible)

Pattern 4: Lazy Streams (async_stream::try_stream!)

Capture span before stream creation, enter when stream executes:

async fn create_stream(...) -> Result<BoxStream<'_, Result<Item>>> {
    // 1. Capture current span before creating lazy stream
    let span = tracing::Span::current();

    Ok(Box::pin(async_stream::try_stream! {
        // 2. Enter the span when stream executes
        // IMPORTANT: Use underscore-prefixed name to keep guard alive
        let _guard = span.enter();

        // 3. All operations now have correct parent
        loop {
            let item = do_work().await?;
            yield item;
        }
        // Guard drops here, exiting span
    }))
}

CRITICAL: Span Guard Lifetime

// WRONG - guard drops immediately, span not active
let _ = span.enter();

// CORRECT - guard lives until end of scope
let _guard = span.enter();
let _span_guard = span.enter();  // Also correct

The underscore prefix (_guard) suppresses unused variable warnings while keeping the guard alive for the entire scope.

Why this works:

async_stream::try_stream! creates a lazy stream
The stream is not executing when the function returns
#[instrument] on the function would end when it returns
Must capture span while it's active, enter it when stream executes
The guard must stay alive to keep span active

Why NOT use .in_current_span():

.in_current_span() is for Future types
async_stream::try_stream! is a macro, not a future
Cannot apply .in_current_span() to macro expansion

OpenTelemetry Span Hierarchy

Span Context Components

Each span has:

trace_id (128-bit, 32 hex chars) - identifies the entire distributed trace
span_id (64-bit, 16 hex chars) - identifies this specific span
parent_span_id (64-bit, 16 hex chars) - identifies parent span
trace_flags (8-bit, 2 hex chars) - sampling and other flags

Parent-Child Relationships

When creating a span, OpenTelemetry:

Reads "current span" from thread-local storage
Uses its span_id as this span's parent_span_id
Inherits the trace_id from parent
Generates a new unique span_id for this span

Validation Rules

A span context is valid only if:

trace_id != 0x00000000000000000000000000000000 (not all zeros)
span_id != 0x0000000000000000 (not all zeros)

Invalid span contexts:

Are not propagated across process boundaries
Result in silent skipping of trace context injection
Cause remote processes to start new independent traces

Propagation Methods

Within a process: Via thread-local storage (with instrumentation patterns)
Across processes (HTTP): Via traceparent HTTP header (W3C Trace Context)
Across processes (MCP): Via _meta.traceparent field in JSON-RPC requests

Cross-Process Trace Context Propagation

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│ Goose Process (Client)                                       │
│                                                              │
│  ┌──────────────┐                                           │
│  │ Active Span  │ (thread-local)                            │
│  │ trace_id: T  │                                           │
│  │ span_id: S   │                                           │
│  └──────┬───────┘                                           │
│         │                                                    │
│         │ Extract & Serialize                               │
│         ↓                                                    │
│  ┌────────────────────────────────┐                        │
│  │ W3C Traceparent String         │                        │
│  │ "00-T-S-01"                    │                        │
│  └────┬───────────────────┬───────┘                        │
│       │                   │                                 │
│       │                   │                                 │
└───────┼───────────────────┼─────────────────────────────────┘
        │                   │
        │ HTTP              │ MCP
        │ (Header)          │ (Meta Field)
        ↓                   ↓
┌──────────────┐    ┌──────────────┐
│ LLM Provider │    │ MCP Server   │
│ (Remote)     │    │ (Remote)     │
└──────────────┘    └──────────────┘

W3C Trace Context Format

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             │  │                                │                │
             │  │                                │                └─ trace-flags (sampling, etc)
             │  │                                └──────────────── parent-id (this span's ID)
             │  └──────────────────────────────────────────────── trace-id (distributed trace identifier)
             └──────────────────────────────────────────────────── version (always 00)

Component Meanings:

version: Protocol version (currently always "00")
trace-id: 128-bit globally unique trace identifier
- Same across all processes in the trace
- Identifies the entire distributed transaction
- Format: 32 hexadecimal characters
parent-id: 64-bit span identifier
- The span ID from the sending process
- Becomes the parent_span_id in receiving process
- Format: 16 hexadecimal characters
trace-flags: 8-bit flags
- Bit 0: sampled flag (01 = sampled, 00 = not sampled)
- Other bits reserved for future use
- Format: 2 hexadecimal characters

State Transformation Pipeline

The span context goes through several transformations when crossing process boundaries:

In-Memory Representation:
  SpanContext {
    trace_id: u128,      // 128-bit integer
    span_id: u64,        // 64-bit integer
    trace_flags: u8      // 8-bit flags
  }
         ↓
Extraction (via OpenTelemetry APIs):
  tracing::Span::current()           // Get tracing span
    .context()                         // Get OTel context (via OpenTelemetrySpanExt)
    .span()                            // Get OTel span (via TraceContextExt)
    .span_context()                    // Get SpanContext
    .is_valid()                        // Validate non-zero
         ↓
Serialization:
  format!("00-{:032x}-{:016x}-{:02x}",
    trace_id, span_id, flags.to_u8())
         ↓
Transport Encoding:
  HTTP: traceparent: {string}
  MCP:  _meta: { "traceparent": "{string}" }
         ↓
Wire Format:
  HTTP: Plain header in HTTP/1.1 or HTTP/2
  MCP:  JSON object field in JSON-RPC

HTTP Propagation

Injection Point: The trace context is injected during HTTP request building, after the request is constructed but before it's sent.

ApiRequestBuilder::build()
  ↓
1. Build base HTTP request
2. Add authentication headers
3. Add session ID header (if present)
4. [TRACE PROPAGATION]
   a. Extract current span context
   b. Validate span context is_valid()
   c. Format as W3C traceparent
   d. Inject as "traceparent" header
5. Send request

Example HTTP Request:

POST /v1/chat/completions HTTP/1.1
Host: api.openai.com
Authorization: Bearer sk-...
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
Content-Type: application/json

{ "model": "gpt-4o", "messages": [...] }

When propagation is skipped:

No active span (span context is invalid)
Span context has all-zero trace_id or span_id
Tracing is not initialized or disabled
Behavior: Request proceeds without trace context (silent skip)

MCP Propagation

The Extensions Mechanism: MCP uses a type-erased Extensions map that gets serialized as _meta in JSON-RPC.

MCP Request Flow:
  ClientRequest::CallToolRequest
    ↓
  Extensions (type-erased map)
    ↓
  inject_session_into_extensions()
    ├─ Extract existing Meta object (if any)
    ├─ Create new map or use existing
    ├─ Add "session_id" field (if available)
    └─ Call inject_trace_context()
       ├─ Get current span
       ├─ Extract span_context
       ├─ Validate is_valid()
       ├─ Format as W3C traceparent
       └─ Add "traceparent" field to Meta map
    ↓
  Merged Meta inserted back into Extensions
    ↓
  JSON-RPC serialization adds _meta object
    ↓
  Wire format (JSON)

Example MCP Request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_file",
    "arguments": { "path": "/tmp/file.txt" }
  },
  "_meta": {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "session_id": "20251106_42"
  }
}

Key insights:

The Meta type wraps a HashMap<String, Value>
It's part of MCP's Extensions system
Gets serialized as _meta in JSON-RPC
Multiple metadata can coexist (trace context + session ID + others)
Propagation merges with existing metadata rather than replacing

Remote Parent Relationships

When trace context crosses a process boundary, the span hierarchy is maintained through ID linkage:

Process A                    Wire                   Process B
┌─────────────┐                                   ┌─────────────┐
│ Span Alpha  │                                   │             │
│ trace: T    │                                   │             │
│ span: AAAA  │──────────────────────────────────→│             │
│             │  traceparent: 00-T-AAAA-01        │ Span Beta   │
└─────────────┘                                   │ trace: T    │
                                                  │ parent: AAAA│
   Parent span in Process A                       │ span: BBBB  │
                                                  └─────────────┘
                                                    Child span in Process B

Critical concept:

Process A's span_id (AAAA) becomes Process B's parent_span_id
Process B generates a new span_id (BBBB) for its own span
The trace_id (T) is identical in both processes
This creates a distributed span hierarchy across process boundaries

Receiver Implementation Pattern (Abstract)

When a service receives a request with trace context:

Receiver Process:

1. Extract traceparent from transport:
   HTTP: Read "traceparent" header
   MCP:  Read _meta.traceparent field

2. Parse W3C format:
   Split on "-" → [version, trace_id, parent_id, flags]
   Validate format (4 components, correct lengths)
   Convert hex strings to binary

3. Create remote span context:
   RemoteContext {
     trace_id: (from traceparent),
     parent_id: (from traceparent)
   }

4. Create local span as child:
   LocalSpan {
     trace_id: RemoteContext.trace_id,      ← inherited
     parent_span_id: RemoteContext.parent_id, ← links to remote parent
     span_id: (generate new)                 ← unique to this span
   }

5. Execute operation under span

6. Span hierarchy is now distributed:
   Remote Span (AAAA) → Local Span (BBBB)
   Connected via parent_span_id = AAAA

Propagation Timing

The trace context is extracted and injected at a specific point in the request lifecycle:

Timeline:
  t0: Span created (in-process)
  t1: Span entered (becomes "current")
  t2: Code initiates outbound request
  t3: ← PROPAGATION HAPPENS HERE ←
      - Extract current span context from thread-local
      - Serialize to W3C traceparent format
      - Inject into request (header or meta field)
  t4: Request sent over network
  t5: Span continues execution (still active)
  t6: Response received
  t7: Span ends

Critical point: Propagation (t3) happens while the span is still active. The span context is captured at request-build time, not at span-creation time.

Error Scenarios

Scenario 1: No active span
  Current span is invalid or doesn't exist
  → Propagation is silently skipped
  → Request has no trace context
  → Remote process starts new trace (if instrumented)

Scenario 2: Invalid span context
  Span context has all-zero trace_id or span_id
  → is_valid() returns false
  → Propagation is silently skipped
  → Request proceeds without trace context

Scenario 3: Tracing disabled
  Tracing not initialized or disabled
  → No span context available
  → Request proceeds without trace context

Scenario 4: Remote doesn't support tracing
  Traceparent sent but remote ignores it
  → No remote spans created
  → Trace ends at process boundary
  → Not an error, just incomplete trace

Scenario 5: Format mismatch
  Remote expects different format (not W3C)
  → Trace context not recognized
  → Remote may start independent trace

OpenTelemetry Semantic Conventions

GenAI Semantic Conventions

For LLM operations, we follow the OpenTelemetry GenAI semantic conventions:

Required attributes:

gen_ai.operation.name - Type of operation (e.g., "chat", "embedding")
gen_ai.system - The AI system (e.g., "openai", "anthropic")
gen_ai.request.model - Model identifier (e.g., "gpt-4o", "claude-sonnet-4")

Span naming:

Format: "{operation} {model}"
Examples: "chat gpt-4o", "embedding text-embedding-3-small"
Implemented via otel.name attribute (not the tracing span name)

Why manual span creation:

#[instrument] macro doesn't support otel.name attribute
Need custom span name format different from function name
Require specific GenAI attributes not supported by #[instrument]

Example:

fn create_chat_span(model_name: &str) -> tracing::Span {
    tracing::info_span!(
        "chat",  // Internal tracing name
        otel.name = format!("chat {}", model_name),  // OTel display name
        gen_ai.request.model = %model_name,
        gen_ai.system = "openai",
        gen_ai.operation.name = "chat"
    )
}

Tool Execution Spans

Tool calls create spans with:

Name: "gen_ai.tool.call"
Attributes:
- tool.name - Name of the tool being called
Lifetime: Span is kept alive during entire tool execution

Pattern:

fn tool_stream<S, F>(
    rx: S,
    done: F,
    tool_name: Option<String>
) -> ToolStream {
    Box::pin(async_stream::stream! {
        // Create and enter span if tool name provided
        let tool_span = tool_name.as_ref().map(|name| {
            tracing::info_span!("gen_ai.tool.call", tool.name = %name)
        });
        let _span_guard = tool_span.as_ref().map(|span| span.enter());

        // Tool execution happens here under span
        // ...

        // Span ends when guard drops
    })
}

CLI Execution Spans

The CLI creates root spans for recipe and run execution:

Recipe execution:

let span = tracing::info_span!(
    "recipe.execute",
    session.id = %session_id,
    recipe.name = %recipe_name
);
let _enter = span.enter();
// Recipe execution happens here

Run execution:

let span = tracing::info_span!(
    "run.execute",
    session.id = %session_id
);
let _enter = span.enter();
// Run execution happens here

These root spans become the parent for all agent operations, creating a complete trace hierarchy.

Example Patterns

Example 1: LLM Provider Span Pattern

fn create_chat_span(model_name: &str) -> tracing::Span {
    tracing::info_span!(
        "chat",
        otel.name = format!("chat {}", model_name),
        gen_ai.request.model = %model_name,
        gen_ai.system = "openai",
        gen_ai.operation.name = "chat"
    )
}

async fn stream(...) -> Result<MessageStream, ProviderError> {
    // Create span before lazy stream
    let span = create_chat_span(&self.model.model_name);

    Ok(Box::pin(try_stream! {
        // Enter span when stream executes
        let _span_guard = span.enter();

        // HTTP request happens here with correct parent span
        // The API client will inject traceparent header automatically
        let response = self.client.post(url).send().await?;

        // Stream response processing
        // ...

        // Span ends when guard drops (stream completes)
    }))
}

Uses the lazy stream pattern with manual span creation for GenAI conventions.

Example 2: Agent Reply Pattern

#[instrument(name = "agent.reply", skip(self, user_message, session_config),
             fields(user_message, session.id = %session_config.id))]
pub async fn reply(
    &self,
    user_message: Message,
    session_config: &SessionConfig,
) -> Result<AgentStream> {
    // #[instrument] creates span for this function
    self.reply_internal(user_message, session_config).await
}

async fn reply_internal(
    &self,
    user_message: Message,
    session_config: &SessionConfig,
) -> Result<BoxStream<'_, Result<AgentEvent>>> {
    // Capture span from reply() before creating lazy stream
    let reply_span = tracing::Span::current();

    Ok(Box::pin(async_stream::try_stream! {
        // Enter span when stream executes
        let _reply_span_guard = reply_span.enter();

        loop {
            // LLM calls here have correct parent span (agent.reply)
            let mut stream = Self::stream_response_from_provider(...).await?;

            // Tool calls here also have correct parent span
            let tool_result = self.dispatch_tool_call(...).await;

            // All nested operations inherit the agent.reply span as parent
        }
    }))
}

Two-level pattern:

reply() uses #[instrument] to create named span with attributes
reply_internal() captures and enters span in lazy stream
Session naming task is instrumented with same span (if spawned)

Testing Span Hierarchies

Expected Hierarchy Example

run.execute (CLI root span)
  └─ agent.reply (from #[instrument])
      ├─ chat gpt-4o-mini (session naming, if needed)
      ├─ gen_ai.tool.call (MCP tool)
      │   └─ (tool execution in remote MCP server)
      └─ chat gpt-4o (main completion)

Conceptual Validation Pattern

When testing distributed traces, verify:

1. Parent-Child Relationships:

// Child's parent_span_id should match parent's span_id
assert_eq!(
    child_span.parent_span_id,
    parent_span.span_context.span_id(),
    "Child should link to parent"
);

2. Shared Trace ID:

// All spans in same trace share trace_id
assert_eq!(
    child_span.span_context.trace_id(),
    parent_span.span_context.trace_id(),
    "All spans must share trace_id"
);

3. Temporal Enclosure:

// Parent must strictly enclose child in time
assert!(parent.start_time <= child.start_time);
assert!(parent.end_time >= child.end_time);

4. Cross-Process Propagation:

// Extract traceparent from request
let traceparent = /* extract from _meta or header */;

// Parse W3C format
let parts: Vec<&str> = traceparent.split('-').collect();
assert_eq!(parts.len(), 4, "Invalid W3C format");
assert_eq!(parts[0], "00", "Invalid version");
assert_eq!(parts[1].len(), 32, "Invalid trace_id length");
assert_eq!(parts[2].len(), 16, "Invalid span_id length");

// Verify span context is valid (not all zeros)
assert_ne!(parts[2], "0000000000000000", "Span ID must be valid");

Testing Infrastructure Patterns (Abstract)

For testing distributed tracing, you need:

1. Trace Collector:

Receives OTLP (OpenTelemetry Protocol) exports
Stores spans in memory for inspection
Runs on ephemeral port for test isolation

2. Fake Servers:

MCP server that captures _meta.traceparent
LLM provider that captures traceparent header
Both run on ephemeral ports

3. Validation:

Extract and parse traceparent values
Collect all exported spans
Verify hierarchy, IDs, and timing

4. Test Flow:

1. Start OTLP collector
2. Start fake MCP/LLM servers
3. Configure goose to use fake servers
4. Execute operation that should create spans
5. Wait for span export (async)
6. Collect and verify spans

Debugging and Troubleshooting

Common Issues

1. Spans not appearing in traces:

Check that tracing is initialized before span creation
Verify OTLP exporter is configured correctly
Ensure spans are being exported (may have delay)
Check that sampling is enabled

2. Broken span hierarchy (orphaned spans):

Verify span is entered before child operations
Check that lazy streams capture and enter span
Ensure spawned tasks use .in_current_span() or .instrument()
Look for missing instrumentation on async functions

3. Missing trace context in remote calls:

Verify span context is_valid() returns true
Check that span_id is not all zeros
Ensure span is active when request is built
Debug by logging traceparent before sending

4. Span guard lifetime issues:

Use let _guard = span.enter() not let _ = span.enter()
Underscore-prefix keeps guard alive
Guard must live for entire scope where span should be active

Verification Techniques

1. Check active span:

let span = tracing::Span::current();
eprintln!("Current span: {:?}", span.metadata().map(|m| m.name()));

2. Verify span context validity:

use opentelemetry::trace::TraceContextExt;
use tracing_opentelemetry::OpenTelemetrySpanExt;

let context = tracing::Span::current().context();
let span = context.span();
let span_context = span.span_context();
eprintln!("Span context valid: {}", span_context.is_valid());
eprintln!("Trace ID: {:032x}", span_context.trace_id());
eprintln!("Span ID: {:016x}", span_context.span_id());

3. Inspect propagated traceparent:

// For HTTP (in request builder)
eprintln!("Injecting traceparent: {}", traceparent_header);

// For MCP (in extension injection)
eprintln!("Meta contains traceparent: {}", meta.get("traceparent").is_some());

Summary

Pattern Selection Guide

Context	Pattern	Example
Regular async function	`#[instrument]`	`#[instrument] async fn foo()`
Need custom OTel attributes	Manual span creation	`create_chat_span()` + `let _guard = span.enter()`
Spawned task (current span)	`.in_current_span()`	`tokio::spawn(work().in_current_span())`
Spawned task (specific span)	`.instrument(span)`	`tokio::spawn(work().instrument(reply_span))`
Lazy stream	Capture + enter	`let s = Span::current(); try_stream! { let _g = s.enter() }`

When to Use Manual Span Creation

Use manual span creation instead of #[instrument] when:

Need otel.name attribute (for custom OTel span names)
Following OpenTelemetry semantic conventions (GenAI, HTTP, etc.)
Want different span name than function name
Need to dynamically construct span attributes
Require precise control over span lifecycle

Key Principles

Thread-local storage does NOT propagate across .await
- OpenTelemetry span context is thread-local
- Async tasks may resume on different threads
- Must use instrumentation patterns to propagate
Task-local storage DOES propagate across .await
- tokio::task_local! is task-scoped, not thread-scoped
- Useful for session IDs and other request-scoped data
#[instrument] uses Instrument trait
- Automatically propagates span across .await points
- Best for standard async functions
- Limited attribute support (no otel.name)
Lazy streams need capture + enter pattern
- Stream created != stream executing
- Must capture span before creation
- Enter span when stream executes
- Critical: Use let _guard = span.enter() not let _ = span.enter()
Span guards must stay alive
- Underscore-prefixed names (_guard, _span_guard) suppress warnings
- But keep the guard alive until drop
- Guard drop = span exit
Cross-process propagation uses W3C Trace Context
- HTTP: traceparent header
- MCP: _meta.traceparent field
- Same W3C format for both
- Automatic injection during request building
Distributed hierarchy maintained via IDs
- trace_id same across all processes
- span_id unique per span
- parent_span_id links to parent's span_id
- Works across network boundaries
Validation prevents invalid propagation
- Span context must have non-zero trace_id and span_id
- Invalid contexts are silently skipped
- Remote processes start new traces if no valid context received
OpenTelemetry semantic conventions matter
- GenAI spans require specific attributes
- Use manual span creation for conventions compliance
- Proper naming helps in trace visualization tools

Nov 06 '25 07:11 codefromthecrypt

feat: tidy otel and implement http and MCP propagation

Summary

Type of Change

Testing

Remove this if using GPT-5 series (e.g., gpt-5, gpt-5-mini, gpt-5-nano)

Related Issues

Screenshots/Demos (for UX changes)

Async + OpenTelemetry

Libraries in Use

Async Runtime

Tracing/OpenTelemetry

How OpenTelemetry Span Context Works

Thread-Local Storage Mechanism

The Critical Problem with Async

The Bridge: tracing-opentelemetry

Alternative: Task-Local Storage

Correct Instrumentation Patterns

Pattern 1: Regular Async Functions with #[instrument]

Pattern 2: Manual Span Creation

Pattern 3: Spawned Tasks

Pattern 4: Lazy Streams (async_stream::try_stream!)

OpenTelemetry Span Hierarchy

Span Context Components

Parent-Child Relationships

Validation Rules

Propagation Methods

Cross-Process Trace Context Propagation

Architecture Overview

W3C Trace Context Format

State Transformation Pipeline

HTTP Propagation

MCP Propagation

Remote Parent Relationships

Receiver Implementation Pattern (Abstract)

Propagation Timing

Error Scenarios

OpenTelemetry Semantic Conventions

GenAI Semantic Conventions

Tool Execution Spans

CLI Execution Spans

Example Patterns

Example 1: LLM Provider Span Pattern

Example 2: Agent Reply Pattern

Testing Span Hierarchies

Expected Hierarchy Example

Conceptual Validation Pattern

Testing Infrastructure Patterns (Abstract)

Debugging and Troubleshooting

Common Issues

Verification Techniques

Summary

Pattern Selection Guide

When to Use Manual Span Creation

Key Principles