Feature Request: Built-in Trace ID, Execution Metadata & Replay Mode in PromptFlow
Is your feature request related to a problem? Please describe.
When building multi-step flows in PromptFlow (LLM → Python → LLM → Python), I found it challenging to:
- propagate a consistent trace_id across nodes,
- attach execution-level metadata,
- and deterministically replay a previous run with a known trace_id.
These capabilities are useful for debugging, reproducible QA runs, CI/CD pipelines, and enterprise audit requirements.
Currently, users must manually implement these features in Python nodes.
Describe the solution you'd like
It would be helpful if PromptFlow provided built-in support for the following (a hypothetical sketch of the flow-level opt-in follows this list):
- Trace ID propagation: a node should be able to emit a trace_id, or inherit one from upstream inputs.
- Execution metadata envelope: a standard metadata dict (timestamp, node name, trace_id, input preview, etc.) accessible to downstream nodes.
- Replay Mode: a lightweight mechanism to replay a run using an existing trace_id for deterministic reproduction.
- Replay Stub Operator (optional): a small built-in operator that can surface metadata or deterministic replay behavior, similar to a Python node but without needing custom code.
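To make the proposal concrete, here is a purely hypothetical sketch of the flow-level opt-in. None of these keys (`trace`, `metadata_envelope`, `replay`) exist in PromptFlow today; they only illustrate the intended shape:

```yaml
# Hypothetical flow.dag.yaml extension -- illustration only, not an existing PromptFlow API.
trace:
  propagate: true          # auto-inject (or inherit) a trace_id across all nodes
  metadata_envelope: true  # expose {trace_id, timestamp, node, input_preview} to every node
replay:
  enabled: ${inputs.replay_mode}  # replay a recorded run deterministically
  trace_id: ${inputs.trace_id}    # id of the run to reproduce
```

With something along these lines, the custom Python nodes in the POC below would no longer be needed.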
Describe alternatives you've considered
I created a minimal working POC inside PromptFlow using:
- a structure validator (`validate.py`),
- a post-process node that generates or inherits a trace_id and attaches a metadata envelope (`post_process.py`),
- a replay stub (`replay_stub.py`),
- a flow file wiring everything together (`flow.dag.yaml`).
The approach works, but each feature requires custom logic and manual wiring. Native support would be simpler, more consistent, and safer for production workflows.
Additional context
Below is the full POC implementation (all files are small and self-contained):
`flow.dag.yaml`

```yaml
schema_version: 1.0.0
name: minimal-trace-replay-poc
description: Minimal flow to illustrate trace id + execution metadata + replay mode
inputs:
  user_text:
    type: string
    default: "PromptFlow POC input text."
  replay_mode:
    type: boolean
    default: false
  trace_id:
    type: string
    default: ""  # empty means "generate a fresh trace_id"
outputs:
  final_output:
    type: string
    reference: ${post_process.output.output}  # the post-processed string from post_process's return dict
nodes:
- name: summarize
  # Simplified LLM node for illustration; a production flow would typically
  # reference a .jinja2 prompt file and a named connection instead.
  type: llm
  provider: openai
  api: chat
  model: gpt-4.1-mini
  inputs:
    prompt: |
      You are a concise assistant.
      Produce a valid JSON response:
      {"summary": "..."}
      Text:
      {{user_text}}
    user_text: ${inputs.user_text}
- name: validate_structure
  type: python
  source:
    type: code
    path: validate.py
  inputs:
    raw: ${summarize.output}
- name: post_process
  type: python
  source:
    type: code
    path: post_process.py
  inputs:
    text: ${validate_structure.output.structured}
    trace_id: ${inputs.trace_id}
- name: replay_stub
  type: python
  source:
    type: code
    path: replay_stub.py
  inputs:
    trace_id: ${post_process.output.trace_id}
    replay_mode: ${inputs.replay_mode}
    metadata: ${post_process.output.metadata}
```
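For reference, the POC can be exercised locally with the PromptFlow CLI (assuming the `promptflow` package is installed and an OpenAI connection is configured). A normal run is `pf flow test --flow . --inputs user_text="some text"`; a replay run passes back a previously captured id, e.g. `pf flow test --flow . --inputs replay_mode=true trace_id=<id-from-a-previous-run>`.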
`validate.py`

```python
import json
from typing import Dict

from promptflow import tool


@tool
def main(raw: str) -> Dict:
    """Validate that the LLM output is JSON containing a 'summary' key."""
    try:
        data = json.loads(raw)
        if "summary" not in data:
            raise ValueError("Missing 'summary' key")
        return {"valid": True, "structured": data}
    except Exception as e:
        raise ValueError(f"Structure validation failed: {e}\nRaw: {raw}")
```
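As a quick sanity check outside PromptFlow (assuming `promptflow` is installed so the `@tool` import resolves), the validator accepts well-formed output and raises on anything else:

```python
from validate import main

print(main('{"summary": "short text"}'))
# -> {'valid': True, 'structured': {'summary': 'short text'}}

main("not json")  # raises ValueError: Structure validation failed: ...
```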
`post_process.py`

```python
import uuid
from datetime import datetime, timezone
from typing import Dict

from promptflow import tool


@tool
def main(text: Dict, trace_id: str = "") -> Dict:
    """Attach a trace_id (inherited or freshly generated) plus a metadata envelope."""
    summary = (text or {}).get("summary", "")
    # Inherit the flow-level trace_id when one is supplied; otherwise mint a new one.
    trace_id = trace_id or str(uuid.uuid4())
    metadata = {
        "trace_id": trace_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "node": "post_process",
        "input_preview": summary[:50],
        "replay_intent": "capture_for_replay",
    }
    return {
        "output": f"[POST-PROCESSED] {summary}",
        "trace_id": trace_id,
        "metadata": metadata,
    }
```
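The generate-or-inherit pattern above is the core of the POC: when the flow-level trace_id input is empty, the node mints a fresh id; when a previous run's id is passed back in, every downstream consumer sees the same value, which is what makes deterministic replay possible. A built-in trace facility would make this pattern unnecessary.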
`replay_stub.py`

```python
from typing import Dict, Optional

from promptflow import tool


@tool
def main(trace_id: Optional[str] = None,
         replay_mode: bool = False,
         metadata: Optional[Dict] = None) -> Dict:
    """Surface replay behavior: echo the recorded trace when replay_mode is set."""
    if replay_mode and trace_id:
        return {
            "output": f"[REPLAYED_RUN] trace_id={trace_id}",
            "source": "replay_stub",
            "metadata": metadata,
        }
    return {"output": "normal_execution"}
```
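Exercised directly (again outside PromptFlow), the stub shows both code paths:

```python
from replay_stub import main

print(main())
# -> {'output': 'normal_execution'}

print(main(trace_id="1234-abcd", replay_mode=True, metadata={"node": "post_process"}))
# -> {'output': '[REPLAYED_RUN] trace_id=1234-abcd', 'source': 'replay_stub',
#     'metadata': {'node': 'post_process'}}
```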
These features could greatly enhance reproducibility and observability for multi-step flows in real-world scenarios.
Thanks for reviewing!