Feature Request: Built-in Trace ID, Execution Metadata & Replay Mode in PromptFlow
Is your feature request related to a problem? Please describe.
When building multi-step flows in PromptFlow (LLM → Python → LLM → Python), I found it challenging to:
- propagate a consistent trace_id across nodes,
- attach execution-level metadata,
- and deterministically replay a previous run with a known trace_id.
These capabilities are useful for debugging, reproducible QA runs, CI/CD pipelines, and enterprise audit requirements.
Currently, users must manually implement these features in Python nodes.
Describe the solution you'd like
It would be helpful if PromptFlow provided built-in support for the following (a hypothetical sketch of the flow-level opt-in follows this list):
- Trace ID propagation: a node should be able to emit a trace_id, or inherit one from upstream inputs.
- Execution metadata envelope: a standard metadata dict (timestamp, node name, trace_id, input preview, etc.) accessible to downstream nodes.
- Replay Mode: a lightweight mechanism to replay a run using an existing trace_id for deterministic reproduction.
- Replay Stub Operator (optional): a small built-in operator that can surface metadata or deterministic replay behavior, similar to a Python node but without needing custom code.
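To make the proposal concrete, here is a purely hypothetical sketch of the flow-level opt-in. None of these keys (`trace`, `metadata_envelope`, `replay`) exist in PromptFlow today; they only illustrate the intended shape:

```yaml
# Hypothetical flow.dag.yaml extension -- illustration only, not an existing PromptFlow API.
trace:
  propagate: true          # auto-inject (or inherit) a trace_id across all nodes
  metadata_envelope: true  # expose {trace_id, timestamp, node, input_preview} to every node
replay:
  enabled: ${inputs.replay_mode}  # replay a recorded run deterministically
  trace_id: ${inputs.trace_id}    # id of the run to reproduce
```

With something along these lines, the custom Python nodes in the POC below would no longer be needed.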
Describe alternatives you've considered
I created a minimal working POC inside PromptFlow using:
- a structure validator (`validate.py`),
- a post-process node that generates or inherits a trace_id and attaches a metadata envelope (`post_process.py`),
- a replay stub (`replay_stub.py`),
- a flow file wiring everything together (`flow.dag.yaml`).
The approach works, but each feature requires custom logic and manual wiring. Native support would be simpler, more consistent, and safer for production workflows.
Additional context
Below is the full POC implementation (all files are small and self-contained):
`flow.dag.yaml`

```yaml
schema_version: 1.0.0
name: minimal-trace-replay-poc
description: Minimal flow to illustrate trace id + execution metadata + replay mode
inputs:
  user_text:
    type: string
    default: "PromptFlow POC input text."
  replay_mode:
    type: boolean
    default: false
  trace_id:
    type: string
    default: ""  # empty means "generate a fresh trace_id"
outputs:
  final_output:
    type: string
    reference: ${post_process.output.output}  # the post-processed string from post_process's return dict
nodes:
- name: summarize
  # Simplified LLM node for illustration; a production flow would typically
  # reference a .jinja2 prompt file and a named connection instead.
  type: llm
  provider: openai
  api: chat
  model: gpt-4.1-mini
  inputs:
    prompt: |
      You are a concise assistant.
      Produce a valid JSON response:
      {"summary": "..."}
      Text:
      {{user_text}}
    user_text: ${inputs.user_text}
- name: validate_structure
  type: python
  source:
    type: code
    path: validate.py
  inputs:
    raw: ${summarize.output}
- name: post_process
  type: python
  source:
    type: code
    path: post_process.py
  inputs:
    text: ${validate_structure.output.structured}
    trace_id: ${inputs.trace_id}
- name: replay_stub
  type: python
  source:
    type: code
    path: replay_stub.py
  inputs:
    trace_id: ${post_process.output.trace_id}
    replay_mode: ${inputs.replay_mode}
    metadata: ${post_process.output.metadata}
```
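For reference, the POC can be exercised locally with the PromptFlow CLI (assuming the `promptflow` package is installed and an OpenAI connection is configured). A normal run is `pf flow test --flow . --inputs user_text="some text"`; a replay run passes back a previously captured id, e.g. `pf flow test --flow . --inputs replay_mode=true trace_id=<id-from-a-previous-run>`.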
`validate.py`

```python
import json
from typing import Dict

from promptflow import tool


@tool
def main(raw: str) -> Dict:
    """Validate that the LLM output is JSON containing a 'summary' key."""
    try:
        data = json.loads(raw)
        if "summary" not in data:
            raise ValueError("Missing 'summary' key")
        return {"valid": True, "structured": data}
    except Exception as e:
        raise ValueError(f"Structure validation failed: {e}\nRaw: {raw}")
```
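As a quick sanity check outside PromptFlow (assuming `promptflow` is installed so the `@tool` import resolves), the validator accepts well-formed output and raises on anything else:

```python
from validate import main

print(main('{"summary": "short text"}'))
# -> {'valid': True, 'structured': {'summary': 'short text'}}

main("not json")  # raises ValueError: Structure validation failed: ...
```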
`post_process.py`

```python
import uuid
from datetime import datetime, timezone
from typing import Dict

from promptflow import tool


@tool
def main(text: Dict, trace_id: str = "") -> Dict:
    """Attach a trace_id (inherited or freshly generated) plus a metadata envelope."""
    summary = (text or {}).get("summary", "")
    # Inherit the flow-level trace_id when one is supplied; otherwise mint a new one.
    trace_id = trace_id or str(uuid.uuid4())
    metadata = {
        "trace_id": trace_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "node": "post_process",
        "input_preview": summary[:50],
        "replay_intent": "capture_for_replay",
    }
    return {
        "output": f"[POST-PROCESSED] {summary}",
        "trace_id": trace_id,
        "metadata": metadata,
    }
```
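The generate-or-inherit pattern above is the core of the POC: when the flow-level trace_id input is empty, the node mints a fresh id; when a previous run's id is passed back in, every downstream consumer sees the same value, which is what makes deterministic replay possible. A built-in trace facility would make this pattern unnecessary.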
`replay_stub.py`

```python
from typing import Dict, Optional

from promptflow import tool


@tool
def main(trace_id: Optional[str] = None,
         replay_mode: bool = False,
         metadata: Optional[Dict] = None) -> Dict:
    """Surface replay behavior: echo the recorded trace when replay_mode is set."""
    if replay_mode and trace_id:
        return {
            "output": f"[REPLAYED_RUN] trace_id={trace_id}",
            "source": "replay_stub",
            "metadata": metadata,
        }
    return {"output": "normal_execution"}
```
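Exercised directly (again outside PromptFlow), the stub shows both code paths:

```python
from replay_stub import main

print(main())
# -> {'output': 'normal_execution'}

print(main(trace_id="1234-abcd", replay_mode=True, metadata={"node": "post_process"}))
# -> {'output': '[REPLAYED_RUN] trace_id=1234-abcd', 'source': 'replay_stub',
#     'metadata': {'node': 'post_process'}}
```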
These features could greatly enhance reproducibility and observability for multi-step flows in real-world scenarios.
Thanks for reviewing!