Inconsistent and very slow performance with non-Anthropic models
Is this designed to work with Anthropic models only? I'm asking because I've tried OpenAI GPT-5, GPT-5-mini, GPT-5-nano and Qwen3 (locally via LM Studio), and performance is very slow and the output very inconsistent. I'd say the agent is unable to plan and/or follow through on a plan. Is prompt caching (an Anthropic-only feature) critical here as well?
Could you share some examples (use cases) or traces (from LangSmith, if you have any) of poor performance with OpenAI models?
Prompt caching is nice to have for Anthropic (it roughly halves costs, from what I've seen) but is not essential and shouldn't impact quality.
I'll share traces if I get them... But if you want to reproduce, just do this:

```python
agent = create_deep_agent(
    tools=[internet_search],
    model="openai:gpt-5",
    instructions=research_instructions,
    subagents=[critique_sub_agent, research_sub_agent],
).with_config({"recursion_limit": 1000})
```
and ask "Compare the performance of Sinner and Alcaraz". With the default (Anthropic) model it works as expected. With the code above, it either fails to plan the TODO list or fails to complete. I also tried Sonnet 4.5 and hit a token rate limit, so it could not complete:
```
2025-10-01T06:57:10.043965Z [error ] Background run failed. Exception: <class 'anthropic.RateLimitError'>(Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'This request would exceed the rate limit for your organization (be0f6169-4909-4afb-be7c-ae88f49e9153) of 30,000 input tokens per minute. For details, refer to: https://docs.claude.com/en/api/rate-limits. You can see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}, 'request_id': 'req_011CTg3SxciJzXzN7sb1SCcp'})
```
Hey @marcofiocco thanks for sharing the snippet and apologies for the delay in response.
The research example in particular can definitely be updated to converge faster; it's intended as a sample starter prompt for a researcher and showcases how you can use tools and subagents. A prompt for a deep agent should generally be much more detailed. One thing I've found particularly helpful is offering examples and heuristics on "how hard to try" for different tasks.
I think a lot of the frustration that you're running into can be solved by a more custom prompt for the example. Depending on your model and your rate limits (like you flagged), you might need to customize the system prompt to do fewer things in parallel. Let me know if updating the prompt works for you, and feel free to open a PR with an example prompt that you find works better!
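For what it's worth, the kind of prompt tightening described above can be sketched like this (the added guidance text is my own wording, and `research_instructions` is a placeholder standing in for the example's real prompt):

```python
# Hypothetical tightening of the example prompt for non-Anthropic models.
# `research_instructions` is a placeholder for the real example prompt.
research_instructions = "You are an expert researcher. ..."

focused_instructions = research_instructions + """

EXECUTION DISCIPLINE (for models that struggle to plan):
- Write the todo list FIRST, then work through it ONE item at a time.
- Call at most one subagent at a time; wait for its result before moving on.
- Stop researching a question after 3 relevant sources and summarize.
"""
```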
@marcofiocco Your assessment that Deep Agent = Anthropic only matches what I found in my work as well. I wasn't able to get subagents invoked from the orchestrator agent using GPT-5 either. All the feedback I got from the devs was that my prompt was incorrect, but I don't think that's the real issue...
Hey @DerekKane mind sharing your prompts and also how your subagents are defined? I can take a look!
@nhuang-lc - Thanks for taking a look at this one. I have an MCP Server which is a MS Cosmos query tool that runs a SELECT statement to grab the latest record with a Client ID, for context.
This setup works perfectly on the default Anthropic models in the DeepAgent v1 framework: it spins up the subagents and uses tools appropriately. If the only change I make is a shift to the Azure OpenAI models, the orchestrator never initiates the subagents, and the todo-list creation is affected as well.
If you can get GPT-5 or GPT-5-Chat from Azure OpenAI working... that would be really great, because I haven't solved it. Here is the code:
Contract Agent: Deep Agent with MCP Tools Capability

```python
# Load the libraries
import os
import sys
from typing import Literal, Dict
from pathlib import Path
import json
import logging
from datetime import datetime
from dotenv import load_dotenv

# MCP servers
import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

# Helper function to deal with sync/async operations
from wrap_mcp_tools_sync import make_sync_tools

# Load the master deep agent
from deepagents import create_deep_agent, SubAgent

# Pull in the correct .env file for keys
CURRENT_DIR = Path(__file__).resolve().parent
load_dotenv(CURRENT_DIR / ".env")

# Run a check on the async/sync method
def _get_loop():
    try:
        return asyncio.get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop

# Activate and use the MCP servers for agentic tools
# IMPORTANT: use an absolute path to your server on Windows
MS_COSMOS_SERVER_PATH = r"C:\Users\derek\Documents\Sample\AI_Agent\deepagents\examples\mcp_agent\mcp_server\ms_cosmos_mcp_server.py"

# Pass env through
_server_env = dict(os.environ)

# (optional) Ensure key exists; otherwise the server will still start,
_mcp_client = MultiServerMCPClient(
    {
        "MS-Cosmos": {
            "command": sys.executable,  # venv python
            "args": ["-u", MS_COSMOS_SERVER_PATH, "--stdio"],  # make stdio explicit
            "transport": "stdio",
            "env": _server_env,
            "cwd": str(Path(MS_COSMOS_SERVER_PATH).parent),  # stable working dir
        }
    }
)

# Load tools without creating ad-hoc loops during import
def _load_mcp_tools_sync():
    try:
        return asyncio.run(_mcp_client.get_tools())
    except RuntimeError as e:
        # if you ever hit "asyncio.run() cannot be called..." under a running loop
        import anyio
        if "cannot be called from a running event loop" in str(e):
            return anyio.from_thread.run(_mcp_client.get_tools)
        raise

_mcp_tools = _load_mcp_tools_sync()
tools = make_sync_tools(_mcp_tools)
```
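For readers without access to `wrap_mcp_tools_sync` (it's a local helper module, not a published package): here is a hedged guess at its general shape, i.e. running each async tool to completion on a fresh event loop. This is my sketch, not the poster's actual code.

```python
# Hypothetical sketch of a sync wrapper like make_sync_tools might use;
# wrap_mcp_tools_sync is the poster's local module, so this is a guess
# at its shape, not its actual implementation.
import asyncio
from typing import Any, Callable, Coroutine

def make_sync(async_fn: Callable[..., Coroutine[Any, Any, Any]]) -> Callable[..., Any]:
    """Return a blocking wrapper that runs an async callable on a new loop."""
    def sync_fn(*args: Any, **kwargs: Any) -> Any:
        return asyncio.run(async_fn(*args, **kwargs))
    return sync_fn
```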
Define the SubAgents that are available to the master agent

```python
# Sub-agent configurations
compensation_analyst = {
    "name": "compensation_analyst",
    "description": "Finds and normalizes all brokerage-compensation terms; proposes one governing term.",
    "prompt": """You are the Compensation Analyst. ALWAYS OUTPUT something, even if evidence is missing (use NEEDS_EVIDENCE rows).
Inputs:
- packet_index (list of {doc_id, type, page_count})
- text_blocks (normalized text with positions)
- metadata (parties/property/dates if available)
Tasks:
- Extract every buyer-broker compensation term (%, flat $, incentives, MLS co-broke, seller-paid vs buyer-paid).
- Note conflicts across offers, counters, agency agreements, MLS, addenda; flag cumulative/dual-pay risk.
- Recommend ONE clean governing compensation term that avoids double-pay and lender IPC issues.
- Provide CDA-ready mapping basics (side, basis, line text) at current price if present.
Rules:
- Cite each claim with {doc_id, page, span/excerpt}.
- If a document is missing or image-only, add a NEEDS_EVIDENCE row naming the doc and why.
Output (concise):
- compensation_matrix.csv (source, term, payor, read_as, risk, citation)
- governing_compensation.md (2–5 bullet points, with citations)
""",
}
```
```python
timeline_contingency_analyst = {
    "name": "timeline_contingency_analyst",
    "description": "Builds the contingency calendar and flags timing defects with precise dates.",
    "prompt": """You are the Timeline & Contingency Analyst. ALWAYS OUTPUT something, even if evidence is missing (use NEEDS_EVIDENCE rows).
Inputs:
- packet_index (list of {doc_id, type, page_count})
- text_blocks (normalized text with positions)
- metadata (acceptance/offer dates if available)
Tasks:
- Extract deadlines (financing, inspection, testing, survey, title cure, warranty, other).
- Convert any relative windows to absolute dates (YYYY-MM-DD). Show your math from the trigger date.
- Compare required windows vs any known actual actions; mark on-time/late/unknown.
- Propose plain-English extensions/ratifications with exact dates if needed (no legal advice).
Rules:
- Cite each date/window with {doc_id, page, span/excerpt}.
- If acceptance date or trigger is missing, add a NEEDS_EVIDENCE row.
Output (concise):
- contingency_calendar.csv (obligation, trigger, window, due_date, actual_date, status, citation)
- timeline_summary.md (2–5 bullet points, with citations)
""",
}
```

(Note: the `name` originally mixed separators, `"timeline_contingency-analyst"`, which did not match the name the orchestrator is told to call; it is normalized to underscores here.)
```python
critique_editor = {
    "name": "critique_editor",
    "description": "Audits the draft brief for clarity, consistency, and evidence coverage; proposes concrete fixes.",
    "prompt": """You are the Critique & QA Editor. ALWAYS OUTPUT something, even if inputs are incomplete.
Inputs (provide what you have):
- final_report_draft (text of the draft broker brief)
- compensation_matrix.csv (if available)
- governing_compensation.md (if available)
- contingency_calendar.csv (if available)
- timeline_summary.md (if available)
Checks (be strict, but plain-English):
- Citations: Every material claim includes {doc_id, page, span/excerpt}. Flag any missing.
- Dates: All deadlines are absolute (YYYY-MM-DD) with math shown or referenced. Flag relative phrasing.
- Consistency: The governing compensation term is single, unambiguous, and not contradicted elsewhere.
- Lender Sensitivity: Note risks of double-pay/IPC framing inconsistencies.
- Timeline Math: Verify trigger → window → due_date math; flag unclear triggers.
- Clarity & Actionability: Short bullets; operational tone; concrete next-step owners/deadlines.
Output (concise, actionable):
- issues.md: Numbered list of findings with fields: {severity: [H|M|L], section, problem, location_reference, exact_fix}
- revised_final_report.md: If fixes are purely editorial (typos/format/citation placement/absolute dates derivable from provided math), apply them and output the corrected report.
- revision_requests.md: If fixes require new evidence or re-analysis, list specific requests (doc_id or data needed) and where they will be used.
""",
}
```
```python
# Sub-agents
subagents = [compensation_analyst, timeline_contingency_analyst, critique_editor]
```
Define the Master Agent Instructions
```python
# Main research instructions
orchestrator_instructions = """You are the ORCHESTRATOR for a residential real-estate packet. Fetch the latest packet by Parent_ID, ALWAYS run the two analysis sub-agents, synthesize a draft, then ALWAYS run the critique_editor before finalizing.

ALWAYS DO FIRST
- The first thing you should do is write the original user question to `question.txt` so you have a record of it.
- Use the `write_todos` tool to write a brief description of what you plan to do to `todo.txt`.
- If Parent_ID (e.g., Client_03) is not provided, ask for it and STOP.
- Call the `latest_contract_by_parent` tool with the Parent_ID. Treat the result as the source of truth for context.
- If the result is empty/ambiguous, ask for a correct Parent_ID and STOP.

ALWAYS RUN THESE SUB-AGENTS (no exceptions)
- compensation_analyst
- timeline_contingency_analyst
Provide each only the smallest sufficient context: packet_index, relevant text_blocks, and minimal metadata.

SYNTHESIZE DRAFT (brief and operational)
Write final_report_draft.md with:
- Snapshot (parties, property, acceptance/offer date(s) with citations)
- Compensation (3–6 bullets + ONE governing term with citations)
- Timeline & Contingencies (3–6 bullets + any required extensions with absolute dates and citations)
- Next Steps (24–48h) with owners

CRITIQUE PASS (MANDATORY)
- Provide the draft and available artifacts to critique_editor.
- Save outputs:
  - issues.md
  - revised_final_report.md (if edits were auto-applied)
  - revision_requests.md (if new evidence or re-analysis is needed)

FINALIZE
- If revised_final_report.md exists, copy it to final_report.md.
- Otherwise, keep final_report_draft.md as final_report.md.
- If issues with severity H or M remain unresolved, append a short "Open Items" section to final_report.md summarizing blocking items and required docs.

GLOBAL RULES
- Every claim includes a citation {doc_id, page, span/excerpt}.
- Convert all relative time windows to absolute dates (YYYY-MM-DD) and show/trace the math.
- If info is missing or a PDF is image-only, sub-agents still output with NEEDS_EVIDENCE rows.

ARTIFACTS
- compensation_matrix.csv
- governing_compensation.md
- contingency_calendar.csv
- timeline_summary.md
- final_report_draft.md
- issues.md
- revised_final_report.md (optional)
- revision_requests.md (optional)
- final_report.md

DELEGATION PROTOCOL (STRICT)
- To run a sub-agent, CALL IT BY NAME (exactly): compensation_analyst, timeline_contingency_analyst, critique_editor.
- Always run in this order for each analysis:
  1) compensation_analyst
  2) timeline_contingency_analyst
  3) critique_editor
- Do not attempt to replicate their work yourself; delegate.
"""
```
Define a custom model - AzureChatOpenAI
```python
import os
from langchain_openai import AzureChatOpenAI

env_vars = {
    "AZURE_OPENAI_API_KEY": "XXXXXXXX",
    "AZURE_OPENAI_ENDPOINT": "XXXXXXXX",
    "AZURE_OPENAI_DEPLOYMENT_NAME": "gpt-5-chat",
    "AZURE_OPENAI_API_VERSION": "2025-01-01-preview",
}
os.environ.update(env_vars)
```
The original snippet was cut off mid-docstring; the function body below is a plausible reconstruction (AzureChatOpenAI reads the key and endpoint from the environment):

```python
def get_default_model():
    """
    Returns an Azure OpenAI chat model via LangChain.
    Requires these env vars:
      - AZURE_OPENAI_API_KEY
      - AZURE_OPENAI_ENDPOINT (e.g., https://<your-resource>.openai.azure.com/)
      - AZURE_OPENAI_DEPLOYMENT_NAME
      - AZURE_OPENAI_API_VERSION
    """
    # (body reconstructed; the original comment post truncated here)
    return AzureChatOpenAI(
        azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
        api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    )

model = get_default_model()
```
Create the DeepAgent
```python
agent = create_deep_agent(
    tools=tools,
    instructions=orchestrator_instructions,
    model=model,
    subagents=subagents,
).with_config({"recursion_limit": 1000})
```
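When debugging "the orchestrator never initiates the subagent", it helps to check the run's messages for delegation tool calls rather than eyeballing the transcript. A small helper sketch (the subagent delegation tool is named `task` in deepagents, as far as I can tell):

```python
# Hedged sketch: scan a run's messages for tool calls, so you can see
# whether the orchestrator ever invoked the `task` (subagent) tool.
from typing import Any, List

def tool_calls_made(messages: List[Any]) -> List[str]:
    """Collect the names of all tool calls across a run's messages."""
    return [
        tc["name"]
        for msg in messages
        for tc in (getattr(msg, "tool_calls", None) or [])
    ]

# Usage, after result = agent.invoke({...}):
#   print(tool_calls_made(result["messages"]))
```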
How do I use an open-source model with the deepagents code? Is there an example anywhere, or can anyone share a code snippet for using Qwen3 or Qwen3-Instruct models?