
[Feature Request]: Make AG2 primary and fully supported with OpenAI Responses API

Open elCaptnCode opened this issue 4 months ago • 15 comments


πŸ“ Migration: OpenAI Responses API

Why migrate?

OpenAI's Responses API is the new, recommended interface. It supersedes classic Chat Completions, adding built-in tools (web/file search, computer use), structured state, and event streaming.

AG2 already exposes OpenAI via LLMConfig(api_type=...), and Responses is supported but currently limited (mainly two-agent initiate_chat). We want full coverage without breaking existing api_type="openai" apps.

📚 Key References

🎯 Outcomes

  • Full AG2 feature parity with api_type="responses" (two-agent, GroupChat, tools/function calling, streaming, multimodal)
  • No application-layer breaks: existing api_type="openai" (Chat Completions) continues to work; Responses is a config-only opt-in
  • Compatibility layer: Responses outputs normalized to legacy Chat-shape for internal code
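
To make the compatibility layer concrete, here is a rough sketch of the normalization (the helper name is hypothetical; the output-item shapes follow the Responses API, simplified):

```python
from typing import Any


def normalize_to_chat_shape(response: Any) -> dict[str, Any]:
    """Collapse a Responses API result into a legacy ChatCompletion-style
    assistant message, so internal AG2 code keeps seeing the Chat shape.

    Sketch only: assumes `response.output` is a list of typed output items,
    as in the Responses API; tool-call field names are simplified.
    """
    text_parts: list[str] = []
    tool_calls: list[dict[str, Any]] = []
    for item in response.output:
        if item.type == "message":
            # Text lives in content blocks of type "output_text".
            text_parts.extend(
                block.text for block in item.content if block.type == "output_text"
            )
        elif item.type == "function_call":
            # Map a Responses function call onto a legacy tool_calls entry.
            tool_calls.append({
                "id": item.call_id,
                "type": "function",
                "function": {"name": item.name, "arguments": item.arguments},
            })
    message: dict[str, Any] = {"role": "assistant", "content": "".join(text_parts)}
    if tool_calls:
        message["tool_calls"] = tool_calls
    return message
```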

πŸ—‚οΈ AG2-specific context

Click to expand AG2 code touchpoints
  • autogen/agentchat/conversable_agent.py
    • generate_reply(...): Orchestrates: termination/human → tool_calls → code_exec → LLM reply
    • _generate_oai_reply_from_client(...): Flattens tool responses into messages, calls wrapper create(...), normalizes via extract_text_or_completion_object(...)
  • autogen/agentchat/groupchat.py
    • GroupChatManager.run_chat/a_run_chat: Triggers speaker.generate_reply(...), emits GroupChatRunChatEvent and TerminationEvent via IO streams
  • autogen/oai/client.py (OpenAIWrapper)
    • Routes by api_type; attaches response.message_retrieval_function = client.message_retrieval, normalizes outputs
  • autogen/oai/openai_responses.py
    • OpenAIResponsesClient.create(...): Converts legacy messages → Responses input blocks, wires built-in tools, handles response_format vs stream, tracks previous_response_id
    • message_retrieval(...): Adapts Responses outputs to legacy ChatCompletion-like assistant message
  • autogen/oai/openai_utils.py: Builds config list entries (supports api_type)
  • autogen/oai/client_utils.py: Validates params and tool visibility

✅ Migration Checklist

1. Configuration (opt-in)

  • [ ] OAI_CONFIG_LIST supports api_type: "responses" per entry
  • [ ] Document optional fields: built_in_tools (e.g., image_generation, web_search), tool_choice
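
An opt-in entry might look like this (built_in_tools and tool_choice are the proposed optional fields, not finalized; keys and values are illustrative):

```python
config_list = [
    {
        # Existing Chat Completions entry: keeps working unchanged.
        "model": "gpt-4o",
        "api_type": "openai",
        "api_key": "sk-...",
    },
    {
        # Opt-in Responses API entry (proposed fields, subject to change).
        "model": "gpt-4o",
        "api_type": "responses",
        "api_key": "sk-...",
        "built_in_tools": ["web_search", "image_generation"],
        "tool_choice": "auto",
    },
]
```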

2. Routing (no public API changes)

  • [ ] Verify OpenAIWrapper dispatches api_type: responses → OpenAIResponsesClient

3. Message flow & normalization (stateless default)

  • [ ] Build full messages context locally in agents
  • [ ] In OpenAIResponsesClient.create, convert messages → Responses input blocks
  • [ ] Ensure message_retrieval returns a single assistant message with content and optional tool_calls (legacy shape)
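
A minimal sketch of the conversion (hypothetical helper; real code must also handle tool messages and multimodal content):

```python
from typing import Any


def messages_to_input_blocks(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert legacy Chat Completions messages into Responses API input items.

    Text-only sketch: assistant text becomes "output_text" blocks and all
    other roles become "input_text" blocks, per the Responses input format.
    """
    items = []
    for msg in messages:
        role = msg.get("role", "user")
        text = msg.get("content") or ""
        if not isinstance(text, str):
            continue  # multimodal content needs its own mapping
        block_type = "output_text" if role == "assistant" else "input_text"
        items.append({"role": role, "content": [{"type": block_type, "text": text}]})
    return items
```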

4. Stateless vs stateful handling (and impacts)

  • [ ] Default: stateless (send full context; no server thread state)
  • [ ] Add optional stateful threading: use_response_state (default false) on OpenAIResponsesLLMConfigEntry
  • [ ] When stateful enabled:
    • Client maintains previous_response_id across turns
    • Still include tool outputs in messages to avoid divergence
    • Provide reset_state() to start new thread; log thread IDs at debug
    • GroupChat: do not share thread state across agents
    • Privacy: note server-side retention in docs for sensitive workloads
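
Opting in could look roughly like this (use_response_state and reset_state() are the proposed additions, not an existing API; the import path is assumed):

```python
from autogen import LLMConfig  # assumed import path

llm_config = LLMConfig(
    api_type="responses",
    model="gpt-4o",
    use_response_state=True,  # proposed flag; defaults to False (stateless)
)

# ... run chats as usual; the client threads turns via previous_response_id ...

# Proposed: start a fresh server-side thread before an unrelated conversation.
# client.reset_state()
```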

5. Streaming

  • [ ] Preserve existing stream events (no API change)
  • [ ] If response_format present, drop stream (Responses restriction) and log warning
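
Inside the client, the guard could be as small as this (sketch; the helper name and logger setup are assumptions):

```python
import logging

logger = logging.getLogger(__name__)


def drop_stream_if_structured(params: dict) -> dict:
    """Responses API restriction: streaming and response_format conflict."""
    if params.get("response_format") and "stream" in params:
        params.pop("stream")
        logger.warning(
            "response_format is set; dropping stream= (Responses API restriction)"
        )
    return params
```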

6. Tools

  • [ ] Map built_in_tools to Responses scopes; account for image costs
  • [ ] External tool calls: keep schema/flow unchanged; normalize tool call names/args

7. GroupChat & run surfaces

  • [ ] No changes to GroupChatManager.run_chat/a_run_chat; event emissions unchanged
  • [ ] Ensure normalized assistant message continues to drive GroupChat flows

8. Usage & cost

  • [ ] Populate token usage/model fields; aggregate image costs
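
Mapping usage back to the legacy field names might look like this (assuming the Responses API usage object exposes input_tokens/output_tokens/total_tokens; the helper is hypothetical):

```python
from typing import Any


def extract_usage(response: Any) -> dict[str, Any]:
    """Translate Responses usage into the legacy Chat Completions names."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return {}
    return {
        "prompt_tokens": usage.input_tokens,       # legacy name for input tokens
        "completion_tokens": usage.output_tokens,  # legacy name for output tokens
        "total_tokens": usage.total_tokens,
        "model": response.model,
    }
```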

9. Documentation

  • [ ] README: add "Using Responses API" with config example and notes
    • Default: Chat Completions; Responses is opt-in
    • Streaming + response_format are mutually exclusive
    • Optional stateful mode via use_response_state and reset_state()

πŸ—ƒοΈ File-by-file guidance

Expand for implementation details
  • autogen/oai/openai_responses.py
    • In OpenAIResponsesLLMConfigEntry: add use_response_state: bool = False
    • In OpenAIResponsesClient.__init__: initialize self._previous_response_id: str | None = None
    • In create(...):
      • Convert messages to input blocks (stateless default)
      • If config.use_response_state and self._previous_response_id is set, include previous_response_id in request
      • After response, if new response.id available, set self._previous_response_id = response.id
      • If response_format present, remove stream param and log
    • Add def reset_state(self) -> None: to clear self._previous_response_id (see the sketch after this list)
    • Ensure message_retrieval(...) returns single assistant message dict with content and optional tool_calls
  • autogen/oai/client.py
    • No API change. Optionally expose passthrough to reset_state() on active client
  • autogen/agentchat/conversable_agent.py
    • No behavior change. Keep stateless construction of messages; tool responses flattened before client call
  • autogen/agentchat/groupchat.py
    • No behavior change. Events (GroupChatRunChatEvent, TerminationEvent) remain as-is
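
Pulling the openai_responses.py points above together, a minimal sketch of the state-related additions (abbreviated; conversion and tool wiring omitted; whether the flag lives on the config entry or the client is an open design detail, shown on the client here for brevity):

```python
from typing import Any, Optional


class OpenAIResponsesClient:
    def __init__(self, client: "OpenAI", response_format: Any = None,
                 use_response_state: bool = False) -> None:
        self._oai_client = client
        self.response_format = response_format
        self.use_response_state = use_response_state
        self._previous_response_id: Optional[str] = None

    def create(self, params: dict[str, Any]) -> "Response":
        # Stateful opt-in: thread the conversation server-side.
        if self.use_response_state and self._previous_response_id:
            params.setdefault("previous_response_id", self._previous_response_id)
        response = self._oai_client.responses.create(**params)
        # Remember the new thread pointer for the next turn.
        if getattr(response, "id", None):
            self._previous_response_id = response.id
        return response

    def reset_state(self) -> None:
        """Forget the server-side thread; the next call starts a new one."""
        self._previous_response_id = None
```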

ℹ️ Notes

  • Streaming and response_format are mutually exclusive per Responses API
  • Stateful mode is advanced/opt-in; default is stateless for reproducibility
  • Multimodal inputs must map to Responses input blocks; normalized assistant output remains text

🔄 Responses API: Stateless and Stateful Modes

OpenAI's Responses API supports both stateless and stateful operation. AG2 must plan for and support both:

  • Stateless (default):

    • AG2 builds the complete messages context locally per turn and sends it to the API. No reliance on server-held state.
    • Pros: deterministic, easy to debug/replay, tool outputs always included explicitly.
  • Stateful (opt-in):

    • When enabled (use_response_state), AG2 tracks and sends previous_response_id so OpenAI threads the conversation server-side.
    • Still send tool results and intermediate messages in messages to avoid divergence between local and server state.
    • Provide reset_state() to start new threads; log thread IDs for traceability.

Planning note: Migration and implementation must account for both modes. Stateless remains default for reproducibility; stateful is opt-in for advanced use cases. GroupChat agents should not share thread state. Privacy implications should be documented for stateful mode.
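
Concretely, a second turn differs between the two modes roughly like this (illustrative payloads; the response ID is hypothetical):

```python
# Stateless (default): resend the full locally built context every turn.
stateless_request = {
    "model": "gpt-4o",
    "input": [
        {"role": "user", "content": [{"type": "input_text", "text": "Hi"}]},
        {"role": "assistant", "content": [{"type": "output_text", "text": "Hello!"}]},
        {"role": "user", "content": [{"type": "input_text", "text": "Summarize."}]},
    ],
}

# Stateful (opt-in): send only the new turn plus the server thread pointer.
stateful_request = {
    "model": "gpt-4o",
    "input": [{"role": "user", "content": [{"type": "input_text", "text": "Summarize."}]}],
    "previous_response_id": "resp_abc123",  # returned by the previous turn
}
```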


🔗 References

elCaptnCode avatar Aug 13 '25 17:08 elCaptnCode

@randombet review this later.

randombet avatar Aug 13 '25 17:08 randombet

Note: the Responses API has two modes: stateful and stateless. Need to consider this in planning.

sonichi avatar Aug 13 '25 17:08 sonichi

Thanks @sonichi. I've updated our migration issue to include consideration of the Responses API's two modes: stateful (server-stored conversation state) and stateless (independent calls). The plan now covers how AG2 can support both and what that implies for context handling and backward compatibility.

Also, I am fixing up the links.

elCaptnCode avatar Aug 13 '25 17:08 elCaptnCode

A thin compatibility/normalization layer that makes Responses outputs look like today's Chat-shape where the internal code expects it.

At first look it seems like the best choice to me, but I should investigate the problem more deeply before making a final decision.

Lancetnik avatar Aug 13 '25 18:08 Lancetnik

Hello @sonichi sir,

The Responses API in AG2 is far more complete than the documentation suggests: core features like run(), two-agent chats, built-in tools, and message normalization already work in production. The main blocker is a small GroupChat bug, making this a short, targeted fix plus a documentation update rather than a long migration project. Outdated docs are the biggest adoption barrier.

tejas-dharani avatar Aug 14 '25 15:08 tejas-dharani

If this is the case, I would be more than happy to adjust the issue's content. What are your thoughts @Lancetnik @sonichi @qingyun-wu @marklysze? Any suggestions on adjustments?

elCaptnCode avatar Aug 14 '25 16:08 elCaptnCode

Thanks @BlocUnited and @tejas-dharani - If we could fix the Group Chat bug and have the LLM tests incorporate testing with the Responses API, to ensure it does indeed work throughout, that would be a good thing to do. I'm sure the Responses API allows for a lot of other things, too.

marklysze avatar Aug 14 '25 18:08 marklysze

@BlocUnited @priyansh4320 Could you break this down into multiple sub-issues?

randombet avatar Aug 18 '25 17:08 randombet

@tejas-dharani I sent you a message on discord (captain_). Was hoping we could chat about your suggestions.

elCaptnCode avatar Aug 19 '25 19:08 elCaptnCode

@BlocUnited @priyansh4320 Following up on this issue. We need to define clear objectives and milestones. The current Outcomes section is too vague and broad. For example, we would say,

Phase 1: Design interfaces to support the Responses API for both stateful and stateless sessions, extensible to multimodal input and output.

Phase 2: Stateless sessions. Text messages for two-agent chat, agent.run(), and GroupChat.

etc.

And in the plan, we need to investigate the current code base and identify blockers; e.g., I believe we may need to refactor groupchat etc. to fully support the Responses API.

randombet avatar Aug 19 '25 23:08 randombet

Design for Phase 1:

Considering the current implementation of the Responses API, which includes a config entry, OpenAIResponsesLLMConfigEntry, extending OpenAILLMConfigEntry, and a registered generation endpoint for the Responses client:

I propose a design to manage both stateful and stateless operation, along with multimodal input/output support:

  1. Use Enums for type safety, validation, and IDE support/autocomplete: a better developer experience, fewer errors, and more maintainable code:
```python
from enum import Enum
from typing import Literal, Optional

from pydantic import BaseModel


class ResponseIncludable(str, Enum):
    """Supported include options for the Responses API."""
    CODE_INTERPRETER_OUTPUTS = "code_interpreter_call.outputs"
    COMPUTER_CALL_OUTPUT_IMAGE_URL = "computer_call_output.output.image_url"
    FILE_SEARCH_RESULTS = "file_search_call.results"
    MESSAGE_INPUT_IMAGE_URL = "message.input_image.image_url"
    MESSAGE_OUTPUT_TEXT_LOGPROBS = "message.output_text.logprobs"
    REASONING_ENCRYPTED_CONTENT = "reasoning.encrypted_content"


class ServiceTier(str, Enum):
    """Service tier options for the Responses API."""
    AUTO = "auto"
    DEFAULT = "default"
    FLEX = "flex"
    SCALE = "scale"
    PRIORITY = "priority"


class VerbosityLevel(str, Enum):
    """Verbosity level options."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class ReasoningConfig(BaseModel):
    """Configuration for reasoning models (o-series)."""
    effort: Literal["low", "minimal", "medium", "high"]
    model: Optional[str] = None
```
  2. A core configuration interface for the Responses API: a Protocol provides type safety and self-documentation while keeping maximum flexibility across implementations, and it stays backward compatible:
```python
from typing import Any, Dict, List, Optional, Protocol, Union


class ResponsesAPIConfig(Protocol):
    """Core configuration interface for the Responses API."""

    # Required parameters
    model: str
    input: Union[str, List[Dict[str, Any]]]

    # Include options
    include: Optional[List[ResponseIncludable]] = None

    # Reasoning (o-series models)
    reasoning: Optional[ReasoningConfig] = None

    # Service configuration
    service_tier: Optional[ServiceTier] = ServiceTier.AUTO
    verbosity: Optional[VerbosityLevel] = None
```
  3. Enhance OpenAIResponsesLLMConfigEntry to implement ResponsesAPIConfig via a __getattr__ duck-typing bridge:
```python
from typing import Any, Dict, List, Literal, Optional, Protocol, Union


class ResponsesAPIConfig(Protocol):
    """Protocol defining which attributes are required."""
    model: str
    input: Union[str, List[Dict[str, Any]]]
    store: Optional[bool] = True
    tools: Optional[List[Dict[str, Any]]] = None


class OpenAIResponsesLLMConfigEntry(OpenAILLMConfigEntry):
    api_type: Literal["responses"] = "responses"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Store Responses-API-specific fields under a prefix
        self.responses_model = kwargs.get("model")
        self.responses_input = kwargs.get("input")
        self.responses_store = kwargs.get("store", True)
        self.responses_tools = kwargs.get("tools")

    def __getattr__(self, name: str) -> Any:
        """Duck typing: forward ResponsesAPIConfig attributes."""
        # When someone asks for config.model, look for self.responses_model.
        # Check __dict__ directly: hasattr() would re-enter __getattr__ and recurse.
        prefixed = f"responses_{name}"
        if prefixed in self.__dict__:
            return self.__dict__[prefixed]
        raise AttributeError(f"'{self.__class__.__name__}' has no attribute '{name}'")


# Usage example:
config = OpenAIResponsesLLMConfigEntry(
    model="gpt-4o",
    input="Hello",
    store=True,
)

# Duck typing in action:
print(config.model)  # __getattr__("model") -> self.responses_model
print(config.input)  # __getattr__("input") -> self.responses_input
print(config.store)  # __getattr__("store") -> self.responses_store


# The type checker sees this as a valid ResponsesAPIConfig:
def process_config(config: ResponsesAPIConfig):
    print(config.model)  # works: config has a .model attribute


process_config(config)  # works: config behaves like ResponsesAPIConfig
```

Visual Representation

```
ResponsesAPIConfig Protocol:
├── model: str
├── input: Union[str, List[Dict]]
└── store: Optional[bool]   # manages stateful vs. stateless calls

OpenAIResponsesLLMConfigEntry:
├── inherits from OpenAILLMConfigEntry
├── has responses_model, responses_input, responses_store
└── __getattr__ bridges the gap:
    config.model -> __getattr__("model") -> self.responses_model
    config.input -> __getattr__("input") -> self.responses_input
    config.store -> __getattr__("store") -> self.responses_store
```
  4. Implement stateful conversation management:
```python
# Create the state management class
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional
import uuid


@dataclass
class ConversationState:
    """Represents the state of a conversation session."""
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    previous_response_id: Optional[str] = None
    conversation_history: List[Dict[str, Any]] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.utcnow)
    last_updated: datetime = field(default_factory=datetime.utcnow)
    reasoning_items: List[Dict[str, Any]] = field(default_factory=list)
    encrypted_reasoning: Optional[str] = None
    tool_execution_history: List[Dict[str, Any]] = field(default_factory=list)


class ResponsesAPIState:
    """Manages state for Responses API conversations."""
    def __init__(self):
        self._conversations: Dict[str, ConversationState] = {}
        self._default_session_id: Optional[str] = None

    # ... (all the methods as defined in the interface)
```
  5. Enhance OpenAIResponsesClient with state management:
```python
# autogen/oai/openai_responses.py - modify the existing class
from typing import Any

from .responses_state import ResponsesAPIState


class OpenAIResponsesClient:
    def __init__(self, client: "OpenAI", response_format=None):
        self._oai_client = client
        self.response_format = response_format
        self.state_manager = ResponsesAPIState()  # add state management
        self.previous_response_id = None

    def create(self, params: dict[str, Any]) -> "Response":
        # Get or create a session
        session_id = params.get("session_id")
        if session_id:
            session = self.state_manager.get_session(session_id)
        else:
            session_id = self.state_manager.create_session()
            session = self.state_manager.get_session(session_id)

        # Add previous_response_id if available
        if session.previous_response_id and "previous_response_id" not in params:
            params["previous_response_id"] = session.previous_response_id

        # ... (continue with the existing create() logic: convert messages,
        # wire tools, issue the request, then record response.id on the session)
```

  6. Message format handling: enhance the message_retrieval()/create() methods to handle message formats and conflicting params, e.g.:
```python
# Handle the built_in_tools parameter
built_in_tools = params.pop("built_in_tools", [])
if built_in_tools:
    tools_list = []
    for tool in built_in_tools:
        if tool == "web_search":
            tools_list.append({"type": "web_search_preview"})
        elif tool == "image_generation":
            tools_list.append({"type": "image_generation"})
        elif tool == "file_search":
            tools_list.append({"type": "file_search"})
        elif tool == "computer_use":
            tools_list.append({"type": "computer_use"})
        elif tool == "code_interpreter":
            tools_list.append({"type": "code_interpreter"})
    params["tools"] = tools_list
    params["tool_choice"] = "auto"
```

  7. Add multimodal input support: add methods to handle and format multimodal inputs and outputs.

```python
# Add to OpenAIResponsesClient
def _handle_multimodal_input(self, content: Union[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Handle multimodal input (text + images)"""
    if isinstance(content, str):
        return [{"type": "input_text", "text": content}]
    
    blocks = []
    for item in content:
        if item.get("type") == "text":
            blocks.append({"type": "input_text", "text": item.get("text", "")})
        elif item.get("type") == "image_url":
            blocks.append({"type": "input_image", "image_url": item.get("image_url", {}).get("url", "")})
        elif item.get("type") == "image":
            # Handle base64 image data
            image_data = item.get("image")
            if isinstance(image_data, bytes):
                import base64
                base64_data = base64.b64encode(image_data).decode('utf-8')
                blocks.append({
                    "type": "input_image", 
                    "image_url": f"data:image/jpeg;base64,{base64_data}"
                })
    
    return blocks
```
  8. Implement extra_body forbids (reject unsupported extra_body parameters).
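
If "extra_body forbids" means rejecting unknown passthrough parameters, Pydantic's extra="forbid" is one way to do it (a sketch; the field whitelist is illustrative):

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class ResponsesExtraBody(BaseModel):
    """Whitelist of extra_body parameters the Responses client accepts."""
    model_config = ConfigDict(extra="forbid")  # unknown keys raise ValidationError

    service_tier: Optional[str] = None
    verbosity: Optional[str] = None

# ResponsesExtraBody(service_tier="auto")        # ok
# ResponsesExtraBody(service_tier="auto", x=1)   # ValidationError: extra fields
```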

BOTH REQUIREMENTS SATISFIED

  • Stateful and stateless sessions:
    • store=True/False controls the session type
    • previous_response_id for stateful conversations
    • include: ["reasoning.encrypted_content"] for stateless use with reasoning
    • ResponsesAPIState manages both modes
  • Extensible multimodal input/output:
    • The input field supports text, images, and (in future) audio
    • built_in_tools for image generation, web search, etc.
    • The tools field for custom function calling
    • An extensible interface for new modalities

@sonichi @randombet @marklysze @qingyun-wu @Lancetnik, please review this and correct me wherever I am making mistakes; this is the approach I want to follow for migrating to the Responses API. There is also additional code refactoring in groupchat.py and support for the run method. There are a few deprecations I am considering but ignoring for now to keep the code backward compatible and avoid introducing a breaking change (although this is arguably a breaking change :) ). Can you please advise me on this plan? I have put effort in here and your opinion will be very valuable to me.

priyansh4320 avatar Aug 20 '25 17:08 priyansh4320

@priyansh4320 I am not sure about responses_model and responses_input. Should they be a config-entry or an agent-level option?

Lancetnik avatar Aug 20 '25 18:08 Lancetnik

@Lancetnik your llm_config.where() refactor should help my refactor cover both config levels; that is my point of view.

priyansh4320 avatar Aug 20 '25 20:08 priyansh4320

@Lancetnik your llm_config.where() refactor should help my refactor cover both config levels; that is my point of view.

I still haven't created an agent-level configuration 😒 The priority has changed, but it's still on the plan.

Lancetnik avatar Aug 20 '25 20:08 Lancetnik

I would love to collaborate with you on that.

priyansh4320 avatar Aug 20 '25 20:08 priyansh4320