[Feature Request]: Make AG2 primary and fully supported with OpenAI Responses API
## 🚀 Migration: OpenAI Responses API

### Why migrate?

OpenAI's Responses API is the new, recommended interface. It supersedes classic Chat Completions with built-in tools (web/file search, computer use), structured state, and event streaming.

AG2 already exposes OpenAI via `LLMConfig(api_type=...)`, and Responses is supported but currently limited (mainly two-agent `initiate_chat`). We want full coverage without breaking existing `api_type="openai"` apps.
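For illustration, an opt-in config might look like the following; `api_type: "responses"` is the documented switch, while the optional `built_in_tools`/`tool_choice` keys mirror the checklist below and their exact names are still a design assumption:

```python
# Hypothetical OAI_CONFIG_LIST entries: Responses is a config-only opt-in,
# while existing "openai" (Chat Completions) entries keep working unchanged.
config_list = [
    {
        "model": "gpt-4o",
        "api_key": "sk-...",  # placeholder
        "api_type": "responses",  # opt in to the Responses API
        "built_in_tools": ["web_search", "image_generation"],  # optional (assumed key)
        "tool_choice": "auto",  # optional (assumed key)
    },
    {
        "model": "gpt-4o",
        "api_key": "sk-...",
        "api_type": "openai",  # default Chat Completions path, unchanged
    },
]
```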
## 🔑 Key References
## 🎯 Outcomes

- Full AG2 feature parity with `api_type="responses"` (two-agent, GroupChat, tools/function calling, streaming, multimodal)
- No application-layer breaks: existing `api_type="openai"` (Chat Completions) continues to work; Responses is a config-only opt-in
- Compatibility layer: Responses outputs normalized to the legacy Chat shape for internal code
## 🗂️ AG2-specific context

<details>
<summary>Click to expand AG2 code touchpoints</summary>

- `autogen/agentchat/conversable_agent.py`
  - `generate_reply(...)`: orchestrates termination/human → tool_calls → code_exec → LLM reply
  - `_generate_oai_reply_from_client(...)`: flattens tool responses into `messages`, calls wrapper `create(...)`, normalizes via `extract_text_or_completion_object(...)`
- `autogen/agentchat/groupchat.py`
  - `GroupChatManager.run_chat`/`a_run_chat`: triggers `speaker.generate_reply(...)`, emits `GroupChatRunChatEvent` and `TerminationEvent` via IO streams
- `autogen/oai/client.py` (`OpenAIWrapper`)
  - Routes by `api_type`; attaches `response.message_retrieval_function = client.message_retrieval`, normalizes outputs
- `autogen/oai/openai_responses.py`
  - `OpenAIResponsesClient.create(...)`: converts legacy `messages` → Responses `input` blocks, wires built-in tools, handles `response_format` vs `stream`, tracks `previous_response_id`
  - `message_retrieval(...)`: adapts Responses outputs to a legacy ChatCompletion-like assistant message
- `autogen/oai/openai_utils.py`: builds config list entries (supports `api_type`)
- `autogen/oai/client_utils.py`: validates params and tool visibility

</details>
## ✅ Migration Checklist

### 1. Configuration (opt-in)
- [ ] `OAI_CONFIG_LIST` supports `api_type: "responses"` per entry
- [ ] Document optional fields: `built_in_tools` (e.g., `image_generation`, `web_search`), `tool_choice`

### 2. Routing (no public API changes)
- [ ] Verify `OpenAIWrapper` dispatches `api_type: responses` → `OpenAIResponsesClient`

### 3. Message flow & normalization (stateless default)
- [ ] Build the full `messages` context locally in agents
- [ ] In `OpenAIResponsesClient.create`, convert `messages` → Responses `input` blocks
- [ ] Ensure `message_retrieval` returns a single assistant message with `content` and optional `tool_calls` (legacy shape; see the sketch just below)
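A minimal sketch of the normalization in item 3, assuming the official `openai` SDK's `Response` object (its `output_text` convenience property and `function_call` output items); the exact adapter shape in AG2 may differ:

```python
from typing import Any


def message_retrieval(response: Any) -> list[dict]:
    """Adapt a Responses API result to one legacy-shaped assistant message."""
    message: dict = {"role": "assistant", "content": response.output_text or ""}
    tool_calls = [
        {
            "id": item.call_id,
            "type": "function",
            "function": {"name": item.name, "arguments": item.arguments},
        }
        for item in response.output
        if getattr(item, "type", None) == "function_call"
    ]
    if tool_calls:
        message["tool_calls"] = tool_calls  # legacy Chat Completions shape
    return [message]
```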
### 4. Stateless vs stateful handling (and impacts)
- [ ] Default: stateless (send full context; no server thread state)
- [ ] Add optional stateful threading: `use_response_state` (default false) on `OpenAIResponsesLLMConfigEntry`
- [ ] When stateful is enabled:
  - Client maintains `previous_response_id` across turns
  - Still include tool outputs in `messages` to avoid divergence
  - Provide `reset_state()` to start a new thread; log thread IDs at debug level
  - GroupChat: do not share thread state across agents
  - Privacy: note server-side retention in the docs for sensitive workloads

### 5. Streaming
- [ ] Preserve existing stream events (no API change)
- [ ] If `response_format` is present, drop `stream` (Responses restriction) and log a warning

### 6. Tools
- [ ] Map `built_in_tools` to Responses scopes; account for image costs
- [ ] External tool calls: keep schema/flow unchanged; normalize tool call names/args

### 7. GroupChat & run surfaces
- [ ] No changes to `GroupChatManager.run_chat`/`a_run_chat`; event emissions unchanged
- [ ] Ensure the normalized assistant message continues to drive GroupChat flows

### 8. Usage & cost
- [ ] Populate token usage/model fields; aggregate image costs

### 9. Documentation
- [ ] README: add "Using Responses API" with a config example and notes
  - Default: Chat Completions; Responses is opt-in
  - Streaming + `response_format` are mutually exclusive
  - Optional stateful mode via `use_response_state` and `reset_state()`
## 🗂️ File-by-file guidance

<details>
<summary>Expand for implementation details</summary>

- `autogen/oai/openai_responses.py`
  - In `OpenAIResponsesLLMConfigEntry`: add `use_response_state: bool = False`
  - In `OpenAIResponsesClient.__init__`: initialize `self._previous_response_id: str | None = None`
  - In `create(...)`:
    - Convert `messages` to `input` blocks (stateless default)
    - If `config.use_response_state` and `self._previous_response_id` is set, include `previous_response_id` in the request
    - After the response, if a new `response.id` is available, set `self._previous_response_id = response.id`
    - If `response_format` is present, remove the `stream` param and log a warning (see the guard sketch after this list)
  - Add `def reset_state(self) -> None:` to clear `self._previous_response_id`
  - Ensure `message_retrieval(...)` returns a single assistant-message dict with `content` and optional `tool_calls`
- `autogen/oai/client.py`
  - No API change. Optionally expose a passthrough to `reset_state()` on the active client
- `autogen/agentchat/conversable_agent.py`
  - No behavior change. Keep stateless construction of `messages`; tool responses are flattened before the client call
- `autogen/agentchat/groupchat.py`
  - No behavior change. Events (`GroupChatRunChatEvent`, `TerminationEvent`) remain as-is

</details>
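For the `response_format` vs `stream` rule referenced above, a minimal sketch of the guard (function and logger names are illustrative, not existing AG2 APIs):

```python
import logging

logger = logging.getLogger(__name__)


def _drop_stream_if_structured(params: dict) -> None:
    """Responses restriction: streaming and response_format are mutually exclusive."""
    if params.get("response_format") and params.pop("stream", None):
        logger.warning(
            "Responses API: 'stream' was dropped because 'response_format' is set."
        )
```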
## ℹ️ Notes

- Streaming and `response_format` are mutually exclusive per the Responses API
- Stateful mode is advanced/opt-in; the default is stateless for reproducibility
- Multimodal inputs must map to Responses input blocks; the normalized assistant output remains text
## 🔄 Responses API: Stateless and Stateful Modes

OpenAI's Responses API supports both stateless and stateful operation. AG2 must plan for and support both:

- Stateless (default):
  - AG2 builds the complete `messages` context locally per turn and sends it to the API. No reliance on server-held state.
  - Pros: deterministic, easy to debug/replay, tool outputs always included explicitly.
- Stateful (opt-in):
  - When enabled (`use_response_state`), AG2 tracks and sends `previous_response_id` so OpenAI threads the conversation server-side.
  - Still send tool results and intermediate messages in `messages` to avoid divergence between local and server state.
  - Provide `reset_state()` to start new threads; log thread IDs for traceability.
Planning note: Migration and implementation must account for both modes. Stateless remains default for reproducibility; stateful is opt-in for advanced use cases. GroupChat agents should not share thread state. Privacy implications should be documented for stateful mode.
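For concreteness, a sketch of one stateful turn under these rules, using the official `openai` SDK directly; the wrapper names `use_response_state`/`reset_state()` above are the proposal, and this shows only the underlying mechanics:

```python
from openai import OpenAI

client = OpenAI()
_previous_response_id: str | None = None  # client-held thread pointer


def stateful_turn(user_text: str) -> str:
    """Send one turn, threading it onto the server-side conversation."""
    global _previous_response_id
    kwargs: dict = {"model": "gpt-4o", "input": user_text}
    if _previous_response_id:
        kwargs["previous_response_id"] = _previous_response_id  # thread server-side
    response = client.responses.create(**kwargs)
    _previous_response_id = response.id  # remember for the next turn
    return response.output_text


def reset_state() -> None:
    """Start a new server-side thread (mirrors the proposed reset_state())."""
    global _previous_response_id
    _previous_response_id = None
```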
## 🔗 References
@randombet review this later.
Note: the Responses API has two modes, stateful and stateless. We need to consider this in planning.

Thanks @sonichi. I've updated our migration issue to include consideration of the Responses API's two modes: stateful (server-stored conversation state) and stateless (independent calls). The plan now covers how AG2 can support both and what that implies for context handling and backward compatibility.
Also, I am fixing up the links.
> A thin compatibility/normalization layer that makes Responses outputs look like today's Chat shape where the internal code expects it.

At first look this seems like the best choice to me, but I should investigate the problem much more deeply before making a final decision.
Hello @sonichi sir,
The Responses API in AG2 is far more complete than the documentation suggests: core features like run(), two-agent chats, built-in tools, and message normalization already work in production. The main blocker is a small GroupChat bug, making this a short, targeted fix plus a documentation update rather than a long migration project. Outdated docs are the biggest adoption barrier.

If this is the case, I would be more than happy to adjust the issue's content. What are your thoughts @Lancetnik @sonichi @qingyun-wu @marklysze? Any suggestions on adjustments?
Thanks @BlocUnited and @tejas-dharani - If we could fix the Group Chat bug and have the LLM tests incorporate testing with the Responses API, to ensure it does indeed work throughout, that would be a good thing to do. I'm sure the Responses API allows for a lot of other things, too.
@BlocUnited @priyansh4320 Could you break down this to multiple sub-issues?
@tejas-dharani I sent you a message on discord (captain_). Was hoping we can chat about your suggestions.
@BlocUnited @priyansh4320 Following up on this issue. We need to define clear objectives and milestones; the current Outcomes section is too vague and too broad. For example, we could say:

Phase 1: Design interfaces to support the Responses API for both stateful and stateless sessions, extensible to multimodal input and output.

Phase 2: Stateless sessions: text messages for two-agent chat, agent.run(), GroupChat.

etc.

And in the plan we need to investigate the current code base and identify blockers. For example, I believe we may need to refactor GroupChat etc. to fully support the Responses API.
Design for Phase 1:

Considering the current implementation of the Responses API, which includes a config entry, `OpenAIResponsesLLMConfigEntry`, that extends `OpenAILLMConfigEntry`, plus a registered generation endpoint for the Responses client,

I propose a design to manage both the stateful and stateless implementations, along with multimodal input/output support:

- First, use `Enum`s for type safety, validation, and IDE support/autocomplete: a better developer experience, fewer errors, and more maintainable code.
```python
from enum import Enum
from typing import Literal, Optional

from pydantic import BaseModel


class ResponseIncludable(str, Enum):
    """Supported include options for the Responses API."""
    CODE_INTERPRETER_OUTPUTS = "code_interpreter_call.outputs"
    COMPUTER_CALL_OUTPUT_IMAGE_URL = "computer_call_output.output.image_url"
    FILE_SEARCH_RESULTS = "file_search_call.results"
    MESSAGE_INPUT_IMAGE_URL = "message.input_image.image_url"
    MESSAGE_OUTPUT_TEXT_LOGPROBS = "message.output_text.logprobs"
    REASONING_ENCRYPTED_CONTENT = "reasoning.encrypted_content"


class ServiceTier(str, Enum):
    """Service tier options for the Responses API."""
    AUTO = "auto"
    DEFAULT = "default"
    FLEX = "flex"
    SCALE = "scale"
    PRIORITY = "priority"


class VerbosityLevel(str, Enum):
    """Verbosity level options."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class ReasoningConfig(BaseModel):
    """Configuration for reasoning models (o-series)."""
    effort: Literal["low", "minimal", "medium", "high"]
    model: Optional[str] = None
```
- Core configuration interface for the Responses API: a Protocol interface provides the benefits of type safety and documentation while keeping maximum flexibility for different implementation approaches, and it is backward compatible.
```python
from typing import Any, Dict, List, Optional, Protocol, Union


class ResponsesAPIConfig(Protocol):
    """Core configuration interface for the Responses API."""

    # Required parameters
    model: str
    input: Union[str, List[Dict[str, Any]]]

    # Include options
    include: Optional[List[ResponseIncludable]] = None

    # Reasoning (o-series models)
    reasoning: Optional[ReasoningConfig] = None

    # Service configuration
    service_tier: Optional[ServiceTier] = ServiceTier.AUTO
    verbosity: Optional[VerbosityLevel] = None
```
- Enhance `OpenAIResponsesLLMConfigEntry` to implement `ResponsesAPIConfig` via a `__getattr__` duck-typing bridge:
```python
from typing import Any, Dict, List, Literal, Optional, Protocol, Union


class ResponsesAPIConfig(Protocol):
    """Protocol defining which attributes are required."""
    model: str
    input: Union[str, List[Dict[str, Any]]]
    store: Optional[bool] = True
    tools: Optional[List[Dict[str, Any]]] = None


class OpenAIResponsesLLMConfigEntry(OpenAILLMConfigEntry):
    api_type: Literal["responses"] = "responses"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Store Responses-API-specific fields with a prefix
        self.responses_model = kwargs.get("model")
        self.responses_input = kwargs.get("input")
        self.responses_store = kwargs.get("store", True)
        self.responses_tools = kwargs.get("tools")

    def __getattr__(self, name: str) -> Any:
        """Duck typing: forward ResponsesAPIConfig attributes to the prefixed fields."""
        # Look in __dict__ directly: using hasattr() here would re-enter
        # __getattr__ for missing names and recurse without bound.
        prefixed = f"responses_{name}"
        if not name.startswith("responses_") and prefixed in self.__dict__:
            return self.__dict__[prefixed]
        raise AttributeError(f"'{self.__class__.__name__}' has no attribute '{name}'")


# Usage example:
config = OpenAIResponsesLLMConfigEntry(
    model="gpt-4o",
    input="Hello",
    store=True,
)

# ✅ Duck typing in action:
print(config.model)  # __getattr__("model") -> self.responses_model
print(config.input)  # __getattr__("input") -> self.responses_input
print(config.store)  # __getattr__("store") -> self.responses_store


# ✅ The type checker accepts this as a valid ResponsesAPIConfig
def process_config(config: ResponsesAPIConfig) -> None:
    print(config.model)  # works: config has a .model attribute


process_config(config)  # works: config behaves like a ResponsesAPIConfig
```
Visual representation:

```text
ResponsesAPIConfig Protocol:
├── model: str
├── input: Union[str, List[Dict]]
└── store: Optional[bool]   # manages stateful vs stateless calls

OpenAIResponsesLLMConfigEntry:
├── Inherits from OpenAILLMConfigEntry
├── Has responses_model, responses_input, responses_store
└── __getattr__ bridges the gap:
    config.model → __getattr__("model") → self.responses_model
    config.input → __getattr__("input") → self.responses_input
    config.store → __getattr__("store") → self.responses_store
```
- Implement stateful conversation management.

```python
# Create the state-management classes
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class ConversationState:
    """Represents the state of a conversation session."""
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    previous_response_id: Optional[str] = None
    conversation_history: List[Dict[str, Any]] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.utcnow)
    last_updated: datetime = field(default_factory=datetime.utcnow)
    reasoning_items: List[Dict[str, Any]] = field(default_factory=list)
    encrypted_reasoning: Optional[str] = None
    tool_execution_history: List[Dict[str, Any]] = field(default_factory=list)


class ResponsesAPIState:
    """Manages state for Responses API conversations."""

    def __init__(self):
        self._conversations: Dict[str, ConversationState] = {}
        self._default_session_id: Optional[str] = None

    # ... (all the methods as defined in the interface)
```
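The session methods are elided above; continuing that sketch (reusing its imports and `ConversationState`), the two methods used later in `create(...)` might look like this. This is an assumption about the proposed interface, not existing AG2 code:

```python
class ResponsesAPIState:
    """Manages state for Responses API conversations (minimal sketch)."""

    def __init__(self) -> None:
        self._conversations: Dict[str, ConversationState] = {}
        self._default_session_id: Optional[str] = None

    def create_session(self) -> str:
        """Create a fresh ConversationState and return its session id."""
        state = ConversationState()
        self._conversations[state.session_id] = state
        if self._default_session_id is None:
            self._default_session_id = state.session_id
        return state.session_id

    def get_session(self, session_id: str) -> ConversationState:
        """Look up an existing session; an unknown id fails loudly."""
        return self._conversations[session_id]
```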
- Enhance `OpenAIResponsesClient` with state management:

```python
# autogen/oai/openai_responses.py - modify the existing class
from typing import Any

from .responses_state import ResponsesAPIState


class OpenAIResponsesClient:
    def __init__(self, client: "OpenAI", response_format=None):
        self._oai_client = client
        self.response_format = response_format
        self.state_manager = ResponsesAPIState()  # add state management
        self.previous_response_id = None

    def create(self, params: dict[str, Any]) -> "Response":
        # Get or create a session
        session_id = params.get("session_id")
        if session_id:
            session = self.state_manager.get_session(session_id)
        else:
            session_id = self.state_manager.create_session()
            session = self.state_manager.get_session(session_id)

        # Reuse the session's previous_response_id if the caller did not set one
        if session.previous_response_id and "previous_response_id" not in params:
            params["previous_response_id"] = session.previous_response_id
```
6) Message format handling:

- Enhance the `message_retrieval()`/`create()` methods to handle message formats and conflicting params, e.g.:

```python
# Handle the built_in_tools parameter
built_in_tools = params.pop("built_in_tools", [])
if built_in_tools:
    tools_list = []
    for tool in built_in_tools:
        if tool == "web_search":
            tools_list.append({"type": "web_search_preview"})
        elif tool == "image_generation":
            tools_list.append({"type": "image_generation"})
        elif tool == "file_search":
            tools_list.append({"type": "file_search"})
        elif tool == "computer_use":
            tools_list.append({"type": "computer_use"})
        elif tool == "code_interpreter":
            tools_list.append({"type": "code_interpreter"})
    params["tools"] = tools_list
    params["tool_choice"] = "auto"
```
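The same mapping can be table-driven, which is easier to extend when new built-in tools appear; the name/type pairs below are taken verbatim from the snippet above:

```python
# Map AG2 built_in_tools names to Responses API tool types.
BUILT_IN_TOOL_TYPES = {
    "web_search": "web_search_preview",
    "image_generation": "image_generation",
    "file_search": "file_search",
    "computer_use": "computer_use",
    "code_interpreter": "code_interpreter",
}

built_in_tools = params.pop("built_in_tools", [])
if built_in_tools:
    params["tools"] = [
        {"type": BUILT_IN_TOOL_TYPES[t]} for t in built_in_tools if t in BUILT_IN_TOOL_TYPES
    ]
    params["tool_choice"] = "auto"
```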
7) Add multimodal input support

Add methods to handle and format multimodal inputs and outputs.

```python
import base64
from typing import Any, Dict, List, Union


# Add to OpenAIResponsesClient
def _handle_multimodal_input(self, content: Union[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Handle multimodal input (text + images)."""
    if isinstance(content, str):
        return [{"type": "input_text", "text": content}]
    blocks = []
    for item in content:
        if item.get("type") == "text":
            blocks.append({"type": "input_text", "text": item.get("text", "")})
        elif item.get("type") == "image_url":
            blocks.append({"type": "input_image", "image_url": item.get("image_url", {}).get("url", "")})
        elif item.get("type") == "image":
            # Handle raw base64 image bytes
            image_data = item.get("image")
            if isinstance(image_data, bytes):
                base64_data = base64.b64encode(image_data).decode("utf-8")
                blocks.append({
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_data}",
                })
    return blocks
```
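For illustration, calling the handler on a mixed text-and-image message (assuming an `OpenAIResponsesClient` instance named `client`; the input shape follows the legacy Chat Completions multimodal format):

```python
blocks = client._handle_multimodal_input([
    {"type": "text", "text": "Describe this image."},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
])
# -> [{'type': 'input_text', 'text': 'Describe this image.'},
#     {'type': 'input_image', 'image_url': 'https://example.com/cat.png'}]
```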
- Implement `extra_body` forbids (validation that rejects unsupported extra request params).
Both requirements satisfied:

- Stateful and stateless sessions:
  - `store=True/False` controls the session type
  - `previous_response_id` for stateful conversations
  - `include: ["reasoning.encrypted_content"]` for stateless use with reasoning
  - `ResponsesAPIState` manages both modes
- Extensible multimodal input/output:
  - The `input` field supports text, images, audio (future)
  - `built_in_tools` for image generation, web search, etc.
  - The `tools` field for custom function calling
  - Extensible interface for new modalities
@sonichi @randombet @marklysze @qingyun-wu @Lancetnik, please review this and correct me wherever I am making mistakes; this is the approach I want to follow for migrating to the Responses API. There is also additional code refactoring in groupchat.py and support for the run method. There are a few deprecations I am considering but ignoring for now to keep the code backward compatible and avoid introducing breaking changes (although this is arguably a breaking change :)). Can you please advise me on this plan? I have put effort in here and your opinion will be very valuable to me.
@priyansh4320 I am not sure about `responses_model` and `responses_input`. Should they be a config-entry or an agent-level option?
@Lancetnik your refactor of `llm_config.where()` should help my refactor satisfy both config levels; that is my point of view.
I still haven't created an agent-level configuration 😢 The priority has changed, but it's still on the plan.
I would love to collaborate with you on that.