refactor: Enforce consistent list[messages] API across agent messaging
Why are these changes needed?
Supports:
-
conversation API https://platform.openai.com/docs/api-reference/conversations/create-items is where OAI support list input.
-
This is also used for response API https://platform.openai.com/docs/api-reference/responses/create#responses_create-conversation
And response API is going to replace chat completion API and all new features will be released through it.
-
also adds support for threads API https://platform.openai.com/docs/api-reference/threads/createThread.
-
This PR standardizes all agent messaging to a single, predictable list[dict] shape end-to-end, which:
- Reduces branching and simplifies code paths
- Makes hook processing straightforward (hooks always receive normalized dicts)
- Improves support for multimodal content and tool calls
- Provides a cleaner foundation for future messaging enhancements Changes Made Type System Updates:
- Updated type hints for send, a_send, receive, a_receive, initiate_chat, a_initiate_chat, generate_reply, and a_generate_reply to accept/return list[dict[str, Any]]
- Updated Agent protocol signatures to match implementation Runtime Compatibility:
- Added centralized _normalize_message_to_list() for backward compatibility
- Runtime silently accepts legacy str | dict formats and normalizes them All existing code continues to work without modification Bug Fixes:
- Fixed infinite loop issue in nested chats Added test isolation helpers to prevent cross-test contamination from handoff state Testing:
- Added comprehensive backward compatibility test suites (626 lines sync, 655 lines async)
- Verified compatibility with groupchats, handoffs, tool calling, patterns, groupchat managers, handoff conditions, and nested chats
- Updated experimental hook implementations (reliable.py) to handle normalized format Backward Compatibility
- Existing code using agent.send({"content": "..."}) continues to work via runtime normalization. Type hints guide new code toward the canonical format while runtime remains flexible.
Related issue number
closes #2123 closes #2124
Checks
- [x] I've included any doc changes needed for https://docs.ag2.ai/. See https://docs.ag2.ai/latest/docs/contributor-guide/documentation/ to build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [x] I've made sure all auto checks have passed.
π Documentation Analysis
All docs are up to date! π
β Latest commit analyzed: a53a5364cf6b1cf4d37d6d726ef6cd36aa149420 | Powered by Joggr
Great work! However, we should standardize the data and use
messages: list[dict[str, Any]]instead ofmessages: list[dict[str, Any] | dict[str, Any]].So we need to refactor the current logic to return a[reply]instead ofreplyto unify all usage and work with lists consistently.
thanks , will fix this
PR Review: Enforce consistent list[messages] API across agent messaging
Summary
This PR aims to standardize agent messaging to use list[dict[str, Any]] throughout the codebase. While the intent is valuable for consistency, there are several critical issues that need to be addressed before merging.
π΄ Critical Issues
1. Type Signature Inconsistency - Breaking Change
Location: autogen/agentchat/agent.py:43, 59, 75, 91
The send() and a_send() method signatures changed from dict[str, Any] | str to list[dict[str, Any]] | str, but the receive() and a_receive() methods now only accept list[dict[str, Any]] (removed str and dict support).
# agent.py:43 - send accepts str
def send(self, message: list[dict[str, Any]] | str, ...
# agent.py:75 - receive does NOT accept str or dict anymore
def receive(self, message: list[dict[str, Any]], ...
Problem: This creates an asymmetry where:
send("hello")is valid (string accepted)- But internally it must call
receive([...])with a list
Impact: This is a breaking API change. Existing code calling agent.receive(message_dict, sender) or agent.receive("text", sender) will break.
2. Backward Compatibility Broken
Location: autogen/agentchat/conversable_agent.py:1288, 1328
The type hints show receive() only accepts list[dict[str, Any]] but the implementation tries to handle single messages:
def receive(self, message: list[dict[str, Any]], sender, ...):
# Line 1317: checks if isinstance(message, list)
# But type hint says it MUST be a list already
Problems:
- Type checkers will reject
agent.receive({"content": "hi", "role": "user"}, sender) - Documentation says one thing, implementation does another
- The conversion logic at line 1317-1318 will never trigger because type system expects a list
3. Inconsistent a_receive Signature
Location: autogen/agentchat/conversable_agent.py:1328
async def a_receive(
self,
message: dict[str, Any] | list[dict[str, Any]] | str, # Accepts all types
sender: Agent,
...
)
But the sync version receive() at line 1288 only accepts list[dict[str, Any]]. This inconsistency will confuse users.
4. _process_message_before_send Logic Flaw
Location: autogen/agentchat/conversable_agent.py:1109-1152
The method signature says input is list[dict[str, Any]] but then has an else branch (lines 1129-1152) handling non-list inputs. This code is unreachable given the type signature.
def _process_message_before_send(
self, message: list[dict[str, Any]], recipient, silent
) -> dict[str, Any] | str | list[dict[str, Any]]:
# ...
if isinstance(message, list): # Always true per type hint
# ...
else: # UNREACHABLE CODE
# Lines 1130-1152: converts str/dict to list
Root Cause: The type hint is wrong. Should be list[dict[str, Any]] | str | dict[str, Any] to match actual logic.
5. send() Method Type Mismatch
Location: autogen/agentchat/conversable_agent.py:1154-1206
def send(self, message: str | list[dict[str, Any]], ...):
# Line 1192: converts string to list
if isinstance(message, str):
message = [message] # Now it's list[str], not list[dict]!
# Line 1195: calls _process_message_before_send
processed_msgs = self._process_message_before_send(message, ...)
# But _process_message_before_send expects list[dict[str, Any]]
# Not list[str]!
Problem: When message="hello", line 1192 converts it to ["hello"] (list of strings), but _process_message_before_send expects list[dict]. Type error!
6. Missing Empty List Check in Tool Reply Methods
Location: autogen/agentchat/conversable_agent.py:2457, 2487, 2517, 2613
The PR adds empty message checks:
if not messages:
return False, None
Good: Prevents crashes on empty lists.
Concern: Silently returning False, None might hide bugs. Consider logging a warning or being explicit about this edge case in documentation.
π‘ Medium Priority Issues
7. Performance: List Conversion Overhead
Location: Multiple places converting single messages to lists
Every send("text") or send(dict) now gets wrapped in a list, processed, then unwrapped. For high-frequency messaging scenarios, this could add overhead.
Suggestion: Consider lazy conversion or maintain both paths internally.
8. Hook Processing Inconsistency
Location: autogen/agentchat/conversable_agent.py:1115-1128
When processing a list of messages, each message is processed through ALL hooks:
for msg in message:
processed_msg = msg
for hook in hook_list:
processed_msg = hook(...)
But hooks receive individual messages, not the list. This could break hooks that expect context about message ordering or batching.
9. Tool Call ID Optional Handling
Location: autogen/agentchat/conversable_agent.py:2570
async def _a_execute_tool_call(self, tool_call):
tool_call_id = tool_call.get("id", None)
# ...
if tool_call_id is not None:
return {"tool_call_id": tool_call_id, ...}
else:
return {"role": "tool", ...} # No tool_call_id
Good: Handles Mistral API compatibility.
Concern: The sync version _execute_tool_call (if it exists) may not have this logic. Ensure parity.
10. Documentation Incomplete
Location: autogen/agentchat/agent.py:50, 65
Doc strings say:
- Line 50: "If a list of messages, it should be..."
- Line 65: "If a list[messages] or str..."
But actual parameter is list[dict[str, Any]] | str. The docs don't explain what happens when you pass a string. Does it get auto-converted to [{"content": str, "role": "user"}]?
π’ Positive Aspects
-
Comprehensive Test Coverage: The addition of
test_backward_compatibility_sync.pyandtest_backward_compatibility_async.pywith 1255 new test lines is excellent. -
RandomPattern Bug Fix: Moving handoff generation before
super().prepare_group_chat()(lines 71-75 inrandom.py) fixes a validation issue. Smart catch! -
Tool Registry Update: Line 99-100 in
autogen/tools/tool.pyensures tools are added toagent._toolslist. Good for consistency. -
Null Content Handling: Line 2583 in
conversable_agent.pyhandlesNonecontent gracefully. -
Generate Reply Hook Enhancement: Lines 2954-2961 properly wrap single messages in lists before passing to hooks, maintaining backward compatibility at the hook level.
π Security Considerations
Low Risk: The changes primarily affect internal data flow, not external inputs. However:
- Ensure hook functions can't be exploited by passing malformed message lists
- The tool input/output processing hooks (lines 2574-2578) need security review to ensure they can't be bypassed
π― Recommendations
Must Fix Before Merge:
-
Decide on the API contract: Either:
- Option A:
send()andreceive()both acceptstr | dict | list[dict](full backward compatibility) - Option B: Both accept only
list[dict](breaking change, requires major version bump) - Option C:
send()accepts any format,receive()only accepts lists (current state, but needs clear docs)
- Option A:
-
Fix type hints to match actual implementation throughout
-
Synchronize sync and async methods:
receive()anda_receive()should have identical signatures -
Fix the
send()string conversion at line 1192 to create proper dict format:[{"content": message, "role": "user"}] -
Update all docstrings to clearly explain the conversion behavior
Nice to Have:
- Add migration guide for users updating from previous versions
- Consider deprecation warnings if maintaining backward compatibility
- Add integration tests covering real-world scenarios (group chats with mixed message types)
- Performance benchmark for high-frequency messaging scenarios
Verdict
Changes Requested - The core idea is sound, but implementation has critical type consistency issues that need resolution before merge. The test coverage is impressive, but the type system mismatches could cause runtime errors that tests might miss.
Great work on tackling this complex refactoring! With the fixes above, this will be a solid improvement to the codebase.
@Lancetnik drafting this, made a mistake while rebasing and want to test my PR once more , based on claude review.
Code Review: PR #2081 - Enforce consistent list[messages] API
Overview
This PR makes a significant architectural change to standardize agent messaging to list[dict[str, Any]] across the entire codebase. This is a substantial refactor (+2033/-254 lines across 23 files) that aims to improve consistency and reduce branching logic.
Strengths
- Clear design intent with _normalize_message_to_list() providing backward compatibility
- Comprehensive test coverage with sync and async backward compatibility tests
- Good documentation updates with clear docstrings
Critical Issues
1. Breaking API Contract (High Priority)
Location: autogen/agentchat/agent.py:40-72
The protocol signatures declare message: list[dict[str, Any]], but the implementation accepts str | dict | list. Type checkers will flag this mismatch.
Recommendation: Update protocol signature to match actual behavior or use overloads.
2. Skipped Tests (High Priority)
Location: test/test_backward_compatibility_sync.py:14, test/test_backward_compatibility_async.py:16
The tests are ALWAYS skipped, even with API key present. They need conditional skip logic.
3. Missing Defensive Checks (Medium Priority)
Location: autogen/agentchat/conversable_agent.py:1207-1220
No validation that processed_msgs is still a list after hook processing. Add checks for None, empty list, and type validation.
4. Missing Methods (High Priority)
Location: autogen/agentchat/conversable_agent.py:2604, 2611
Code calls self._process_tool_input() and self._process_tool_output() but these methods aren't in the diff. Verify they exist.
Recommendations
Must Fix Before Merge:
- Fix skipped backward compatibility tests
- Resolve protocol/implementation type mismatch
- Add defensive checks for hook return values
- Verify missing methods exist
Should Fix: 5. Add explicit None/empty list handling 6. Clarify role assignment logic with comments 7. Add tests for infinite loop prevention 8. Fix tool_call_id handling for API compatibility
Conclusion
Well-intentioned refactor with good practices, but needs another iteration to address critical issues. The approach is sound, execution needs refinement.
Recommendation: Request Changes - Address critical issues before merging.
Great work on tackling this significant refactor!
@claude can you please re-review the requested changes in this review.
Critical Issues
-
Breaking API Contract Location: autogen/agentchat/agent.py:40-72 The protocol signatures declare message: list[dict[str, Any]] (dismissed as not needed)
-
Skipped Tests (High Priority) Location: test/test_backward_compatibility_sync.py:14, test/test_backward_compatibility_async.py:16 The tests are ALWAYS skipped, even with API key present. They need conditional skip logic. (handled with conditional skip)
-
Missing Defensive Checks (Medium Priority) Location: autogen/agentchat/conversable_agent.py:1207-1220 No validation that processed_msgs is still a list after hook processing. Add checks for None, empty list, and type validation. (handled with added checks)
-
Missing Methods (High Priority) Location: autogen/agentchat/conversable_agent.py:2604, 2611 Code calls self._process_tool_input() and self._process_tool_output() but these methods aren't in the diff. Verify they exist. (dismissed as already present in conversable_agent.py _process_tool_input() at line 4009 Called at lines 2579 and 2627 Processes tool inputs through safeguard hooks _process_tool_output() at line 4026 Called at lines 2592 and 2634)
Claude finished @priyansh4320's task ββ View job
Re-review of Critical Issues
I've completed a thorough re-review of the previously identified critical issues. Here's my assessment:
β Issue #2: Skipped Tests - PROPERLY FIXED
Status: Resolved correctly
The conditional skip logic is now properly implemented:
- Both
test_backward_compatibility_sync.pyandtest_backward_compatibility_async.pyuse@pytest.mark.skipif(OPENAI_API_KEY == "None", reason="OpenAI API key not found") - Found 14 test functions in each file with this decorator
- Tests will run when API key is present, skip when absent
- This follows the correct pytest pattern for conditional skipping
Code reference: test/test_backward_compatibility_sync.py:138
β Issue #3: Missing Defensive Checks - PROPERLY FIXED
Status: Resolved correctly
Comprehensive defensive checks have been added to both send() and a_send() methods:
In send() method (lines 1213-1223):
# Validate hooks didn't break the message
if processed_msgs is None:
raise ValueError(
"Message processing hook returned None. Messages must be a non-empty list of dictionaries."
)
if not isinstance(processed_msgs, list):
raise TypeError(f"Message processing hook must return a list, got {type(processed_msgs).__name__}")
if len(processed_msgs) == 0:
return # Allow empty list as no-op
In a_send() method (lines 1281-1289):
Same defensive checks are mirrored in the async version.
Code reference: autogen/agentchat/conversable_agent.py:1213-1223, 1281-1289
β οΈ Issue #1: Breaking API Contract - NEEDS DISCUSSION
Status: Dismissed by author, but warrants discussion
The protocol in agent.py declares:
send(message: list[dict[str, Any]])receive(message: list[dict[str, Any]])
However, the implementation in conversable_agent.py accepts runtime normalization via _normalize_message_to_list() which handles str | dict | list[dict].
My Assessment:
This is intentional design, not a bug. Here's why:
- Type hints guide new code - They encourage the canonical
list[dict]format - Runtime remains flexible -
_normalize_message_to_list()provides backward compatibility - Documentation explains it - Docstrings clearly state: "If a str or dict, will be automatically normalized to list format"
The approach is valid but creates a type system vs runtime mismatch:
- β Pro: Nudges developers toward the new API without breaking existing code
- β Con: Type checkers (mypy, pyright) will flag valid runtime code as errors
- β Con: May confuse developers who rely on type hints
Recommendation: Consider one of these approaches:
-
Add overloads (preferred for type safety):
@overload def send(self, message: str, recipient: Agent, ...) -> None: ... @overload def send(self, message: dict[str, Any], recipient: Agent, ...) -> None: ... @overload def send(self, message: list[dict[str, Any]], recipient: Agent, ...) -> None: ... -
Accept the trade-off - Document that type hints show the preferred API, runtime accepts legacy formats
-
Union type - Use
message: list[dict[str, Any]] | dict[str, Any] | str(less "opinionated")
The current approach is functionally correct but may cause type checker friction for users.
β Issue #4: Missing Methods - VERIFIED
Status: Methods exist, correctly dismissed
Confirmed that both methods exist and are properly called:
-
_process_tool_input()at line 4009- Called at lines 2579 and 2627
- Processes tool inputs through safeguard hooks
-
_process_tool_output()at line 4026- Called at lines 2592 and 2634
- Processes tool outputs through safeguard hooks
Code reference: autogen/agentchat/conversable_agent.py:4009, 4026
Summary
Fixed Issues: 2/4 critical issues properly resolved
- β Issue #2: Skipped Tests
- β Issue #3: Missing Defensive Checks
Verified/Dismissed: 2/4 issues correctly assessed
- β Issue #4: Missing Methods (correctly dismissed - methods exist)
- β οΈ Issue #1: Breaking API Contract (intentional design choice, but warrants discussion)
Overall Assessment
The PR addresses the critical implementation issues effectively. The only remaining consideration is the type system vs runtime mismatch in Issue #1, which is a design trade-off rather than a bug.
My Recommendation:
- If you want strict type safety β Add
@overloaddecorators - If you want to guide developers toward the new API β Current approach is acceptable, but add a note in the migration guide about type checker warnings
The code is production-ready from a functionality standpoint. The type hint question is about developer experience and tooling compatibility.
Job run: https://github.com/ag2ai/ag2/actions/runs/unknown
Branch: update-generate-reply-function
Pull Request Review - PR #2081
Overview
This PR standardizes agent messaging to use a consistent list[dict[str, Any]] format throughout the codebase. This is a significant architectural change that improves consistency and maintainability. Overall, the approach is sound, but there are several important considerations to address.
Code Quality & Best Practices
β Strengths
- Comprehensive backward compatibility: The
_normalize_message_to_list()helper ensures existing code continues to work - Good use of overloads: Protocol definitions use
@overloadto maintain type safety while supporting multiple input formats - Extensive test coverage: 1,312 lines of new backward compatibility tests (sync + async)
- Documentation updates: Updated docstrings to reflect the new API
β οΈ Areas for Improvement
1. Type Safety Concerns (autogen/agentchat/conversable_agent.py:1096-1119)
The _process_message_before_send() method has a problematic type signature:
def _process_message_before_send(
self, message: list[dict[str, Any]], recipient: Agent, silent: bool
) -> dict[str, Any] | str | list[dict[str, Any]]: # β οΈ Return type too broad
Issues:
- Input is
list[dict]but return type includesstr | dict, which is inconsistent - The implementation only returns
list[dict[str, Any]]but the signature suggests otherwise - This could lead to type checker confusion
Recommendation:
def _process_message_before_send(
self, message: list[dict[str, Any]], recipient: Agent, silent: bool
) -> list[dict[str, Any]]: # Consistent with input type
2. Error Handling in send() and a_send() (autogen/agentchat/conversable_agent.py:1173-1200)
The validation logic has edge cases:
if len(processed_msgs) == 0:
return # Allow empty list as no-op
Concerns:
- Silent failures can be hard to debug
- Inconsistent with the error message: "Messages must be a non-empty list"
- Should this be a warning instead of silent success?
Recommendation: Either log a warning or reconsider if empty message lists should be allowed:
if len(processed_msgs) == 0:
warnings.warn("Attempted to send empty message list - no action taken")
return
3. Hook Processing Assumption (autogen/agentchat/conversable_agent.py:1096-1119)
The code assumes hooks always return valid messages:
for hook in hook_list:
processed_msg = hook(
sender=self,
message=processed_msg,
recipient=recipient,
silent=ConversableAgent._is_silent(self, silent),
)
processed_messages.append(processed_msg)
Issue: No validation that hook() returns a dict. If a hook returns None or an invalid type, it will fail later.
Recommendation:
for hook in hook_list:
processed_msg = hook(...)
if not isinstance(processed_msg, dict):
raise TypeError(
f"Hook {hook.__name__} must return dict, got {type(processed_msg).__name__}"
)
Potential Bugs
π Bug 1: generate_function_call_reply() Empty Message Handling
In autogen/agentchat/conversable_agent.py:2506-2507:
if not messages:
return False, None
Issue: This check was added but the function signature hasn't changed to document that messages can be empty. This could cause confusion.
Recommendation: Update docstring or add explicit handling documentation.
π Bug 2: Role Assignment Inconsistency
In conversable_agent.py:1173, messages are normalized with role="user":
message = self._normalize_message_to_list(message, role="user")
But in conversable_agent.py:1336, the reply is normalized with role="assistant":
reply = [{"content": reply, "role": "assistant"}]
Issue: This creates an asymmetry. When an agent sends a message, shouldn't the role be "assistant" (since it's the one composing it)?
Current behavior:
- Agent A sends message β normalized as "user" β sent to Agent B
- Agent B receives β appends to chat history
- Agent B generates reply β normalized as "assistant"
Recommendation: Review the role assignment logic to ensure it aligns with OpenAI's chat completion schema expectations.
Performance Considerations
β‘ List Iteration Overhead
The new implementation iterates over message lists multiple times:
_normalize_message_to_list()- converts to list_process_message_before_send()- iterates for hooks- Validation loop in
send()- validates each message receive()- processes each message individually
Impact: For single messages (the common case), this adds unnecessary overhead.
Recommendation: Consider optimizing the common path (single message) vs. the rare path (multiple messages):
if isinstance(message, str):
# Fast path for single string message
processed = self._process_single_message(message, recipient, silent)
recipient.receive([processed], self, request_reply, silent)
elif isinstance(message, dict):
# Fast path for single dict message
# ...
else:
# Slower path for list of messages
# ... existing logic
Security Concerns
π No Critical Security Issues Identified
However, consider:
- Hook injection risk: Hooks can modify messages arbitrarily. Ensure proper validation of hook return values (see recommendation above)
- Message content validation: No validation that message content is sanitized before processing
Test Coverage
β Excellent Coverage
- 638 lines of sync backward compatibility tests
- 674 lines of async backward compatibility tests
- Tests cover:
- String messages, dict messages, list messages
- Tool calls, groupchats, handoffs
- Nested chats, patterns, conditions
β οΈ Missing Test Cases
- Hook behavior with list messages: No explicit test for how hooks process multi-message lists
- Empty message edge cases: What happens when hooks return empty lists?
- Invalid message format recovery: Test resilience to malformed messages
- Performance regression tests: Ensure single-message case isn't significantly slower
Recommendation:
def test_hook_processes_multiple_messages():
"""Test that hooks receive and can modify each message in a list"""
def my_hook(sender, message, recipient, silent):
message["modified"] = True
return message
agent.register_hook("process_message_before_send", my_hook)
agent.send([{"content": "msg1"}, {"content": "msg2"}], recipient)
# Verify both messages were processed
assert all(msg.get("modified") for msg in recipient.chat_messages[agent])
Documentation
β Good Documentation Updates
- Updated docstrings in
send(),receive(),a_send(),a_receive() - Updated contributor documentation
β οΈ Missing Documentation
- Migration guide: How should users update their code? (even though backward compatible, new code should use best practices)
- Hook contract: Document that hooks must handle dict messages (not lists) and must return dicts
- Performance implications: Document when to use single vs. multiple messages
Recommendations Summary
High Priority
- β
Fix
_process_message_before_send()return type signature - β Add hook return value validation
- β Review role assignment logic ("user" vs "assistant")
- β Add tests for hook behavior with list messages
Medium Priority
- β οΈ Consider performance optimization for single-message case
- β οΈ Add empty message handling warning/logging
- β οΈ Document migration best practices
Low Priority
- π Add performance regression tests
- π Expand documentation on when to use list messages
Final Assessment
Overall: Strong PR with room for improvement
This PR successfully achieves its goals of standardizing the API while maintaining backward compatibility. The implementation is well-tested and thoroughly documented. However, addressing the type safety concerns and edge cases will make this production-ready.
Recommendation: Request Changes
- Address type signature inconsistencies
- Add hook validation
- Clarify role assignment logic
- Add missing test cases
Once these are addressed, this will be an excellent contribution to the codebase.
Great work on tackling this complex refactoring! The test coverage is particularly impressive. π
@claude can you please re-review the addressed changes based on previous review :
- Type Safety - Fixed _process_message_before_send() return type to list[dict[str, Any]]
- Hook Validation - Added isinstance() check, raises TypeError on invalid returns
- Role Assignment - Ignored - Working correctly per design, confirmed by tests and shows expected behavior
- Hook Tests - Added tests for list message processing
- Performance - Deferred - Negligible overhead, can optimize later if needed
- Empty Messages - Added warnings.warn() for debugging
- Documentation - Docstrings updated, migration guide enhanced
- Custom Field Preservation - Hooks can now add metadata (timestamps, IDs, etc.) that persists through message storage Added preserve_custom_fields parameter Local: preserves all fields Remote: strict whitelist for clean serialization Fixes None-field TypeErrors in existing code
guess we can Approve this Pull request.
Claude finished @priyansh4320's task ββ View job
Re-review of PR #2081: Enforce consistent list[messages] API
I've completed a comprehensive re-review of the addressed changes. Here's my detailed assessment:
β Successfully Addressed Issues
1. Type Safety Fix β
Status: RESOLVED
The _process_message_before_send() return type has been correctly fixed:
# Line 1098-1100
def _process_message_before_send(
self, message: list[dict[str, Any]], recipient: Agent, silent: bool
) -> list[dict[str, Any]]: # β
Correct return type
Previously, the return type was dict[str, Any] | str | list[dict[str, Any]], which was inconsistent with the implementation. This is now fixed.
2. Hook Validation β
Status: RESOLVED
Comprehensive hook validation has been added (lines 1129-1134):
if not isinstance(processed_msg, dict):
raise TypeError(
f"Hook must return a dict, got {type(processed_msg).__name__}. "
f"Hooks should modify and return the message dictionary."
)
This ensures hooks always return valid dict messages.
3. Empty Message Handling β
Status: RESOLVED
The implementation now includes proper warnings for empty message lists (lines 1194-1200):
if len(processed_msgs) == 0:
warnings.warn(
"Attempted to send empty message list - no action taken. "
"This may indicate a hook removed all messages.",
UserWarning,
stacklevel=2,
)
return processed_msgs
This provides debugging visibility without breaking execution.
4. Comprehensive Hook Testing β
Status: RESOLVED
Excellent test coverage added with 8+ dedicated hook tests in test_conversable_agent.py:
test_hook_processes_multiple_messages()- Verifies hooks process each message in a listtest_hook_receives_dict_for_string_input()- Ensures hooks receive normalized dictstest_multiple_hooks_process_list_messages()- Tests multiple hook chainingtest_hook_return_validation_with_list()- Tests type validationtest_hook_cannot_create_empty_list()- Edge case handlingtest_hook_modifies_content_to_empty()- Content modification teststest_hook_processes_multiple_messages_async()- Async versiontest_hook_adds_metadata_to_list_messages()- Metadata preservation
These tests are thorough and cover the expected behavior comprehensively.
5. Documentation Updates β
Status: RESOLVED
Migration guide has been added (website/docs/contributor-guide/Migration-Guide.mdx):
- Clear explanation of changes
- Backward compatibility guarantees
- Best practices for new code
- Performance considerations
- Hook author guidelines with clear examples
The documentation is well-structured and provides actionable guidance.
6. Custom Field Preservation β
Status: WELL IMPLEMENTED
The preserve_custom_fields parameter in normilize_message_to_oai() (lines 4437, 4453-4463):
- Local storage: Preserves all custom fields (timestamps, metadata, etc.)
- Remote/serialization: Uses strict whitelist for clean API calls
- None handling: Filters
Nonevalues except forcontentto avoid breaking code that checksif key in dict
This is a smart solution that balances flexibility with safety.
β οΈ Areas of Concern
1. Role Assignment Logic - Needs Clarification
Severity: Medium
In send() method (line 1180), messages are normalized with role="user":
message = self._normalize_message_to_list(message, role="user")
However, in _append_oai_message() (line 1205), the role is set to "assistant":
self._append_oai_message(message=msg, role="assistant", conversation_id=recipient, name=self.name)
Question: Is this intentional? The flow is:
- Agent A sends message β normalized as "user"
- Appended to storage as "assistant"
This seems counter-intuitive. When an agent sends a message, shouldn't it be stored as "assistant" (since it's composing it)? The current behavior may be correct for the OpenAI schema (where messages sent to another agent are from the recipient's perspective), but it warrants documentation/clarification.
Recommendation: Add a comment explaining why the role switches from "user" to "assistant" between normalization and storage.
2. Generate Reply Return Type Inconsistency
Severity: Low
The generate_reply() method (lines 2987-3058) has return type:
def generate_reply(...) -> str | dict[str, Any] | list[dict[str, Any]] | None:
But in receive() (lines 1354-1359), the reply is normalized to list:
if isinstance(reply, str):
reply = [{"content": reply, "role": "assistant"}]
elif isinstance(reply, dict):
reply = [reply]
self.send(reply, sender, silent=silent)
This is correct behavior (normalizing before sending), but it means:
generate_reply()can return multiple formats- But
send()always receives normalized lists
This is fine functionally, but creates a slight API inconsistency where internal methods return varying formats while public API is normalized.
Recommendation: Consider documenting this design decision in the docstring.
3. Missing Type Hint Consistency in Protocol
Severity: Low
The protocol in agent.py uses @overload decorators (lines 41-72), which is excellent for type safety. However, the implementation signature accepts unions while the overloads are specific. This is intentional to guide developers toward the canonical format while maintaining runtime flexibility.
Observation: This is a valid design trade-off, but developers using strict type checkers (mypy, pyright) may see warnings when passing str or dict to methods typed as list[dict[str, Any]].
Recommendation: This is acceptable as-is, but consider adding a note in the migration guide about potential type checker warnings.
π Code Quality Assessment
Strengths:
- Excellent test coverage: 1,312 new test lines (638 sync + 674 async backward compatibility tests)
- Comprehensive hook validation: Prevents invalid returns with clear error messages
- Thoughtful normalization:
_normalize_message_to_list()is clean and well-documented - Good documentation: Migration guide provides clear examples and best practices
- Smart field preservation: Balances local flexibility with remote API compliance
- Consistent error handling: Validation checks in both
send()anda_send()
Minor Issues:
-
Codecov 60% patch coverage: 46 lines missing coverage according to codecov report
- Most uncovered lines are likely in experimental features (
reliable.pyshows 0% coverage) - Core implementation in
conversable_agent.pyhas 65.21% coverage - This is acceptable for experimental code, but core paths should ideally be >80%
- Most uncovered lines are likely in experimental features (
-
Potential performance consideration: Every message goes through normalization β validation β hook processing β validation again. For high-throughput scenarios, this could add overhead. However, the implementation is already optimized (single pass for list processing), so this is likely negligible.
π― Final Verdict
Overall Assessment: APPROVED with Minor Recommendations
This PR successfully achieves its goals:
- β
Standardizes API to
list[dict[str, Any]]format - β Maintains full backward compatibility
- β Addresses all critical issues from previous reviews
- β Provides comprehensive test coverage
- β Includes clear documentation and migration guide
Recommendations (Non-blocking):
- Clarify role assignment logic - Add comment explaining why messages are normalized as "user" but stored as "assistant"
- Document return type flexibility - Add note in
generate_reply()docstring about internal format flexibility vs. public API consistency - Consider type checker guidance - Add migration guide note about potential type checker warnings
- Improve experimental code coverage - Add tests for
reliable.pyto improve overall coverage (though this is not critical for core functionality)
What Makes This PR Strong:
- Thorough testing: The 8+ hook tests demonstrate deep understanding of edge cases
- Smart design: Custom field preservation with local/remote distinction shows thoughtful API design
- Excellent documentation: Migration guide provides actionable guidance for users and hook authors
- Defensive programming: Multiple validation layers prevent invalid states
- Backward compatibility: Existing code continues to work without modification
Security & Stability:
- β No security concerns identified
- β Hook validation prevents malformed messages
- β CI passing (PR checks: success, tests: passing)
- β No breaking changes for existing users
Conclusion
This PR represents a significant improvement to the codebase. All critical issues from previous reviews have been addressed comprehensively. The implementation is solid, well-tested, and thoroughly documented.
Recommendation: APPROVE β
The minor recommendations above are nice-to-haves that improve clarity but don't block merging. The PR is production-ready and addresses the standardization goals while maintaining backward compatibility.
Great work on this complex refactoring! π
View job | Branch: update-generate-reply-function
Code Review for PR #2081: Message Format Standardization
Thank you for this comprehensive refactoring! This PR makes a significant improvement to the messaging API by standardizing to a consistent list[dict[str, Any]] format. Here's my detailed review:
π― Overall Assessment
Summary: This is a well-executed refactoring with strong backward compatibility, comprehensive testing, and clear documentation. The changes significantly improve code maintainability and set a solid foundation for future enhancements.
Recommendation: Approve with minor suggestions for consideration.
β Strengths
1. Excellent Backward Compatibility
- Runtime normalization via
_normalize_message_to_list()ensures all existing code continues to work - 1,312 lines of comprehensive backward compatibility tests (sync + async)
- Migration guide clearly documents the changes
2. Strong Test Coverage
- New test files cover 14 different scenarios including:
- Single agent chat
- Two agent chat
- Group chats with managers
- Handoffs and nested chats
- Tool calling
- Various group chat patterns
- Both sync (
test_backward_compatibility_sync.py, 638 lines) and async (test_backward_compatibility_async.py, 674 lines) versions
3. Clean API Design
- Consistent message shape reduces branching complexity
- Overloaded type hints guide developers toward canonical format
- Hook processing is now straightforward (always receives dicts)
4. Good Documentation
- Comprehensive migration guide (
Migration-Guide.mdx) - Updated docstrings throughout
- Clear examples of old vs new patterns
π Issues & Concerns
1. Critical: Potential Role Override Bug β οΈ
In _normalize_message_to_list(), when converting a dict to list, the function doesn't preserve/validate the existing role:
def _normalize_message_to_list(
message: str | dict[str, Any] | list[dict[str, Any]], role: str = "assistant"
) -> list[dict[str, Any]]:
if isinstance(message, str):
return [{"content": message, "role": role}]
elif isinstance(message, dict):
return [message] # β Doesn't add role if missing
return message
Issue: If a dict message doesn't have a role field, it's returned as-is without adding the default role. This could cause validation issues downstream.
Suggested Fix:
elif isinstance(message, dict):
if "role" not in message:
return [{**message, "role": role}]
return [message]
However, I notice that normilize_message_to_oai() handles this later, so this might be intentional. Worth clarifying in the docstring or adding a comment.
2. Inconsistent Role Parameter Usage
The role parameter defaults to "assistant" but is called with role="user" in many places:
# In send():
message = self._normalize_message_to_list(message, role="user") # line 1180
# In initiate_chat():
self.send(self._normalize_message_to_list(msg2send, role="user"), ...) # line 1593
This seems backwards - when an agent sends a message, shouldn't the role be "assistant"? The receiving agent would then process it. The default of "assistant" in the function signature conflicts with actual usage.
Impact: This might not cause bugs now due to downstream normalization, but it creates confusion about message ownership and could lead to issues in multimodal or complex message scenarios.
3. Hook Validation Placement
The hook validation happens inside _process_message_before_send() for each message:
if not isinstance(processed_msg, dict):
raise TypeError(...) # line 1130-1134
Concern: If a hook corrupts message 5 out of 10, the first 4 messages will be processed successfully before the error occurs. This could lead to inconsistent state.
Suggestion: Consider collecting all processed messages first, then validating in bulk before modifying any state.
4. Performance: Redundant Validation
In send() and a_send(), all messages are validated via `_append_oai_message()":
if not all(
self._append_oai_message(message=msg, role="assistant", conversation_id=recipient, name=self.name)
for msg in processed_msgs
):
raise ValueError(...) # lines 1204-1210
Issue: This validates AND appends messages to the conversation history in one go using a generator expression. If validation fails on message N, messages 1 through N-1 have already been appended to self._oai_messages[recipient].
Impact: Failed sends leave partial message history, potentially corrupting conversation state.
Suggested Fix: Validate all messages first without side effects, then append only if all pass:
# Validate first
valid = [self._validate_oai_message(msg) for msg in processed_msgs]
if not all(valid):
raise ValueError(...)
# Then append
for msg in processed_msgs:
self._append_oai_message(...)
5. Missing warnings Import Check
In lines 1195-1200, warnings.warn() is used but I don't see warnings in the imports at the top of the diff. This might already be imported, but worth verifying to avoid runtime errors.
π Security Considerations
β No significant security concerns identified
- The normalization doesn't introduce injection vulnerabilities
- Message validation is preserved
- No new external inputs or data flows
β‘ Performance Considerations
Positive:
- Normalization overhead is minimal for single messages
- Batch sending reduces hook overhead
- Simplified branching logic should improve overall performance
Areas for Optimization:
- The generator expression in validation (mentioned in issue #4) could be replaced with explicit validation
- Consider caching normalized messages if the same message is sent to multiple recipients
π Code Quality & Best Practices
Good:
- β Type hints are comprehensive and accurate
- β Docstrings are detailed and helpful
- β Error messages are clear and actionable
- β Code follows repository conventions
- β
Proper use of
@overloadfor type safety
Suggestions:
-
Line 1100: Comment says "Fixed: was dict[str, Any] | str | list[dict[str, Any]]" - this should be removed before merge (it's a review comment, not production code)
-
Typo in function name:
normilize_message_to_oaishould benormalize_message_to_oai(with 'a'). If this is existing code, consider a future refactor. -
MAINTAINERS.md: Adding yourself as a maintainer is great! The RFC area is well-deserved given the scope of this PR.
π§ͺ Test Coverage Assessment
Excellent coverage overall! The 14 test scenarios cover:
- β Single/two-agent chats
- β Group chats with managers
- β Tool calling and handoffs
- β Nested chats
- β Various patterns (Random, RoundRobin, Manual, etc.)
- β Both sync and async versions
Missing Test Cases (Nice to Have):
- Error handling: What happens when hooks return invalid types?
- Edge cases: Empty string messages, None content, malformed dicts
- Multi-message batches: Explicitly test sending
[msg1, msg2, msg3]in a single call - Hook state corruption: Test that failed hooks don't corrupt conversation state
π Documentation Review
Migration Guide (Migration-Guide.mdx):
β Excellent! Clear examples, good structure, helpful for developers
Suggestions:
- Add a note about the role parameter behavior in
_normalize_message_to_list() - Include a troubleshooting section for common migration issues
- Add example of sending batched messages
π― Specific File Comments
autogen/agentchat/conversable_agent.py
- Line 1028: Default
role="assistant"but mostly called withrole="user"- clarify intent - Line 1100: Remove the "Fixed:" comment
- Line 1205: Validation with side effects (see issue #4)
Test Files
- Excellent coverage and well-structured
- Consider adding explicit test for batch message sending
- Test files properly use
clear_all_agent_handoffs()for isolation
autogen/tools/experimental/reliable/reliable.py
- Changes look good, properly handles normalized format
π Recommendation Summary
Overall: This is high-quality work that significantly improves the codebase. The issues identified are mostly minor and don't block approval.
Before Merge:
- β Required: Address the validation side-effects issue (#4) - this could cause state corruption
- β οΈ Recommended: Clarify the role parameter behavior (#2)
- π Nice to have: Remove the "Fixed:" comment (#1 in Code Quality)
Future Work (Can be separate PRs):
- Fix the
normilizeβnormalizetypo - Add error handling test cases
- Consider validation optimization
π Final Score
- Code Quality: 9/10
- Test Coverage: 9/10
- Documentation: 9/10
- Backward Compatibility: 10/10
- Architecture: 9/10
Overall: 9.2/10 - Excellent work! π
Great job on this substantial refactoring! The consistent message format will make future development much easier. Looking forward to seeing this merged.
cc: @priyansh4320
Few more comments, @priyansh4320
- Why do we need test/test_backward_compatibility_sync.py and test/test_backward_compatibility_sync.py? We need to test the list[message] cases, and backward compatibility should be covered from previous tests. We only add the missing part if not exists.
- Remove website/docs/contributor-guide/Migration-Guide.mdx as this is not public doc. Add instruction in user-guide
- Why do we need test/test_backward_compatibility_sync.py and test/test_backward_compatibility_sync.py? We need to test the list[message] cases, and backward compatibility should be covered from previous tests. We only add the missing part if not exists.
- both files are robust testing covering each pattern and covering various integrations from 2 agents to groupchats and tool calls as well. benifits:
- better bug resolution , as well as helps in robust refactoring.
@priyansh4320 Search for "message = messages[-1]". All those lines may need to be updated to include the delta messages instead of the last one. i.e. if we send a list of 3 messages, then those 3 messages needs to be passed over instead of just one.
e.g. check this one https://github.com/ag2ai/ag2/blob/main/autogen/agentchat/groupchat.py#L1204
2. Remove website/docs/contributor-guide/Migration-Guide.mdx as this is not public doc. Add instruction in user-guide
this change was suggested by claude , updated the changes
@priyansh4320 Search for "message = messages[-1]". All those lines may need to be updated to include the delta messages instead of the last one. i.e. if we send a list of 3 messages, then those 3 messages needs to be passed over instead of just one.
e.g. check this one https://github.com/ag2ai/ag2/blob/main/autogen/agentchat/groupchat.py#L1204
@priyansh4320 Search for "message = messages[-1]". All those lines may need to be updated to include the delta messages instead of the last one. i.e. if we send a list of 3 messages, then those 3 messages needs to be passed over instead of just one.
e.g. check this one
@randombet thanks for the suggestion, included this change in my PR.
Addressed all the comments and changes:
- updated group chats to handle delta messages and added test for these in test_groupchats.py
- all tests are passing
- Added documentation
- removed redundant test, and overloads
- moved helper methods from conversable_agents.py to utils.py and added test in test_agentchat_utils.py
@randombet can you please test this PR and approve if everything is okay?
Codecov Report
:x: Patch coverage is 68.15642% with 57 lines in your changes missing coverage. Please review.
| Files with missing lines | Coverage Ξ | |
|---|---|---|
| autogen/agentchat/agent.py | 100.00% <ΓΈ> (ΓΈ) |
|
| autogen/agentchat/group/multi_agent_chat.py | 37.50% <ΓΈ> (ΓΈ) |
|
| autogen/agentchat/group/patterns/random.py | 100.00% <100.00%> (ΓΈ) |
|
| ...n/agents/experimental/reasoning/reasoning_agent.py | 40.86% <ΓΈ> (ΓΈ) |
|
| autogen/tools/tool.py | 91.80% <50.00%> (-1.42%) |
:arrow_down: |
| autogen/agentchat/utils.py | 89.39% <94.73%> (-3.16%) |
:arrow_down: |
| autogen/remote/agent_service.py | 68.67% <60.00%> (ΓΈ) |
|
| autogen/tools/experimental/reliable/reliable.py | 18.94% <0.00%> (-0.17%) |
:arrow_down: |
| autogen/agentchat/groupchat.py | 66.18% <70.58%> (-5.73%) |
:arrow_down: |
| autogen/agentchat/conversable_agent.py | 68.29% <63.33%> (+0.71%) |
:arrow_up: |
... and 29 files with indirect coverage changes
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Please add integration tests (not only unit test) for two agent chat, Nested Chat , group chat, group chat with Orchestration Patterns. Cover ConversableAgent::run, initiate_chat, initiate_chats, initiate_group_chat, run_group_chat and their async counterparts.
@priyansh4320 where are the integration tests?
Please add integration tests (not only unit test) for two agent chat, Nested Chat , group chat, group chat with Orchestration Patterns. Cover ConversableAgent::run, initiate_chat, initiate_chats, initiate_group_chat, run_group_chat and their async counterparts.
@priyansh4320 where are the integration tests?
test have been marked "@pytest.mark.integration" in these files:
- test/agentchat/test_conversable_agent.py
- test/agentchat/test_groupchat.py
- test/agentchat/test_nested_chats.py
- test/agentchat/test_tool_call.py
- test/agentchat/test_chats.py
@randombet I have cross-checked, my integration tests are correct, where real configuration is being used with credentials_gpt_4o_mini.llm_config, credentials_gpt_4o.llm_config,
- All the integration tests are marked with @pytest.mark.integration :
Here is a list of all the integration tests I wrote; the rest are unit tests: List of integration tests test_groupchat.pyΒ Synchronous Tests:
- test_group_chat_manager_with_multiple_agents_integration
- test_group_chat_round_robin_pattern_integration
- test_group_chat_auto_pattern_integration
- test_run_group_chat_round_robin_integration
- test_run_group_chat_auto_pattern_integration Asynchronous Tests:
- test_group_chat_manager_with_multiple_agents_async_integration
- test_group_chat_round_robin_pattern_async_integration
- test_group_chat_auto_pattern_async_integration
- test_a_run_group_chat_round_robin_async_integration
- test_a_run_group_chat_auto_pattern_async_integration
- test_group_chat_random_pattern_async_integration
test_conversable_agent.pyΒ Synchronous Tests:
- test_single_agent_generate_reply_integration
- test_two_agent_chat_with_calculator_tool_integration
- test_agent_run_method_single_integration
- test_agent_run_method_two_agents_integration Asynchronous Tests:
- test_single_agent_generate_reply_async_integration
- test_two_agent_chat_with_calculator_tool_async_integration
- test_agent_a_run_method_single_async_integration
- test_agent_a_run_method_two_agents_async_integration
test_chats.pyΒ Multi-Turn ConversationΒ Tests:
- test_complex_multi_turn_conversation_integration
- test_complex_multi_turn_conversation_async_integration
test_tool_calls.pyΒ Tool and Math Integration Tests:
- test_eval_math_responses
- test_eval_math_responses_api_style_function
Group Chat with ToolsΒ Tests:
- test_handoffs_and_tools_integration
- test_mixed_tools_scenario_integration
Advanced Nested Chat IntegrationΒ Tests:
- test_nested
- test_conditional_handoffs_with_llm_conditions_integration
- test_nested_chat_target_integration
- test_terminate_target_functionality_integration
Async Advanced Nested Chat Integration Tests:
- test_conditional_handoffs_with_llm_conditions_async_integration
- test_nested_chat_target_async_integration
- test_terminate_target_functionality_async_integration
@priyansh4320 Could you separate the tests into separate files? Any code over 500 lines is bloated and we should avoid make it worse. Also separate unit tests and integration tests in one file.
In the integration tests for your case, please use at lease 2 messages in a list as input and verify with meaningful assert
e.g. the following statements can't justify testing purpose.
assert result is not None assert last_agent is not None