ag2 icon indicating copy to clipboard operation
ag2 copied to clipboard

refactor: Enforce consistent list[messages] API across agent messaging

Open priyansh4320 opened this issue 3 months ago β€’ 23 comments

Why are these changes needed?

Supports:

  1. conversation API https://platform.openai.com/docs/api-reference/conversations/create-items is where OAI support list input.

  2. This is also used for response API https://platform.openai.com/docs/api-reference/responses/create#responses_create-conversation

And response API is going to replace chat completion API and all new features will be released through it.

  1. also adds support for threads API https://platform.openai.com/docs/api-reference/threads/createThread.

  2. This PR standardizes all agent messaging to a single, predictable list[dict] shape end-to-end, which:

  • Reduces branching and simplifies code paths
  • Makes hook processing straightforward (hooks always receive normalized dicts)
  • Improves support for multimodal content and tool calls
  • Provides a cleaner foundation for future messaging enhancements Changes Made Type System Updates:
  • Updated type hints for send, a_send, receive, a_receive, initiate_chat, a_initiate_chat, generate_reply, and a_generate_reply to accept/return list[dict[str, Any]]
  • Updated Agent protocol signatures to match implementation Runtime Compatibility:
  • Added centralized _normalize_message_to_list() for backward compatibility
  • Runtime silently accepts legacy str | dict formats and normalizes them All existing code continues to work without modification Bug Fixes:
  • Fixed infinite loop issue in nested chats Added test isolation helpers to prevent cross-test contamination from handoff state Testing:
  • Added comprehensive backward compatibility test suites (626 lines sync, 655 lines async)
  • Verified compatibility with groupchats, handoffs, tool calling, patterns, groupchat managers, handoff conditions, and nested chats
  • Updated experimental hook implementations (reliable.py) to handle normalized format Backward Compatibility
  • Existing code using agent.send({"content": "..."}) continues to work via runtime normalization. Type hints guide new code toward the canonical format while runtime remains flexible.

Related issue number

closes #2123 closes #2124

Checks

  • [x] I've included any doc changes needed for https://docs.ag2.ai/. See https://docs.ag2.ai/latest/docs/contributor-guide/documentation/ to build and test documentation locally.
  • [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • [x] I've made sure all auto checks have passed.

priyansh4320 avatar Sep 04 '25 23:09 priyansh4320

πŸ“ Documentation Analysis

All docs are up to date! πŸŽ‰


βœ… Latest commit analyzed: a53a5364cf6b1cf4d37d6d726ef6cd36aa149420 | Powered by Joggr

joggrbot[bot] avatar Sep 04 '25 23:09 joggrbot[bot]

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 05 '25 00:09 CLAassistant

Great work! However, we should standardize the data and use messages: list[dict[str, Any]] instead of messages: list[dict[str, Any] | dict[str, Any]]. So we need to refactor the current logic to return a [reply] instead of reply to unify all usage and work with lists consistently.

thanks , will fix this

priyansh4320 avatar Sep 07 '25 09:09 priyansh4320

PR Review: Enforce consistent list[messages] API across agent messaging

Summary

This PR aims to standardize agent messaging to use list[dict[str, Any]] throughout the codebase. While the intent is valuable for consistency, there are several critical issues that need to be addressed before merging.

πŸ”΄ Critical Issues

1. Type Signature Inconsistency - Breaking Change

Location: autogen/agentchat/agent.py:43, 59, 75, 91

The send() and a_send() method signatures changed from dict[str, Any] | str to list[dict[str, Any]] | str, but the receive() and a_receive() methods now only accept list[dict[str, Any]] (removed str and dict support).

# agent.py:43 - send accepts str
def send(self, message: list[dict[str, Any]] | str, ...

# agent.py:75 - receive does NOT accept str or dict anymore
def receive(self, message: list[dict[str, Any]], ...

Problem: This creates an asymmetry where:

  • send("hello") is valid (string accepted)
  • But internally it must call receive([...]) with a list

Impact: This is a breaking API change. Existing code calling agent.receive(message_dict, sender) or agent.receive("text", sender) will break.

2. Backward Compatibility Broken

Location: autogen/agentchat/conversable_agent.py:1288, 1328

The type hints show receive() only accepts list[dict[str, Any]] but the implementation tries to handle single messages:

def receive(self, message: list[dict[str, Any]], sender, ...):
    # Line 1317: checks if isinstance(message, list)
    # But type hint says it MUST be a list already

Problems:

  • Type checkers will reject agent.receive({"content": "hi", "role": "user"}, sender)
  • Documentation says one thing, implementation does another
  • The conversion logic at line 1317-1318 will never trigger because type system expects a list

3. Inconsistent a_receive Signature

Location: autogen/agentchat/conversable_agent.py:1328

async def a_receive(
    self,
    message: dict[str, Any] | list[dict[str, Any]] | str,  # Accepts all types
    sender: Agent,
    ...
)

But the sync version receive() at line 1288 only accepts list[dict[str, Any]]. This inconsistency will confuse users.

4. _process_message_before_send Logic Flaw

Location: autogen/agentchat/conversable_agent.py:1109-1152

The method signature says input is list[dict[str, Any]] but then has an else branch (lines 1129-1152) handling non-list inputs. This code is unreachable given the type signature.

def _process_message_before_send(
    self, message: list[dict[str, Any]], recipient, silent
) -> dict[str, Any] | str | list[dict[str, Any]]:
    # ...
    if isinstance(message, list):  # Always true per type hint
        # ...
    else:  # UNREACHABLE CODE
        # Lines 1130-1152: converts str/dict to list

Root Cause: The type hint is wrong. Should be list[dict[str, Any]] | str | dict[str, Any] to match actual logic.

5. send() Method Type Mismatch

Location: autogen/agentchat/conversable_agent.py:1154-1206

def send(self, message: str | list[dict[str, Any]], ...):
    # Line 1192: converts string to list
    if isinstance(message, str):
        message = [message]  # Now it's list[str], not list[dict]!
    
    # Line 1195: calls _process_message_before_send
    processed_msgs = self._process_message_before_send(message, ...)
    # But _process_message_before_send expects list[dict[str, Any]]
    # Not list[str]!

Problem: When message="hello", line 1192 converts it to ["hello"] (list of strings), but _process_message_before_send expects list[dict]. Type error!

6. Missing Empty List Check in Tool Reply Methods

Location: autogen/agentchat/conversable_agent.py:2457, 2487, 2517, 2613

The PR adds empty message checks:

if not messages:
    return False, None

Good: Prevents crashes on empty lists. Concern: Silently returning False, None might hide bugs. Consider logging a warning or being explicit about this edge case in documentation.

🟑 Medium Priority Issues

7. Performance: List Conversion Overhead

Location: Multiple places converting single messages to lists

Every send("text") or send(dict) now gets wrapped in a list, processed, then unwrapped. For high-frequency messaging scenarios, this could add overhead.

Suggestion: Consider lazy conversion or maintain both paths internally.

8. Hook Processing Inconsistency

Location: autogen/agentchat/conversable_agent.py:1115-1128

When processing a list of messages, each message is processed through ALL hooks:

for msg in message:
    processed_msg = msg
    for hook in hook_list:
        processed_msg = hook(...)

But hooks receive individual messages, not the list. This could break hooks that expect context about message ordering or batching.

9. Tool Call ID Optional Handling

Location: autogen/agentchat/conversable_agent.py:2570

async def _a_execute_tool_call(self, tool_call):
    tool_call_id = tool_call.get("id", None)
    # ...
    if tool_call_id is not None:
        return {"tool_call_id": tool_call_id, ...}
    else:
        return {"role": "tool", ...}  # No tool_call_id

Good: Handles Mistral API compatibility. Concern: The sync version _execute_tool_call (if it exists) may not have this logic. Ensure parity.

10. Documentation Incomplete

Location: autogen/agentchat/agent.py:50, 65

Doc strings say:

  • Line 50: "If a list of messages, it should be..."
  • Line 65: "If a list[messages] or str..."

But actual parameter is list[dict[str, Any]] | str. The docs don't explain what happens when you pass a string. Does it get auto-converted to [{"content": str, "role": "user"}]?

🟒 Positive Aspects

  1. Comprehensive Test Coverage: The addition of test_backward_compatibility_sync.py and test_backward_compatibility_async.py with 1255 new test lines is excellent.

  2. RandomPattern Bug Fix: Moving handoff generation before super().prepare_group_chat() (lines 71-75 in random.py) fixes a validation issue. Smart catch!

  3. Tool Registry Update: Line 99-100 in autogen/tools/tool.py ensures tools are added to agent._tools list. Good for consistency.

  4. Null Content Handling: Line 2583 in conversable_agent.py handles None content gracefully.

  5. Generate Reply Hook Enhancement: Lines 2954-2961 properly wrap single messages in lists before passing to hooks, maintaining backward compatibility at the hook level.

πŸ” Security Considerations

Low Risk: The changes primarily affect internal data flow, not external inputs. However:

  • Ensure hook functions can't be exploited by passing malformed message lists
  • The tool input/output processing hooks (lines 2574-2578) need security review to ensure they can't be bypassed

🎯 Recommendations

Must Fix Before Merge:

  1. Decide on the API contract: Either:

    • Option A: send() and receive() both accept str | dict | list[dict] (full backward compatibility)
    • Option B: Both accept only list[dict] (breaking change, requires major version bump)
    • Option C: send() accepts any format, receive() only accepts lists (current state, but needs clear docs)
  2. Fix type hints to match actual implementation throughout

  3. Synchronize sync and async methods: receive() and a_receive() should have identical signatures

  4. Fix the send() string conversion at line 1192 to create proper dict format: [{"content": message, "role": "user"}]

  5. Update all docstrings to clearly explain the conversion behavior

Nice to Have:

  1. Add migration guide for users updating from previous versions
  2. Consider deprecation warnings if maintaining backward compatibility
  3. Add integration tests covering real-world scenarios (group chats with mixed message types)
  4. Performance benchmark for high-frequency messaging scenarios

Verdict

Changes Requested - The core idea is sound, but implementation has critical type consistency issues that need resolution before merge. The test coverage is impressive, but the type system mismatches could cause runtime errors that tests might miss.

Great work on tackling this complex refactoring! With the fixes above, this will be a solid improvement to the codebase.

claude[bot] avatar Oct 16 '25 19:10 claude[bot]

@Lancetnik drafting this, made a mistake while rebasing and want to test my PR once more , based on claude review.

priyansh4320 avatar Oct 16 '25 21:10 priyansh4320

Code Review: PR #2081 - Enforce consistent list[messages] API

Overview

This PR makes a significant architectural change to standardize agent messaging to list[dict[str, Any]] across the entire codebase. This is a substantial refactor (+2033/-254 lines across 23 files) that aims to improve consistency and reduce branching logic.

Strengths

  1. Clear design intent with _normalize_message_to_list() providing backward compatibility
  2. Comprehensive test coverage with sync and async backward compatibility tests
  3. Good documentation updates with clear docstrings

Critical Issues

1. Breaking API Contract (High Priority)

Location: autogen/agentchat/agent.py:40-72

The protocol signatures declare message: list[dict[str, Any]], but the implementation accepts str | dict | list. Type checkers will flag this mismatch.

Recommendation: Update protocol signature to match actual behavior or use overloads.

2. Skipped Tests (High Priority)

Location: test/test_backward_compatibility_sync.py:14, test/test_backward_compatibility_async.py:16

The tests are ALWAYS skipped, even with API key present. They need conditional skip logic.

3. Missing Defensive Checks (Medium Priority)

Location: autogen/agentchat/conversable_agent.py:1207-1220

No validation that processed_msgs is still a list after hook processing. Add checks for None, empty list, and type validation.

4. Missing Methods (High Priority)

Location: autogen/agentchat/conversable_agent.py:2604, 2611

Code calls self._process_tool_input() and self._process_tool_output() but these methods aren't in the diff. Verify they exist.

Recommendations

Must Fix Before Merge:

  1. Fix skipped backward compatibility tests
  2. Resolve protocol/implementation type mismatch
  3. Add defensive checks for hook return values
  4. Verify missing methods exist

Should Fix: 5. Add explicit None/empty list handling 6. Clarify role assignment logic with comments 7. Add tests for infinite loop prevention 8. Fix tool_call_id handling for API compatibility

Conclusion

Well-intentioned refactor with good practices, but needs another iteration to address critical issues. The approach is sound, execution needs refinement.

Recommendation: Request Changes - Address critical issues before merging.

Great work on tackling this significant refactor!

claude[bot] avatar Oct 21 '25 14:10 claude[bot]

@claude can you please re-review the requested changes in this review.

Critical Issues

  1. Breaking API Contract Location: autogen/agentchat/agent.py:40-72 The protocol signatures declare message: list[dict[str, Any]] (dismissed as not needed)

  2. Skipped Tests (High Priority) Location: test/test_backward_compatibility_sync.py:14, test/test_backward_compatibility_async.py:16 The tests are ALWAYS skipped, even with API key present. They need conditional skip logic. (handled with conditional skip)

  3. Missing Defensive Checks (Medium Priority) Location: autogen/agentchat/conversable_agent.py:1207-1220 No validation that processed_msgs is still a list after hook processing. Add checks for None, empty list, and type validation. (handled with added checks)

  4. Missing Methods (High Priority) Location: autogen/agentchat/conversable_agent.py:2604, 2611 Code calls self._process_tool_input() and self._process_tool_output() but these methods aren't in the diff. Verify they exist. (dismissed as already present in conversable_agent.py _process_tool_input() at line 4009 Called at lines 2579 and 2627 Processes tool inputs through safeguard hooks _process_tool_output() at line 4026 Called at lines 2592 and 2634)

priyansh4320 avatar Oct 21 '25 19:10 priyansh4320

Claude finished @priyansh4320's task β€”β€” View job


Re-review of Critical Issues

I've completed a thorough re-review of the previously identified critical issues. Here's my assessment:


βœ… Issue #2: Skipped Tests - PROPERLY FIXED

Status: Resolved correctly

The conditional skip logic is now properly implemented:

  • Both test_backward_compatibility_sync.py and test_backward_compatibility_async.py use @pytest.mark.skipif(OPENAI_API_KEY == "None", reason="OpenAI API key not found")
  • Found 14 test functions in each file with this decorator
  • Tests will run when API key is present, skip when absent
  • This follows the correct pytest pattern for conditional skipping

Code reference: test/test_backward_compatibility_sync.py:138


βœ… Issue #3: Missing Defensive Checks - PROPERLY FIXED

Status: Resolved correctly

Comprehensive defensive checks have been added to both send() and a_send() methods:

In send() method (lines 1213-1223):

# Validate hooks didn't break the message
if processed_msgs is None:
    raise ValueError(
        "Message processing hook returned None. Messages must be a non-empty list of dictionaries."
    )

if not isinstance(processed_msgs, list):
    raise TypeError(f"Message processing hook must return a list, got {type(processed_msgs).__name__}")

if len(processed_msgs) == 0:
    return  # Allow empty list as no-op

In a_send() method (lines 1281-1289): Same defensive checks are mirrored in the async version.

Code reference: autogen/agentchat/conversable_agent.py:1213-1223, 1281-1289


⚠️ Issue #1: Breaking API Contract - NEEDS DISCUSSION

Status: Dismissed by author, but warrants discussion

The protocol in agent.py declares:

  • send(message: list[dict[str, Any]])
  • receive(message: list[dict[str, Any]])

However, the implementation in conversable_agent.py accepts runtime normalization via _normalize_message_to_list() which handles str | dict | list[dict].

My Assessment:

This is intentional design, not a bug. Here's why:

  1. Type hints guide new code - They encourage the canonical list[dict] format
  2. Runtime remains flexible - _normalize_message_to_list() provides backward compatibility
  3. Documentation explains it - Docstrings clearly state: "If a str or dict, will be automatically normalized to list format"

The approach is valid but creates a type system vs runtime mismatch:

  • βœ… Pro: Nudges developers toward the new API without breaking existing code
  • ❌ Con: Type checkers (mypy, pyright) will flag valid runtime code as errors
  • ❌ Con: May confuse developers who rely on type hints

Recommendation: Consider one of these approaches:

  1. Add overloads (preferred for type safety):

    @overload
    def send(self, message: str, recipient: Agent, ...) -> None: ...
    @overload
    def send(self, message: dict[str, Any], recipient: Agent, ...) -> None: ...
    @overload
    def send(self, message: list[dict[str, Any]], recipient: Agent, ...) -> None: ...
    
  2. Accept the trade-off - Document that type hints show the preferred API, runtime accepts legacy formats

  3. Union type - Use message: list[dict[str, Any]] | dict[str, Any] | str (less "opinionated")

The current approach is functionally correct but may cause type checker friction for users.


βœ… Issue #4: Missing Methods - VERIFIED

Status: Methods exist, correctly dismissed

Confirmed that both methods exist and are properly called:

  • _process_tool_input() at line 4009

    • Called at lines 2579 and 2627
    • Processes tool inputs through safeguard hooks
  • _process_tool_output() at line 4026

    • Called at lines 2592 and 2634
    • Processes tool outputs through safeguard hooks

Code reference: autogen/agentchat/conversable_agent.py:4009, 4026


Summary

Fixed Issues: 2/4 critical issues properly resolved

  • βœ… Issue #2: Skipped Tests
  • βœ… Issue #3: Missing Defensive Checks

Verified/Dismissed: 2/4 issues correctly assessed

  • βœ… Issue #4: Missing Methods (correctly dismissed - methods exist)
  • ⚠️ Issue #1: Breaking API Contract (intentional design choice, but warrants discussion)

Overall Assessment

The PR addresses the critical implementation issues effectively. The only remaining consideration is the type system vs runtime mismatch in Issue #1, which is a design trade-off rather than a bug.

My Recommendation:

  • If you want strict type safety β†’ Add @overload decorators
  • If you want to guide developers toward the new API β†’ Current approach is acceptable, but add a note in the migration guide about type checker warnings

The code is production-ready from a functionality standpoint. The type hint question is about developer experience and tooling compatibility.


Job run: https://github.com/ag2ai/ag2/actions/runs/unknown
Branch: update-generate-reply-function

claude[bot] avatar Oct 21 '25 19:10 claude[bot]

Pull Request Review - PR #2081

Overview

This PR standardizes agent messaging to use a consistent list[dict[str, Any]] format throughout the codebase. This is a significant architectural change that improves consistency and maintainability. Overall, the approach is sound, but there are several important considerations to address.


Code Quality & Best Practices

βœ… Strengths

  1. Comprehensive backward compatibility: The _normalize_message_to_list() helper ensures existing code continues to work
  2. Good use of overloads: Protocol definitions use @overload to maintain type safety while supporting multiple input formats
  3. Extensive test coverage: 1,312 lines of new backward compatibility tests (sync + async)
  4. Documentation updates: Updated docstrings to reflect the new API

⚠️ Areas for Improvement

1. Type Safety Concerns (autogen/agentchat/conversable_agent.py:1096-1119)

The _process_message_before_send() method has a problematic type signature:

def _process_message_before_send(
    self, message: list[dict[str, Any]], recipient: Agent, silent: bool
) -> dict[str, Any] | str | list[dict[str, Any]]:  # ⚠️ Return type too broad

Issues:

  • Input is list[dict] but return type includes str | dict, which is inconsistent
  • The implementation only returns list[dict[str, Any]] but the signature suggests otherwise
  • This could lead to type checker confusion

Recommendation:

def _process_message_before_send(
    self, message: list[dict[str, Any]], recipient: Agent, silent: bool
) -> list[dict[str, Any]]:  # Consistent with input type

2. Error Handling in send() and a_send() (autogen/agentchat/conversable_agent.py:1173-1200)

The validation logic has edge cases:

if len(processed_msgs) == 0:
    return  # Allow empty list as no-op

Concerns:

  • Silent failures can be hard to debug
  • Inconsistent with the error message: "Messages must be a non-empty list"
  • Should this be a warning instead of silent success?

Recommendation: Either log a warning or reconsider if empty message lists should be allowed:

if len(processed_msgs) == 0:
    warnings.warn("Attempted to send empty message list - no action taken")
    return

3. Hook Processing Assumption (autogen/agentchat/conversable_agent.py:1096-1119)

The code assumes hooks always return valid messages:

for hook in hook_list:
    processed_msg = hook(
        sender=self,
        message=processed_msg,
        recipient=recipient,
        silent=ConversableAgent._is_silent(self, silent),
    )
processed_messages.append(processed_msg)

Issue: No validation that hook() returns a dict. If a hook returns None or an invalid type, it will fail later.

Recommendation:

for hook in hook_list:
    processed_msg = hook(...)
    if not isinstance(processed_msg, dict):
        raise TypeError(
            f"Hook {hook.__name__} must return dict, got {type(processed_msg).__name__}"
        )

Potential Bugs

πŸ› Bug 1: generate_function_call_reply() Empty Message Handling

In autogen/agentchat/conversable_agent.py:2506-2507:

if not messages:
    return False, None

Issue: This check was added but the function signature hasn't changed to document that messages can be empty. This could cause confusion.

Recommendation: Update docstring or add explicit handling documentation.


πŸ› Bug 2: Role Assignment Inconsistency

In conversable_agent.py:1173, messages are normalized with role="user":

message = self._normalize_message_to_list(message, role="user")

But in conversable_agent.py:1336, the reply is normalized with role="assistant":

reply = [{"content": reply, "role": "assistant"}]

Issue: This creates an asymmetry. When an agent sends a message, shouldn't the role be "assistant" (since it's the one composing it)?

Current behavior:

  • Agent A sends message β†’ normalized as "user" β†’ sent to Agent B
  • Agent B receives β†’ appends to chat history
  • Agent B generates reply β†’ normalized as "assistant"

Recommendation: Review the role assignment logic to ensure it aligns with OpenAI's chat completion schema expectations.


Performance Considerations

⚑ List Iteration Overhead

The new implementation iterates over message lists multiple times:

  1. _normalize_message_to_list() - converts to list
  2. _process_message_before_send() - iterates for hooks
  3. Validation loop in send() - validates each message
  4. receive() - processes each message individually

Impact: For single messages (the common case), this adds unnecessary overhead.

Recommendation: Consider optimizing the common path (single message) vs. the rare path (multiple messages):

if isinstance(message, str):
    # Fast path for single string message
    processed = self._process_single_message(message, recipient, silent)
    recipient.receive([processed], self, request_reply, silent)
elif isinstance(message, dict):
    # Fast path for single dict message
    # ...
else:
    # Slower path for list of messages
    # ... existing logic

Security Concerns

πŸ”’ No Critical Security Issues Identified

However, consider:

  1. Hook injection risk: Hooks can modify messages arbitrarily. Ensure proper validation of hook return values (see recommendation above)
  2. Message content validation: No validation that message content is sanitized before processing

Test Coverage

βœ… Excellent Coverage

  • 638 lines of sync backward compatibility tests
  • 674 lines of async backward compatibility tests
  • Tests cover:
    • String messages, dict messages, list messages
    • Tool calls, groupchats, handoffs
    • Nested chats, patterns, conditions

⚠️ Missing Test Cases

  1. Hook behavior with list messages: No explicit test for how hooks process multi-message lists
  2. Empty message edge cases: What happens when hooks return empty lists?
  3. Invalid message format recovery: Test resilience to malformed messages
  4. Performance regression tests: Ensure single-message case isn't significantly slower

Recommendation:

def test_hook_processes_multiple_messages():
    """Test that hooks receive and can modify each message in a list"""
    def my_hook(sender, message, recipient, silent):
        message["modified"] = True
        return message
    
    agent.register_hook("process_message_before_send", my_hook)
    agent.send([{"content": "msg1"}, {"content": "msg2"}], recipient)
    
    # Verify both messages were processed
    assert all(msg.get("modified") for msg in recipient.chat_messages[agent])

Documentation

βœ… Good Documentation Updates

  • Updated docstrings in send(), receive(), a_send(), a_receive()
  • Updated contributor documentation

⚠️ Missing Documentation

  1. Migration guide: How should users update their code? (even though backward compatible, new code should use best practices)
  2. Hook contract: Document that hooks must handle dict messages (not lists) and must return dicts
  3. Performance implications: Document when to use single vs. multiple messages

Recommendations Summary

High Priority

  1. βœ… Fix _process_message_before_send() return type signature
  2. βœ… Add hook return value validation
  3. βœ… Review role assignment logic ("user" vs "assistant")
  4. βœ… Add tests for hook behavior with list messages

Medium Priority

  1. ⚠️ Consider performance optimization for single-message case
  2. ⚠️ Add empty message handling warning/logging
  3. ⚠️ Document migration best practices

Low Priority

  1. πŸ“ Add performance regression tests
  2. πŸ“ Expand documentation on when to use list messages

Final Assessment

Overall: Strong PR with room for improvement

This PR successfully achieves its goals of standardizing the API while maintaining backward compatibility. The implementation is well-tested and thoroughly documented. However, addressing the type safety concerns and edge cases will make this production-ready.

Recommendation: Request Changes

  • Address type signature inconsistencies
  • Add hook validation
  • Clarify role assignment logic
  • Add missing test cases

Once these are addressed, this will be an excellent contribution to the codebase.


Great work on tackling this complex refactoring! The test coverage is particularly impressive. πŸŽ‰

claude[bot] avatar Oct 24 '25 21:10 claude[bot]

@claude can you please re-review the addressed changes based on previous review :

  1. Type Safety - Fixed _process_message_before_send() return type to list[dict[str, Any]]
  2. Hook Validation - Added isinstance() check, raises TypeError on invalid returns
  3. Role Assignment - Ignored - Working correctly per design, confirmed by tests and shows expected behavior
  4. Hook Tests - Added tests for list message processing
  5. Performance - Deferred - Negligible overhead, can optimize later if needed
  6. Empty Messages - Added warnings.warn() for debugging
  7. Documentation - Docstrings updated, migration guide enhanced
  8. Custom Field Preservation - Hooks can now add metadata (timestamps, IDs, etc.) that persists through message storage Added preserve_custom_fields parameter Local: preserves all fields Remote: strict whitelist for clean serialization Fixes None-field TypeErrors in existing code

guess we can Approve this Pull request.

priyansh4320 avatar Oct 25 '25 11:10 priyansh4320

Claude finished @priyansh4320's task β€”β€” View job


Re-review of PR #2081: Enforce consistent list[messages] API

I've completed a comprehensive re-review of the addressed changes. Here's my detailed assessment:


βœ… Successfully Addressed Issues

1. Type Safety Fix βœ…

Status: RESOLVED

The _process_message_before_send() return type has been correctly fixed:

# Line 1098-1100
def _process_message_before_send(
    self, message: list[dict[str, Any]], recipient: Agent, silent: bool
) -> list[dict[str, Any]]:  # βœ… Correct return type

Previously, the return type was dict[str, Any] | str | list[dict[str, Any]], which was inconsistent with the implementation. This is now fixed.

2. Hook Validation βœ…

Status: RESOLVED

Comprehensive hook validation has been added (lines 1129-1134):

if not isinstance(processed_msg, dict):
    raise TypeError(
        f"Hook must return a dict, got {type(processed_msg).__name__}. "
        f"Hooks should modify and return the message dictionary."
    )

This ensures hooks always return valid dict messages.

3. Empty Message Handling βœ…

Status: RESOLVED

The implementation now includes proper warnings for empty message lists (lines 1194-1200):

if len(processed_msgs) == 0:
    warnings.warn(
        "Attempted to send empty message list - no action taken. "
        "This may indicate a hook removed all messages.",
        UserWarning,
        stacklevel=2,
    )
    return processed_msgs

This provides debugging visibility without breaking execution.

4. Comprehensive Hook Testing βœ…

Status: RESOLVED

Excellent test coverage added with 8+ dedicated hook tests in test_conversable_agent.py:

  • test_hook_processes_multiple_messages() - Verifies hooks process each message in a list
  • test_hook_receives_dict_for_string_input() - Ensures hooks receive normalized dicts
  • test_multiple_hooks_process_list_messages() - Tests multiple hook chaining
  • test_hook_return_validation_with_list() - Tests type validation
  • test_hook_cannot_create_empty_list() - Edge case handling
  • test_hook_modifies_content_to_empty() - Content modification tests
  • test_hook_processes_multiple_messages_async() - Async version
  • test_hook_adds_metadata_to_list_messages() - Metadata preservation

These tests are thorough and cover the expected behavior comprehensively.

5. Documentation Updates βœ…

Status: RESOLVED

Migration guide has been added (website/docs/contributor-guide/Migration-Guide.mdx):

  • Clear explanation of changes
  • Backward compatibility guarantees
  • Best practices for new code
  • Performance considerations
  • Hook author guidelines with clear examples

The documentation is well-structured and provides actionable guidance.

6. Custom Field Preservation βœ…

Status: WELL IMPLEMENTED

The preserve_custom_fields parameter in normilize_message_to_oai() (lines 4437, 4453-4463):

  • Local storage: Preserves all custom fields (timestamps, metadata, etc.)
  • Remote/serialization: Uses strict whitelist for clean API calls
  • None handling: Filters None values except for content to avoid breaking code that checks if key in dict

This is a smart solution that balances flexibility with safety.


⚠️ Areas of Concern

1. Role Assignment Logic - Needs Clarification

Severity: Medium

In send() method (line 1180), messages are normalized with role="user":

message = self._normalize_message_to_list(message, role="user")

However, in _append_oai_message() (line 1205), the role is set to "assistant":

self._append_oai_message(message=msg, role="assistant", conversation_id=recipient, name=self.name)

Question: Is this intentional? The flow is:

  1. Agent A sends message β†’ normalized as "user"
  2. Appended to storage as "assistant"

This seems counter-intuitive. When an agent sends a message, shouldn't it be stored as "assistant" (since it's composing it)? The current behavior may be correct for the OpenAI schema (where messages sent to another agent are from the recipient's perspective), but it warrants documentation/clarification.

Recommendation: Add a comment explaining why the role switches from "user" to "assistant" between normalization and storage.

2. Generate Reply Return Type Inconsistency

Severity: Low

The generate_reply() method (lines 2987-3058) has return type:

def generate_reply(...) -> str | dict[str, Any] | list[dict[str, Any]] | None:

But in receive() (lines 1354-1359), the reply is normalized to list:

if isinstance(reply, str):
    reply = [{"content": reply, "role": "assistant"}]
elif isinstance(reply, dict):
    reply = [reply]
self.send(reply, sender, silent=silent)

This is correct behavior (normalizing before sending), but it means:

  • generate_reply() can return multiple formats
  • But send() always receives normalized lists

This is fine functionally, but creates a slight API inconsistency where internal methods return varying formats while public API is normalized.

Recommendation: Consider documenting this design decision in the docstring.

3. Missing Type Hint Consistency in Protocol

Severity: Low

The protocol in agent.py uses @overload decorators (lines 41-72), which is excellent for type safety. However, the implementation signature accepts unions while the overloads are specific. This is intentional to guide developers toward the canonical format while maintaining runtime flexibility.

Observation: This is a valid design trade-off, but developers using strict type checkers (mypy, pyright) may see warnings when passing str or dict to methods typed as list[dict[str, Any]].

Recommendation: This is acceptable as-is, but consider adding a note in the migration guide about potential type checker warnings.


πŸ” Code Quality Assessment

Strengths:

  1. Excellent test coverage: 1,312 new test lines (638 sync + 674 async backward compatibility tests)
  2. Comprehensive hook validation: Prevents invalid returns with clear error messages
  3. Thoughtful normalization: _normalize_message_to_list() is clean and well-documented
  4. Good documentation: Migration guide provides clear examples and best practices
  5. Smart field preservation: Balances local flexibility with remote API compliance
  6. Consistent error handling: Validation checks in both send() and a_send()

Minor Issues:

  1. Codecov 60% patch coverage: 46 lines missing coverage according to codecov report

    • Most uncovered lines are likely in experimental features (reliable.py shows 0% coverage)
    • Core implementation in conversable_agent.py has 65.21% coverage
    • This is acceptable for experimental code, but core paths should ideally be >80%
  2. Potential performance consideration: Every message goes through normalization β†’ validation β†’ hook processing β†’ validation again. For high-throughput scenarios, this could add overhead. However, the implementation is already optimized (single pass for list processing), so this is likely negligible.


🎯 Final Verdict

Overall Assessment: APPROVED with Minor Recommendations

This PR successfully achieves its goals:

  • βœ… Standardizes API to list[dict[str, Any]] format
  • βœ… Maintains full backward compatibility
  • βœ… Addresses all critical issues from previous reviews
  • βœ… Provides comprehensive test coverage
  • βœ… Includes clear documentation and migration guide

Recommendations (Non-blocking):

  1. Clarify role assignment logic - Add comment explaining why messages are normalized as "user" but stored as "assistant"
  2. Document return type flexibility - Add note in generate_reply() docstring about internal format flexibility vs. public API consistency
  3. Consider type checker guidance - Add migration guide note about potential type checker warnings
  4. Improve experimental code coverage - Add tests for reliable.py to improve overall coverage (though this is not critical for core functionality)

What Makes This PR Strong:

  1. Thorough testing: The 8+ hook tests demonstrate deep understanding of edge cases
  2. Smart design: Custom field preservation with local/remote distinction shows thoughtful API design
  3. Excellent documentation: Migration guide provides actionable guidance for users and hook authors
  4. Defensive programming: Multiple validation layers prevent invalid states
  5. Backward compatibility: Existing code continues to work without modification

Security & Stability:

  • βœ… No security concerns identified
  • βœ… Hook validation prevents malformed messages
  • βœ… CI passing (PR checks: success, tests: passing)
  • βœ… No breaking changes for existing users

Conclusion

This PR represents a significant improvement to the codebase. All critical issues from previous reviews have been addressed comprehensively. The implementation is solid, well-tested, and thoroughly documented.

Recommendation: APPROVE βœ…

The minor recommendations above are nice-to-haves that improve clarity but don't block merging. The PR is production-ready and addresses the standardization goals while maintaining backward compatibility.

Great work on this complex refactoring! πŸŽ‰


View job | Branch: update-generate-reply-function

claude[bot] avatar Oct 25 '25 11:10 claude[bot]

Code Review for PR #2081: Message Format Standardization

Thank you for this comprehensive refactoring! This PR makes a significant improvement to the messaging API by standardizing to a consistent list[dict[str, Any]] format. Here's my detailed review:


🎯 Overall Assessment

Summary: This is a well-executed refactoring with strong backward compatibility, comprehensive testing, and clear documentation. The changes significantly improve code maintainability and set a solid foundation for future enhancements.

Recommendation: Approve with minor suggestions for consideration.


βœ… Strengths

1. Excellent Backward Compatibility

  • Runtime normalization via _normalize_message_to_list() ensures all existing code continues to work
  • 1,312 lines of comprehensive backward compatibility tests (sync + async)
  • Migration guide clearly documents the changes

2. Strong Test Coverage

  • New test files cover 14 different scenarios including:
    • Single agent chat
    • Two agent chat
    • Group chats with managers
    • Handoffs and nested chats
    • Tool calling
    • Various group chat patterns
  • Both sync (test_backward_compatibility_sync.py, 638 lines) and async (test_backward_compatibility_async.py, 674 lines) versions

3. Clean API Design

  • Consistent message shape reduces branching complexity
  • Overloaded type hints guide developers toward canonical format
  • Hook processing is now straightforward (always receives dicts)

4. Good Documentation

  • Comprehensive migration guide (Migration-Guide.mdx)
  • Updated docstrings throughout
  • Clear examples of old vs new patterns

πŸ” Issues & Concerns

1. Critical: Potential Role Override Bug ⚠️

In _normalize_message_to_list(), when converting a dict to list, the function doesn't preserve/validate the existing role:

def _normalize_message_to_list(
    message: str | dict[str, Any] | list[dict[str, Any]], role: str = "assistant"
) -> list[dict[str, Any]]:
    if isinstance(message, str):
        return [{"content": message, "role": role}]
    elif isinstance(message, dict):
        return [message]  # ← Doesn't add role if missing
    return message

Issue: If a dict message doesn't have a role field, it's returned as-is without adding the default role. This could cause validation issues downstream.

Suggested Fix:

elif isinstance(message, dict):
    if "role" not in message:
        return [{**message, "role": role}]
    return [message]

However, I notice that normilize_message_to_oai() handles this later, so this might be intentional. Worth clarifying in the docstring or adding a comment.

2. Inconsistent Role Parameter Usage

The role parameter defaults to "assistant" but is called with role="user" in many places:

# In send():
message = self._normalize_message_to_list(message, role="user")  # line 1180

# In initiate_chat():
self.send(self._normalize_message_to_list(msg2send, role="user"), ...)  # line 1593

This seems backwards - when an agent sends a message, shouldn't the role be "assistant"? The receiving agent would then process it. The default of "assistant" in the function signature conflicts with actual usage.

Impact: This might not cause bugs now due to downstream normalization, but it creates confusion about message ownership and could lead to issues in multimodal or complex message scenarios.

3. Hook Validation Placement

The hook validation happens inside _process_message_before_send() for each message:

if not isinstance(processed_msg, dict):
    raise TypeError(...)  # line 1130-1134

Concern: If a hook corrupts message 5 out of 10, the first 4 messages will be processed successfully before the error occurs. This could lead to inconsistent state.

Suggestion: Consider collecting all processed messages first, then validating in bulk before modifying any state.

4. Performance: Redundant Validation

In send() and a_send(), all messages are validated via `_append_oai_message()":

if not all(
    self._append_oai_message(message=msg, role="assistant", conversation_id=recipient, name=self.name)
    for msg in processed_msgs
):
    raise ValueError(...)  # lines 1204-1210

Issue: This validates AND appends messages to the conversation history in one go using a generator expression. If validation fails on message N, messages 1 through N-1 have already been appended to self._oai_messages[recipient].

Impact: Failed sends leave partial message history, potentially corrupting conversation state.

Suggested Fix: Validate all messages first without side effects, then append only if all pass:

# Validate first
valid = [self._validate_oai_message(msg) for msg in processed_msgs]
if not all(valid):
    raise ValueError(...)

# Then append
for msg in processed_msgs:
    self._append_oai_message(...)

5. Missing warnings Import Check

In lines 1195-1200, warnings.warn() is used but I don't see warnings in the imports at the top of the diff. This might already be imported, but worth verifying to avoid runtime errors.


πŸ”’ Security Considerations

βœ… No significant security concerns identified

  • The normalization doesn't introduce injection vulnerabilities
  • Message validation is preserved
  • No new external inputs or data flows

⚑ Performance Considerations

Positive:

  • Normalization overhead is minimal for single messages
  • Batch sending reduces hook overhead
  • Simplified branching logic should improve overall performance

Areas for Optimization:

  1. The generator expression in validation (mentioned in issue #4) could be replaced with explicit validation
  2. Consider caching normalized messages if the same message is sent to multiple recipients

πŸ“ Code Quality & Best Practices

Good:

  • βœ… Type hints are comprehensive and accurate
  • βœ… Docstrings are detailed and helpful
  • βœ… Error messages are clear and actionable
  • βœ… Code follows repository conventions
  • βœ… Proper use of @overload for type safety

Suggestions:

  1. Line 1100: Comment says "Fixed: was dict[str, Any] | str | list[dict[str, Any]]" - this should be removed before merge (it's a review comment, not production code)

  2. Typo in function name: normilize_message_to_oai should be normalize_message_to_oai (with 'a'). If this is existing code, consider a future refactor.

  3. MAINTAINERS.md: Adding yourself as a maintainer is great! The RFC area is well-deserved given the scope of this PR.


πŸ§ͺ Test Coverage Assessment

Excellent coverage overall! The 14 test scenarios cover:

  • βœ… Single/two-agent chats
  • βœ… Group chats with managers
  • βœ… Tool calling and handoffs
  • βœ… Nested chats
  • βœ… Various patterns (Random, RoundRobin, Manual, etc.)
  • βœ… Both sync and async versions

Missing Test Cases (Nice to Have):

  1. Error handling: What happens when hooks return invalid types?
  2. Edge cases: Empty string messages, None content, malformed dicts
  3. Multi-message batches: Explicitly test sending [msg1, msg2, msg3] in a single call
  4. Hook state corruption: Test that failed hooks don't corrupt conversation state

πŸ“š Documentation Review

Migration Guide (Migration-Guide.mdx):

βœ… Excellent! Clear examples, good structure, helpful for developers

Suggestions:

  1. Add a note about the role parameter behavior in _normalize_message_to_list()
  2. Include a troubleshooting section for common migration issues
  3. Add example of sending batched messages

🎯 Specific File Comments

autogen/agentchat/conversable_agent.py

  • Line 1028: Default role="assistant" but mostly called with role="user" - clarify intent
  • Line 1100: Remove the "Fixed:" comment
  • Line 1205: Validation with side effects (see issue #4)

Test Files

  • Excellent coverage and well-structured
  • Consider adding explicit test for batch message sending
  • Test files properly use clear_all_agent_handoffs() for isolation

autogen/tools/experimental/reliable/reliable.py

  • Changes look good, properly handles normalized format

πŸš€ Recommendation Summary

Overall: This is high-quality work that significantly improves the codebase. The issues identified are mostly minor and don't block approval.

Before Merge:

  1. βœ… Required: Address the validation side-effects issue (#4) - this could cause state corruption
  2. ⚠️ Recommended: Clarify the role parameter behavior (#2)
  3. πŸ“ Nice to have: Remove the "Fixed:" comment (#1 in Code Quality)

Future Work (Can be separate PRs):

  • Fix the normilize β†’ normalize typo
  • Add error handling test cases
  • Consider validation optimization

πŸ“Š Final Score

  • Code Quality: 9/10
  • Test Coverage: 9/10
  • Documentation: 9/10
  • Backward Compatibility: 10/10
  • Architecture: 9/10

Overall: 9.2/10 - Excellent work! πŸŽ‰


Great job on this substantial refactoring! The consistent message format will make future development much easier. Looking forward to seeing this merged.

cc: @priyansh4320

claude[bot] avatar Oct 28 '25 17:10 claude[bot]

Few more comments, @priyansh4320

  1. Why do we need test/test_backward_compatibility_sync.py and test/test_backward_compatibility_sync.py? We need to test the list[message] cases, and backward compatibility should be covered from previous tests. We only add the missing part if not exists.
  2. Remove website/docs/contributor-guide/Migration-Guide.mdx as this is not public doc. Add instruction in user-guide

randombet avatar Oct 31 '25 03:10 randombet

  1. Why do we need test/test_backward_compatibility_sync.py and test/test_backward_compatibility_sync.py? We need to test the list[message] cases, and backward compatibility should be covered from previous tests. We only add the missing part if not exists.
  • both files are robust testing covering each pattern and covering various integrations from 2 agents to groupchats and tool calls as well. benifits:
  • better bug resolution , as well as helps in robust refactoring.

priyansh4320 avatar Oct 31 '25 18:10 priyansh4320

@priyansh4320 Search for "message = messages[-1]". All those lines may need to be updated to include the delta messages instead of the last one. i.e. if we send a list of 3 messages, then those 3 messages needs to be passed over instead of just one.

e.g. check this one https://github.com/ag2ai/ag2/blob/main/autogen/agentchat/groupchat.py#L1204

randombet avatar Oct 31 '25 18:10 randombet

2. Remove website/docs/contributor-guide/Migration-Guide.mdx as this is not public doc. Add instruction in user-guide

this change was suggested by claude , updated the changes

priyansh4320 avatar Oct 31 '25 18:10 priyansh4320

@priyansh4320 Search for "message = messages[-1]". All those lines may need to be updated to include the delta messages instead of the last one. i.e. if we send a list of 3 messages, then those 3 messages needs to be passed over instead of just one.

e.g. check this one https://github.com/ag2ai/ag2/blob/main/autogen/agentchat/groupchat.py#L1204

@priyansh4320 Search for "message = messages[-1]". All those lines may need to be updated to include the delta messages instead of the last one. i.e. if we send a list of 3 messages, then those 3 messages needs to be passed over instead of just one.

e.g. check this one

@randombet thanks for the suggestion, included this change in my PR.

priyansh4320 avatar Oct 31 '25 19:10 priyansh4320

Addressed all the comments and changes:

  • updated group chats to handle delta messages and added test for these in test_groupchats.py
  • all tests are passing
  • Added documentation
  • removed redundant test, and overloads
  • moved helper methods from conversable_agents.py to utils.py and added test in test_agentchat_utils.py

@randombet can you please test this PR and approve if everything is okay?

priyansh4320 avatar Nov 01 '25 00:11 priyansh4320

Codecov Report

:x: Patch coverage is 68.15642% with 57 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
autogen/agentchat/conversable_agent.py 63.33% 15 Missing and 18 partials :warning:
autogen/agentchat/groupchat.py 70.58% 6 Missing and 4 partials :warning:
autogen/tools/experimental/reliable/reliable.py 0.00% 9 Missing :warning:
autogen/agentchat/utils.py 94.73% 1 Missing and 1 partial :warning:
autogen/remote/agent_service.py 60.00% 2 Missing :warning:
autogen/tools/tool.py 50.00% 0 Missing and 1 partial :warning:
Files with missing lines Coverage Ξ”
autogen/agentchat/agent.py 100.00% <ΓΈ> (ΓΈ)
autogen/agentchat/group/multi_agent_chat.py 37.50% <ΓΈ> (ΓΈ)
autogen/agentchat/group/patterns/random.py 100.00% <100.00%> (ΓΈ)
...n/agents/experimental/reasoning/reasoning_agent.py 40.86% <ΓΈ> (ΓΈ)
autogen/tools/tool.py 91.80% <50.00%> (-1.42%) :arrow_down:
autogen/agentchat/utils.py 89.39% <94.73%> (-3.16%) :arrow_down:
autogen/remote/agent_service.py 68.67% <60.00%> (ΓΈ)
autogen/tools/experimental/reliable/reliable.py 18.94% <0.00%> (-0.17%) :arrow_down:
autogen/agentchat/groupchat.py 66.18% <70.58%> (-5.73%) :arrow_down:
autogen/agentchat/conversable_agent.py 68.29% <63.33%> (+0.71%) :arrow_up:

... and 29 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Nov 04 '25 15:11 codecov[bot]

Please add integration tests (not only unit test) for two agent chat, Nested Chat , group chat, group chat with Orchestration Patterns. Cover ConversableAgent::run, initiate_chat, initiate_chats, initiate_group_chat, run_group_chat and their async counterparts.

@priyansh4320 where are the integration tests?

randombet avatar Nov 04 '25 20:11 randombet

Please add integration tests (not only unit test) for two agent chat, Nested Chat , group chat, group chat with Orchestration Patterns. Cover ConversableAgent::run, initiate_chat, initiate_chats, initiate_group_chat, run_group_chat and their async counterparts.

@priyansh4320 where are the integration tests?

test have been marked "@pytest.mark.integration" in these files:

  • test/agentchat/test_conversable_agent.py
  • test/agentchat/test_groupchat.py
  • test/agentchat/test_nested_chats.py
  • test/agentchat/test_tool_call.py
  • test/agentchat/test_chats.py

priyansh4320 avatar Nov 04 '25 20:11 priyansh4320

@randombet I have cross-checked, my integration tests are correct, where real configuration is being used with credentials_gpt_4o_mini.llm_config, credentials_gpt_4o.llm_config,

  • All the integration tests are marked with @pytest.mark.integration :

Here is a list of all the integration tests I wrote; the rest are unit tests: List of integration tests test_groupchat.pyΒ  Synchronous Tests:

  1. test_group_chat_manager_with_multiple_agents_integration
  2. test_group_chat_round_robin_pattern_integration
  3. test_group_chat_auto_pattern_integration
  4. test_run_group_chat_round_robin_integration
  5. test_run_group_chat_auto_pattern_integration Asynchronous Tests:
  6. test_group_chat_manager_with_multiple_agents_async_integration
  7. test_group_chat_round_robin_pattern_async_integration
  8. test_group_chat_auto_pattern_async_integration
  9. test_a_run_group_chat_round_robin_async_integration
  10. test_a_run_group_chat_auto_pattern_async_integration
  11. test_group_chat_random_pattern_async_integration

test_conversable_agent.pyΒ  Synchronous Tests:

  1. test_single_agent_generate_reply_integration
  2. test_two_agent_chat_with_calculator_tool_integration
  3. test_agent_run_method_single_integration
  4. test_agent_run_method_two_agents_integration Asynchronous Tests:
  5. test_single_agent_generate_reply_async_integration
  6. test_two_agent_chat_with_calculator_tool_async_integration
  7. test_agent_a_run_method_single_async_integration
  8. test_agent_a_run_method_two_agents_async_integration

test_chats.pyΒ  Multi-Turn ConversationΒ Tests:

  1. test_complex_multi_turn_conversation_integration
  2. test_complex_multi_turn_conversation_async_integration

test_tool_calls.pyΒ  Tool and Math Integration Tests:

  1. test_eval_math_responses
  2. test_eval_math_responses_api_style_function

Group Chat with ToolsΒ Tests:

  1. test_handoffs_and_tools_integration
  2. test_mixed_tools_scenario_integration

Advanced Nested Chat IntegrationΒ Tests:

  1. test_nested
  2. test_conditional_handoffs_with_llm_conditions_integration
  3. test_nested_chat_target_integration
  4. test_terminate_target_functionality_integration

Async Advanced Nested Chat Integration Tests:

  1. test_conditional_handoffs_with_llm_conditions_async_integration
  2. test_nested_chat_target_async_integration
  3. test_terminate_target_functionality_async_integration

priyansh4320 avatar Nov 05 '25 10:11 priyansh4320

@priyansh4320 Could you separate the tests into separate files? Any code over 500 lines is bloated and we should avoid make it worse. Also separate unit tests and integration tests in one file.

In the integration tests for your case, please use at lease 2 messages in a list as input and verify with meaningful assert e.g. the following statements can't justify testing purpose. assert result is not None assert last_agent is not None

randombet avatar Nov 05 '25 18:11 randombet