feat: Support nested multimodal content in ChatHistory (Issue #141)
Summary
This PR implements support for nested multimodal content in ChatHistory, fixing Issue #141.
The Problem: Previously, ChatHistory.get_history() only detected multimodal content (Image, PDF, Audio from the instructor library) at the top level of input schemas. When multimodal content was nested within other schemas (e.g., a list of documents each containing a PDF), it was incorrectly serialized with json.dumps, causing API errors.
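The failure is easier to see in a minimal sketch of a top-level-only check. This is not the library's code. It assumes plain dataclasses as stand-ins for the pydantic schemas and for instructor's `Image`. The `top_level_multimodal` helper, `InputSchema`, and the `ImageDocument` fields are hypothetical names used only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Image:
    """Hypothetical stand-in for instructor's Image type."""
    source: str


@dataclass
class ImageDocument:
    """Hypothetical nested schema holding one image."""
    owner: str
    image: Image


@dataclass
class InputSchema:
    documents: List[ImageDocument] = field(default_factory=list)


MULTIMODAL_TYPES = (Image,)


def top_level_multimodal(schema) -> list:
    """Old behaviour (sketch): only direct attributes of the schema are inspected."""
    return [v for v in vars(schema).values() if isinstance(v, MULTIMODAL_TYPES)]


payload = InputSchema(documents=[ImageDocument(owner="alice", image=Image("a.png"))])

# The Image sits inside a list of documents, so a top-level scan finds nothing...
found = top_level_multimodal(payload)
# ...and the whole schema, image included, falls through to plain JSON text.
as_text = json.dumps(asdict(payload))
print(found)    # []
print(as_text)  # {"documents": [{"owner": "alice", "image": {"source": "a.png"}}]}
```

The image becomes ordinary JSON text next to `owner`. It is never passed along as a separate multimodal part.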
The Solution: Add recursive detection and extraction of multimodal content at any nesting depth:
- `_contains_multimodal(obj)`: recursively checks whether an object contains any multimodal content
- `_extract_multimodal_objects(obj)`: recursively extracts all multimodal objects from nested structures
- `_build_non_multimodal_dict(obj)`: builds a JSON-serializable dict that excludes multimodal content
Changes
- `atomic-agents/atomic_agents/context/chat_history.py`: Add 3 recursive helper functions and update `get_history()` to use them
- `atomic-agents/tests/context/test_chat_history.py`: Add 6 new tests for nested multimodal scenarios
- `atomic-examples/nested-multimodal/`: New example demonstrating the fix with nested `ImageDocument` schemas
- Fix deprecated `instructor.multimodal` imports → `instructor.processing.multimodal`
Test Plan
- [x] All 30 unit tests pass
- [x] End-to-end validation with OpenAI GPT-4.1 using nested Image content
- [x] Example successfully processes nested `List[ImageDocument]` with `Image` fields
Closes #141
Automated review by Greptile
Greptile Overview
Greptile Summary
This PR successfully implements nested multimodal content support in ChatHistory, solving Issue #141 where multimodal objects (Image, PDF, Audio) nested within schemas were incorrectly serialized.
Key Changes
- Added `_extract_multimodal_content()` function with recursive extraction logic and circular reference protection using `_seen` set tracking
- Refactored `get_history()` from field-by-field inspection to single-pass recursive extraction
- Fixed deprecated `instructor.multimodal` imports → `instructor.processing.multimodal`
- Added 6 comprehensive unit tests covering nested lists, dicts, deeply nested schemas, and edge cases
- Includes working example demonstrating `List[ImageDocument]` with nested `Image` fields
Implementation Quality
The refactored approach is cleaner and more maintainable than the previous top-level-only implementation. The circular reference protection addresses the previously identified concern about infinite recursion.
Confidence Score: 4/5
- This PR is safe to merge with good test coverage and circular reference protection
- Score reflects a solid implementation with comprehensive testing. One point was deducted because the recursive traversal logic is complex and could benefit from additional edge-case validation in production use
- Pay close attention to `atomic-agents/atomic_agents/context/chat_history.py`: the recursive extraction logic is complex and handles multiple object types
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| atomic-agents/atomic_agents/context/chat_history.py | 4/5 | Added recursive multimodal extraction with circular reference protection; refactored get_history() to use single-pass extraction; updated imports from deprecated instructor.multimodal |
| atomic-agents/tests/context/test_chat_history.py | 5/5 | Added 6 comprehensive tests for nested multimodal content; updated deprecated imports; all tests verify correct extraction and JSON serialization |
| atomic-examples/nested-multimodal/nested_multimodal/main.py | 5/5 | New example demonstrating nested multimodal content with ImageDocument schema; supports both OpenAI and Gemini; includes custom .env loader |
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant ChatHistory
    participant Extract as _extract_multimodal_content
    participant Message
    participant Instructor

    User->>ChatHistory: add_message(role, content)
    ChatHistory->>Message: Create Message with nested multimodal content
    Note over Message: content contains ImageDocument<br/>with nested Image objects

    User->>ChatHistory: get_history()
    ChatHistory->>Extract: _extract_multimodal_content(message.content)
    Extract->>Extract: Check if BaseModel (ImageDocument)
    Extract->>Extract: Add to _seen set (circular ref protection)

    loop For each field in BaseModel
        Extract->>Extract: _extract_multimodal_content(field_value)
        alt Field is Image/Audio/PDF
            Extract-->>Extract: Return MultimodalContent(objects=[obj], json_data=None)
        else Field is string/primitive
            Extract-->>Extract: Return MultimodalContent(objects=[], json_data=value)
        end
        Extract->>Extract: Accumulate objects and json_data
    end

    Extract-->>ChatHistory: MultimodalContent(objects=[Image, ...], json_data={owner, category, ...})
    ChatHistory->>ChatHistory: Build content array
    Note over ChatHistory: [json.dumps(json_data), Image1, Image2, ...]
    ChatHistory-->>Instructor: Return history with separated JSON and multimodal
    Note over Instructor: Instructor can now properly<br/>handle multimodal objects
```
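The final "Build content array" step might look roughly like the sketch below. `build_history_entry` is a hypothetical helper, and `Image`/`MultimodalContent` are local stand-ins. The `{"role", "content"}` dict shape is an assumption about the message format, as is returning a plain JSON string when nothing multimodal was found. The sketch only illustrates the ordering shown in the diagram: the JSON text comes first, followed by the untouched multimodal objects.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, NamedTuple


@dataclass
class Image:
    """Hypothetical stand-in for instructor's Image."""
    source: str


class MultimodalContent(NamedTuple):
    objects: List[Any]
    json_data: Any


def build_history_entry(role: str, extracted: MultimodalContent) -> Dict[str, Any]:
    """Hypothetical helper mirroring the last diagram step.

    With no multimodal objects, the content stays a plain JSON string. Otherwise
    it becomes a list: the JSON text first, then the multimodal objects, passed
    through unchanged so Instructor can convert them itself.
    """
    text = json.dumps(extracted.json_data)
    if not extracted.objects:
        return {"role": role, "content": text}
    return {"role": role, "content": [text, *extracted.objects]}


extracted = MultimodalContent(
    objects=[Image("a.png"), Image("b.png")],
    json_data={"owner": "alice", "category": "receipts"},
)
entry = build_history_entry("user", extracted)
print(entry["content"][0])    # {"owner": "alice", "category": "receipts"}
print(len(entry["content"]))  # 3
```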