feat: Support nested multimodal content in ChatHistory (Issue #141)

Open KennyVaneetvelde opened this issue 1 month ago • 1 comment

Summary

This PR implements support for nested multimodal content in ChatHistory, fixing Issue #141.

The Problem: Previously, ChatHistory.get_history() only detected multimodal content (Image, PDF, Audio from the instructor library) at the top level of input schemas. When multimodal content was nested within other schemas (e.g., a list of documents each containing a PDF), it was incorrectly serialized with json.dumps, causing API errors.
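To make the failure mode concrete, here is a minimal sketch using only the standard library. `FakeImage`, `ImageDocument`, `BatchInput`, and `top_level_multimodal_fields` are hypothetical stand-ins (dataclasses instead of the real instructor and schema classes), so this illustrates the described behaviour rather than reproducing the old code.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FakeImage:
    """Hypothetical stand-in for a multimodal object such as instructor's Image."""
    source: str


@dataclass
class ImageDocument:
    """Hypothetical nested schema that carries an image."""
    owner: str
    image: FakeImage


@dataclass
class BatchInput:
    """Hypothetical input schema: the images live one level down."""
    documents: List[ImageDocument]


def top_level_multimodal_fields(schema) -> List[str]:
    """Name the fields whose value is itself multimodal (no recursion)."""
    return [name for name, value in vars(schema).items() if isinstance(value, FakeImage)]


batch = BatchInput(documents=[ImageDocument(owner="alice", image=FakeImage("a.png"))])

# The image is nested inside a list item, so a top-level scan finds nothing...
found = top_level_multimodal_fields(batch)

# ...and the whole input falls through to plain JSON, image included.
flattened = json.dumps(asdict(batch))
```

Here `found` is empty and the image ends up inside `flattened` as ordinary JSON text, which is the shape of input the issue reports the API rejecting.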

The Solution: Add recursive detection and extraction of multimodal content at any nesting depth:

_contains_multimodal(obj) - Recursively checks if an object contains any multimodal content
_extract_multimodal_objects(obj) - Recursively extracts all multimodal objects from nested structures
_build_non_multimodal_dict(obj) - Builds a JSON-serializable dict excluding multimodal content

Changes

atomic-agents/atomic_agents/context/chat_history.py: Add 3 recursive helper functions and update get_history() to use them
atomic-agents/tests/context/test_chat_history.py: Add 6 new tests for nested multimodal scenarios
atomic-examples/nested-multimodal/: New example demonstrating the fix with nested ImageDocument schemas
Replace deprecated instructor.multimodal imports with instructor.processing.multimodal

Test Plan

[x] All 30 unit tests pass
[x] End-to-end validation with OpenAI GPT-4.1 using nested Image content
[x] Example successfully processes nested List[ImageDocument] with Image fields

Closes #141

Nov 25 '25 20:11 KennyVaneetvelde

Automated review by Greptile

Greptile Overview

Greptile Summary

This PR successfully implements nested multimodal content support in ChatHistory, solving Issue #141 where multimodal objects (Image, PDF, Audio) nested within schemas were incorrectly serialized.

Key Changes

Added _extract_multimodal_content() function with recursive extraction logic and circular reference protection using _seen set tracking
Refactored get_history() from field-by-field inspection to single-pass recursive extraction
Replaced deprecated instructor.multimodal imports with instructor.processing.multimodal
Added 6 comprehensive unit tests covering nested lists, dicts, deeply nested schemas, and edge cases
Includes working example demonstrating List[ImageDocument] with nested Image fields

Implementation Quality

The refactored approach is cleaner and more maintainable than the previous top-level-only implementation. The circular reference protection addresses the previously identified concern about infinite recursion.
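As an illustration of single-pass extraction with a `_seen` guard, the sketch below returns the multimodal objects and the remaining JSON data together. It is a simplified approximation, not the PR's `_extract_multimodal_content`: the function name `extract_multimodal_content_sketch`, the dataclass stand-ins, and the exact shape of `MultimodalContent` are all assumptions.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, List, Optional, Set


@dataclass
class FakeImage:
    """Hypothetical stand-in for instructor's Image/PDF/Audio."""
    source: str


@dataclass
class MultimodalContent:
    """Assumed result shape: extracted objects plus the JSON-friendly remainder."""
    objects: List[Any] = field(default_factory=list)
    json_data: Any = None


def extract_multimodal_content_sketch(obj: Any, _seen: Optional[Set[int]] = None) -> MultimodalContent:
    """Walk obj once, splitting multimodal objects from plain data."""
    if _seen is None:
        _seen = set()
    if isinstance(obj, FakeImage):
        return MultimodalContent(objects=[obj], json_data=None)

    if is_dataclass(obj) and not isinstance(obj, type):
        items = [(f.name, getattr(obj, f.name)) for f in fields(obj)]
        is_mapping = True
    elif isinstance(obj, dict):
        items = list(obj.items())
        is_mapping = True
    elif isinstance(obj, (list, tuple)):
        items = list(enumerate(obj))
        is_mapping = False
    else:
        return MultimodalContent(objects=[], json_data=obj)  # primitive leaf

    # Circular reference protection: skip containers already visited.
    # Simplification: a container shared in two places is also skipped the second time.
    if id(obj) in _seen:
        return MultimodalContent()
    _seen.add(id(obj))

    objects: List[Any] = []
    mapping: dict = {}
    sequence: list = []
    for key, value in items:
        part = extract_multimodal_content_sketch(value, _seen)
        objects.extend(part.objects)
        # Simplification: None doubles as "nothing to serialize", so None-valued fields are dropped.
        if part.json_data is not None:
            if is_mapping:
                mapping[key] = part.json_data
            else:
                sequence.append(part.json_data)
    return MultimodalContent(objects, mapping if is_mapping else sequence)


@dataclass
class ImageDocument:
    """Hypothetical nested schema used for the demo below."""
    owner: str
    category: str
    image: FakeImage


doc = ImageDocument(owner="alice", category="invoice", image=FakeImage("a.png"))
result = extract_multimodal_content_sketch([doc])

# A self-referencing dict terminates instead of recursing forever.
loop: dict = {"name": "loop"}
loop["self"] = loop
loop_result = extract_multimodal_content_sketch(loop)
```

Tracking `id(obj)` rather than the object itself keeps the guard usable for unhashable containers such as dicts and lists.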

Confidence Score: 4/5

This PR is safe to merge with good test coverage and circular reference protection
Score reflects a solid implementation with comprehensive testing. One point was deducted because the recursive traversal logic is complex and could benefit from additional edge-case validation in production use.
Pay close attention to atomic-agents/atomic_agents/context/chat_history.py - the recursive extraction logic is complex and handles multiple object types

Important Files Changed

File Analysis

Filename	Score	Overview
atomic-agents/atomic_agents/context/chat_history.py	4/5	Added recursive multimodal extraction with circular reference protection; refactored get_history() to use single-pass extraction; updated imports from deprecated instructor.multimodal
atomic-agents/tests/context/test_chat_history.py	5/5	Added 6 comprehensive tests for nested multimodal content; updated deprecated imports; all tests verify correct extraction and JSON serialization
atomic-examples/nested-multimodal/nested_multimodal/main.py	5/5	New example demonstrating nested multimodal content with ImageDocument schema; supports both OpenAI and Gemini; includes custom .env loader

Sequence Diagram

sequenceDiagram
    participant User
    participant ChatHistory
    participant Extract as _extract_multimodal_content
    participant Message
    participant Instructor

    User->>ChatHistory: add_message(role, content)
    ChatHistory->>Message: Create Message with nested multimodal content
    Note over Message: content contains ImageDocument<br/>with nested Image objects

    User->>ChatHistory: get_history()
    ChatHistory->>Extract: _extract_multimodal_content(message.content)
    
    Extract->>Extract: Check if BaseModel (ImageDocument)
    Extract->>Extract: Add to _seen set (circular ref protection)
    
    loop For each field in BaseModel
        Extract->>Extract: _extract_multimodal_content(field_value)
        alt Field is Image/Audio/PDF
            Extract-->>Extract: Return MultimodalContent(objects=[obj], json_data=None)
        else Field is string/primitive
            Extract-->>Extract: Return MultimodalContent(objects=[], json_data=value)
        end
        Extract->>Extract: Accumulate objects and json_data
    end
    
    Extract-->>ChatHistory: MultimodalContent(objects=[Image, ...], json_data={owner, category, ...})
    
    ChatHistory->>ChatHistory: Build content array
    Note over ChatHistory: [json.dumps(json_data), Image1, Image2, ...]
    
    ChatHistory-->>Instructor: Return history with separated JSON and multimodal
    Note over Instructor: Instructor can now properly<br/>handle multimodal objects
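
The final "Build content array" step in the diagram can be sketched as follows. `build_history_entry` is a hypothetical helper, and the text-only branch (returning a single JSON string when nothing multimodal was found) is an assumption about how such messages are handled, not something the diagram states.

```python
import json
from typing import Any, Dict, List


def build_history_entry(role: str, json_data: Any, objects: List[Any]) -> Dict[str, Any]:
    """Combine serialized JSON data and multimodal objects into one history entry."""
    if not objects:
        # Assumption: text-only messages keep a plain JSON string as content.
        return {"role": role, "content": json.dumps(json_data)}

    content: List[Any] = []
    if json_data:
        # The non-multimodal remainder goes first, serialized exactly once.
        content.append(json.dumps(json_data))
    # Multimodal objects follow unserialized so the client library can encode them.
    content.extend(objects)
    return {"role": role, "content": content}


image1 = object()  # placeholder for an extracted Image
image2 = object()
entry = build_history_entry("user", {"owner": "alice"}, [image1, image2])
text_only = build_history_entry("user", {"owner": "alice"}, [])
```

Keeping the multimodal objects out of `json.dumps` is the point of the separation: only the plain data is stringified, and the images reach the client as objects.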

Nov 25 '25 21:11 greptile-apps[bot]