feat: Support nested multimodal content in ChatHistory (Issue #141)

Open • KennyVaneetvelde opened this issue 1 month ago • 1 comment

Summary

This PR implements support for nested multimodal content in ChatHistory, fixing Issue #141.

The Problem: Previously, ChatHistory.get_history() only detected multimodal content (Image, PDF, Audio from the instructor library) at the top level of input schemas. When multimodal content was nested within other schemas (e.g., a list of documents each containing a PDF), it was incorrectly serialized with json.dumps, causing API errors.

The Solution: Add recursive detection and extraction of multimodal content at any nesting depth:

  • _contains_multimodal(obj) - Recursively checks if an object contains any multimodal content
  • _extract_multimodal_objects(obj) - Recursively extracts all multimodal objects from nested structures
  • _build_non_multimodal_dict(obj) - Builds a JSON-serializable dict excluding multimodal content

Changes

  • atomic-agents/atomic_agents/context/chat_history.py: Add 3 recursive helper functions and update get_history() to use them
  • atomic-agents/tests/context/test_chat_history.py: Add 6 new tests for nested multimodal scenarios
  • atomic-examples/nested-multimodal/: New example demonstrating the fix with nested ImageDocument schemas
  • Fix deprecated instructor.multimodal imports → instructor.processing.multimodal

Test Plan

  • [x] All 30 unit tests pass
  • [x] End-to-end validation with OpenAI GPT-4.1 using nested Image content
  • [x] Example successfully processes nested List[ImageDocument] with Image fields

Closes #141

KennyVaneetvelde • Nov 25 '25 20:11

Automated review by Greptile

Greptile Overview

Greptile Summary

This PR successfully implements nested multimodal content support in ChatHistory, solving Issue #141 where multimodal objects (Image, PDF, Audio) nested within schemas were incorrectly serialized.

Key Changes

  • Added _extract_multimodal_content() function with recursive extraction logic and circular reference protection using _seen set tracking
  • Refactored get_history() from field-by-field inspection to single-pass recursive extraction
  • Fixed deprecated instructor.multimodal imports → instructor.processing.multimodal
  • Added 6 comprehensive unit tests covering nested lists, dicts, deeply nested schemas, and edge cases
  • Includes working example demonstrating List[ImageDocument] with nested Image fields

Implementation Quality

The refactored approach is cleaner and more maintainable than the previous top-level-only implementation. The circular reference protection addresses the previously identified concern about infinite recursion.

Confidence Score: 4/5

  • This PR is safe to merge with good test coverage and circular reference protection
  • The score reflects a solid implementation with comprehensive tests. One point was deducted because the recursive traversal logic is complex and could benefit from additional edge-case validation in production use
  • Pay close attention to atomic-agents/atomic_agents/context/chat_history.py - the recursive extraction logic is complex and handles multiple object types

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| atomic-agents/atomic_agents/context/chat_history.py | 4/5 | Added recursive multimodal extraction with circular reference protection; refactored get_history() to use single-pass extraction; updated imports from deprecated instructor.multimodal |
| atomic-agents/tests/context/test_chat_history.py | 5/5 | Added 6 comprehensive tests for nested multimodal content; updated deprecated imports; all tests verify correct extraction and JSON serialization |
| atomic-examples/nested-multimodal/nested_multimodal/main.py | 5/5 | New example demonstrating nested multimodal content with ImageDocument schema; supports both OpenAI and Gemini; includes custom .env loader |

Sequence Diagram

sequenceDiagram
    participant User
    participant ChatHistory
    participant Extract as _extract_multimodal_content
    participant Message
    participant Instructor

    User->>ChatHistory: add_message(role, content)
    ChatHistory->>Message: Create Message with nested multimodal content
    Note over Message: content contains ImageDocument<br/>with nested Image objects

    User->>ChatHistory: get_history()
    ChatHistory->>Extract: _extract_multimodal_content(message.content)
    
    Extract->>Extract: Check if BaseModel (ImageDocument)
    Extract->>Extract: Add to _seen set (circular ref protection)
    
    loop For each field in BaseModel
        Extract->>Extract: _extract_multimodal_content(field_value)
        alt Field is Image/Audio/PDF
            Extract-->>Extract: Return MultimodalContent(objects=[obj], json_data=None)
        else Field is string/primitive
            Extract-->>Extract: Return MultimodalContent(objects=[], json_data=value)
        end
        Extract->>Extract: Accumulate objects and json_data
    end
    
    Extract-->>ChatHistory: MultimodalContent(objects=[Image, ...], json_data={owner, category, ...})
    
    ChatHistory->>ChatHistory: Build content array
    Note over ChatHistory: [json.dumps(json_data), Image1, Image2, ...]
    
    ChatHistory-->>Instructor: Return history with separated JSON and multimodal
    Note over Instructor: Instructor can now properly<br/>handle multimodal objects
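
The final "Build content array" step in the diagram can be sketched as follows. This is an assumption-laden illustration rather than the library's implementation: `Image`, `MultimodalContent`, and `build_history_entry` are hypothetical names, and the behavior for messages without multimodal objects (a single JSON string) is a guess at a sensible fallback.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


class Image:
    """Hypothetical stand-in for instructor's Image type."""

    def __init__(self, source: str) -> None:
        self.source = source


@dataclass
class MultimodalContent:
    objects: List[Any] = field(default_factory=list)
    json_data: Any = None


def build_history_entry(role: str, extracted: MultimodalContent) -> Dict[str, Any]:
    """Assemble one history entry from an extraction result."""
    if not extracted.objects:
        # Text-only message: the whole payload is one JSON string.
        return {"role": role, "content": json.dumps(extracted.json_data)}
    content: List[Any] = []
    if extracted.json_data:
        # The JSON text goes first, followed by the raw multimodal objects.
        content.append(json.dumps(extracted.json_data))
    content.extend(extracted.objects)
    return {"role": role, "content": content}


images = [Image("cat.png"), Image("dog.png")]
entry = build_history_entry(
    "user",
    MultimodalContent(objects=images, json_data={"owner": "alice", "category": "pets"}),
)
```

Keeping the multimodal objects out of `json.dumps` is the point of the separation: they are passed through unchanged so the client library can encode them itself.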

greptile-apps[bot] • Nov 25 '25 21:11