fix(models): Add PDF support for Claude models

Open sarojrout opened this issue 1 month ago • 5 comments

  • Add _is_pdf_part() helper function to detect PDF parts
  • Add PDF handling in part_to_message_block() function
  • PDFs are encoded as base64 and sent to the Anthropic API as document blocks
  • Update return type annotation to include dict for PDF document blocks
  • Add a unit test for PDF support

Fixes #3614

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

  • Closes: #3614

2. Or, if no issue exists, describe the change:

Problem: When a PDF file is passed to a Claude model (e.g., Claude Sonnet 4.5) in ADK, the call fails with NotImplementedError: Not supported yet. The part_to_message_block() function in anthropic_llm.py handles text, images, function calls, and function responses, but not PDF documents. A part with mime_type="application/pdf" therefore falls through to the final raise NotImplementedError at line 150.

Solution: Added PDF support by:

  1. Adding a _is_pdf_part() helper function (similar to _is_image_part()) that detects PDF parts by checking for mime_type == "application/pdf"
  2. Adding PDF handling to part_to_message_block() that:
    • Detects PDF parts using the new helper function
    • Encodes the PDF data as base64 (same as images)
    • Returns a document block dictionary with type="document" and the base64-encoded PDF data
  3. Updating the return type annotation to include dict[str, Any] for PDF document blocks
  4. Adding a unit test that verifies PDF parts are converted correctly

This follows the same pattern used for image handling and relies on the Anthropic API's support for PDFs as document content blocks.

Testing Plan

Unit Tests:

  • [x] I have added or updated unit tests for my change.
  • [x] All unit tests pass locally.

Manual End-to-End (E2E) Tests:

Setup:

  1. Configure Vertex AI credentials for a project with access to Claude models
  2. Set environment variables: GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION
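
Before running the test steps, a small pre-flight check can catch a missing variable early. This is a hypothetical helper, not part of ADK; it only checks that the two variables named in the setup steps are set and non-empty, and does not validate the credentials themselves.

```python
import os
from typing import Mapping

# Environment variables named in the setup steps.
REQUIRED_VARS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")


def missing_vertex_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Returns the required Vertex AI variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_vertex_env() with no arguments inspects the current process environment; an empty list means both variables are present.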

Test Steps:

  1. Create an ADK agent using a Claude model:

    from google.adk import Agent
    from google.adk.models.anthropic_llm import Claude
    
    agent = Agent(
        name="pdf_reader",
        model=Claude(model="claude-3-5-sonnet-v2@20241022"),
        instruction="Analyze PDF documents"
    )
    
  2. Upload a PDF file to the agent:

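    import asyncio

    from google.adk.runners import Runner
    from google.adk.sessions import InMemorySessionService
    from google.genai import types

    async def main():
        # Runner and session setup for the agent created in step 1
        session_service = InMemorySessionService()
        await session_service.create_session(
            app_name="pdf_app", user_id="test-user", session_id="test-session"
        )
        runner = Runner(
            agent=agent, app_name="pdf_app", session_service=session_service
        )

        # Read PDF file
        with open("document.pdf", "rb") as f:
            pdf_data = f.read()

        # Create content with the PDF as inline data
        content = types.Content(
            role="user",
            parts=[
                types.Part(
                    inline_data=types.Blob(
                        mime_type="application/pdf",
                        data=pdf_data,
                    )
                )
            ],
        )

        # Run agent - should now work without NotImplementedError
        async for event in runner.run_async(
            user_id="test-user",
            session_id="test-session",
            new_message=content,
        ):
            print(event)

    asyncio.run(main())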
    

Expected Result:

  • No NotImplementedError is raised
  • PDF is successfully sent to the Claude API as a document block
  • Claude can process and analyze the PDF content
  • Agent responds with analysis of the PDF

Checklist

  • [x] I have read the CONTRIBUTING.md document.
  • [x] I have performed a self-review of my own code.
  • [x] I have commented my code, particularly in hard-to-understand areas.
  • [x] I have added tests that prove my fix is effective or that my feature works.
  • [x] New and existing unit tests pass locally with my changes.
  • [ ] I have manually tested my changes end-to-end. (Note: Manual testing requires Vertex AI setup with Claude access)
  • [x] Any dependent changes have been merged and published in downstream modules.

Additional context

Code Changes Summary:

  • File: src/google/adk/models/anthropic_llm.py
    • Added _is_pdf_part() helper function (lines 79-85)
    • Added PDF handling in part_to_message_block() (lines 147-159)
    • Updated return type annotation (line 95)
  • File: tests/unittests/models/test_anthropic_llm.py
    • Added test_part_to_message_block_with_pdf() test (lines 467-496)

Technical Details:

  • PDFs are handled similarly to images: base64-encoded and sent as document blocks
  • The implementation follows Anthropic's API specification for document blocks
  • The fix is backward compatible: existing handling of text, images, function calls, and function responses is unchanged
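
To make "handled similarly to images" concrete, here is a hedged sketch in which image and PDF bytes map onto Anthropic content blocks that share the same base64 source object and differ only in the block type. The helper names base64_source and inline_bytes_to_block are hypothetical, and the "image/" prefix check is an assumption about how image parts are detected; this is not the ADK implementation.

```python
import base64
from typing import Any


def base64_source(mime_type: str, data: bytes) -> dict[str, Any]:
    """Base64 'source' object shared by image and document blocks."""
    return {
        "type": "base64",
        "media_type": mime_type,
        "data": base64.b64encode(data).decode("utf-8"),
    }


def inline_bytes_to_block(mime_type: str, data: bytes) -> dict[str, Any]:
    """Maps inline bytes to a content block by MIME type (hypothetical helper).

    Images become "image" blocks, PDFs become "document" blocks, and any
    other type is rejected, mirroring the NotImplementedError fallback.
    """
    if mime_type.startswith("image/"):
        return {"type": "image", "source": base64_source(mime_type, data)}
    if mime_type == "application/pdf":
        return {"type": "document", "source": base64_source(mime_type, data)}
    raise NotImplementedError("Not supported yet")
```

Because the encoding lives in one helper, the two branches differ only in the "type" key, which is why the PDF path can reuse the image pattern.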

sarojrout, Nov 23 '25 08:11

Summary of Changes

Hello @sarojrout, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fixes the NotImplementedError raised when PDF files are used with Claude models in the ADK framework. It adds PDF support by sending PDFs to Anthropic's API as base64-encoded document blocks, mirroring the existing image handling, so ADK agents can analyze PDF content.

Highlights

  • PDF Part Detection: Introduced a new helper function _is_pdf_part() to identify PDF data within message parts by checking for the application/pdf mime type.
  • PDF Handling in Message Blocks: Implemented logic within part_to_message_block() to process PDF parts by base64 encoding their data and formatting them as Anthropic API document blocks, similar to how images are handled.
  • Type Annotation Update: Updated the return type annotation for part_to_message_block() to include dict[str, Any] to correctly reflect the new dictionary type used for PDF document blocks.
  • Unit Test: Added a dedicated unit test (test_part_to_message_block_with_pdf()) to verify the correct handling and base64 encoding of PDF documents.

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist[bot], Nov 23 '25 08:11

Can we review and merge this if it looks good, @ryanaiagent?

sarojrout, Nov 27 '25 02:11

It's amazing to see firsthand how high-level coding is done correctly!

jesse-aluiso, Nov 27 '25 06:11

Hi @sarojrout, thank you for your contribution through this pull request! This PR has merge conflicts that need to be resolved on your end. Could you please rebase your branch on the latest main branch? Once that's done, let us know so we can proceed with the review.

ryanaiagent, Nov 30 '25 04:11

@ryanaiagent, please review again. Thanks!

sarojrout, Nov 30 '25 06:11