
Support Part.from_uri for arbitrary Google GenAI inputs

Open kylegallatin opened this issue 6 months ago • 5 comments

Is your feature request related to a problem? Please describe.

The latest version of the Google GenAI image/video/document understanding documentation supports Part.from_uri for passing in GCS objects in a number of different formats. As far as I understand, Instructor currently only lets me pass GCS objects implicitly, and only for images with autodetect_images=True, e.g. this example from the tests.

For PDFs I can get around this by using the HTTP version of a given object and using PDF.from_url, e.g. using this:

https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf

instead of this:

gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf
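The workaround above amounts to rewriting the gs:// scheme into the storage.googleapis.com HTTPS endpoint. A minimal sketch, assuming the object is publicly readable (the HTTPS URL is fetched without your GCP credentials); `gcs_uri_to_https` is a hypothetical helper, not part of Instructor:

```python
from urllib.parse import quote


def gcs_uri_to_https(uri: str) -> str:
    """Rewrite gs://bucket/path/to/object into its public HTTPS URL.

    Hypothetical helper: only useful for publicly readable objects, since
    the HTTPS endpoint is fetched without the caller's GCP credentials.
    """
    if not uri.startswith("gs://"):
        raise ValueError(f"Not a gs:// URI: {uri!r}")
    bucket, _, object_name = uri[len("gs://"):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"Expected gs://bucket/object, got {uri!r}")
    # Keep "/" separators; percent-escape anything else (e.g. spaces, '#', '?').
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"
```

The result can then be handed to PDF.from_url, as in the working example later in this thread.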

Describe the solution you'd like

Ideally, I'd like to pass GenAI's Part.from_uri directly through Instructor to the backend. We need to process images, PDFs, and videos that all live in GCS, and don't want to re-upload video files when GenAI can already read them from GCS.
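For reference, calling the google-genai SDK directly with Part.from_uri (outside Instructor) looks roughly like the sketch below. It assumes a google-genai version where `types.Part.from_uri` accepts `file_uri` and `mime_type` keyword arguments; the MIME table, `from_uri_kwargs`, and `describe` are illustrative names, not part of either library.

```python
import mimetypes

# Illustrative subset only; Gemini accepts more MIME types than this.
_MIME_BY_EXT = {
    ".pdf": "application/pdf",
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
}


def from_uri_kwargs(uri: str) -> dict:
    """Build keyword arguments for types.Part.from_uri from a gs:// URI."""
    dot = uri.rfind(".")
    # Only treat the suffix as an extension if the dot is in the object's last segment.
    ext = uri[dot:].lower() if dot > uri.rfind("/") else ""
    mime_type = _MIME_BY_EXT.get(ext) or mimetypes.guess_type(uri)[0]
    if mime_type is None:
        raise ValueError(f"Cannot infer MIME type for {uri!r}")
    return {"file_uri": uri, "mime_type": mime_type}


def describe(uri: str, prompt: str, model: str = "gemini-2.5-pro") -> str:
    """Send a GCS object by reference (no download) using the plain SDK."""
    from google import genai  # lazy import so from_uri_kwargs has no dependencies
    from google.genai import types

    client = genai.Client(vertexai=True)
    response = client.models.generate_content(
        model=model,
        contents=[prompt, types.Part.from_uri(**from_uri_kwargs(uri))],
    )
    return response.text
```

Passing an explicit MIME type avoids relying on server-side detection, which matters for mixed image/PDF/video buckets.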

Describe alternatives you've considered

We considered (1) creating a separate workflow in our codebase for multimodal Gemini content, or (2) using the HTTP versions of these GCS URIs. However, it would be nice to use Instructor for everything in our codebase and not have to re-upload videos.

(kylegallatin, Jul 02 '25 19:07)

Oh, I thought we actually passed everything over and that autodetect was a utility that converts it back into 'parts'

@claude can you review the code to confirm this? I'll review afterwards.

(jxnl, Jul 03 '25 14:07)

Claude finished @jxnl's task

Reviewing GenAI Part.from_uri Support

I've completed my analysis of the current GenAI integration. Here are my findings:

Todo List:

  • [x] Examine GenAI client implementation files
  • [x] Review multimodal content handling code
  • [x] Check the test file mentioned in the issue
  • [x] Analyze autodetect functionality
  • [x] Confirm current Part.from_uri support status
  • [x] Provide detailed findings

Key Findings

The good news: @jxnl is partially correct. We do pass through a lot of content, but there are some important gaps.

Current State of Part.from_uri Support

✅ Limited Part.from_uri support exists:

  • Currently used in PDFWithGenaiFile.to_genai() method (multimodal.py:689-692)
  • Only works for Google GenAI file service URIs (https://generativelanguage.googleapis.com/v1beta/files/...)

❌ Missing Part.from_uri support:

  • GCS URLs (gs://...): Currently converted to Part.from_bytes() after downloading the content (multimodal.py:271-275)
  • Regular Images, Audio, PDF classes: Don't use Part.from_uri at all
  • Arbitrary URIs: No general Part.from_uri pathway

How GCS URLs Currently Work

The test at test_multimodal.py:83 shows GCS URLs do work, but not via Part.from_uri:

  1. Autodetect Flow: autodetect_images=True triggers Image.autodetect_safely() (multimodal.py:843)
  2. GCS Detection: Image.autodetect() detects gs:// URLs (multimodal.py:75-76)
  3. Download & Convert: Image.from_gs_url() downloads content and converts to base64 (multimodal.py:125-145)
  4. GenAI Format: Image.to_genai() calls Part.from_bytes() with downloaded data (multimodal.py:271-275)

The Gap

Issue: We're downloading and re-uploading GCS content instead of using Part.from_uri(), which would:

  • Be more efficient (no download/upload cycle)
  • Support private GCS objects (with proper auth)
  • Work better for large video files
  • Match Google's recommended approach
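To make the efficiency point concrete: on the download path the whole object ends up inside the request body, because inline bytes are sent base64-encoded in the JSON request. Base64 emits 4 characters per 3-byte group, so the payload grows by roughly a third, whereas a Part.from_uri reference carries only the URI string. A back-of-the-envelope sketch; the video size and URI in the demo are hypothetical:

```python
import base64


def inline_payload_bytes(raw_size: int) -> int:
    """Length of the base64 text for raw_size bytes (4 chars per 3-byte group, padded)."""
    return 4 * ((raw_size + 2) // 3)


def uri_payload_bytes(uri: str) -> int:
    """A by-reference part only carries the URI itself."""
    return len(uri.encode("utf-8"))


if __name__ == "__main__":
    video_size = 200 * 1024 * 1024  # hypothetical 200 MiB video
    uri = "gs://my-bucket/videos/clip.mp4"  # hypothetical object
    print(f"inline (base64): {inline_payload_bytes(video_size):,} bytes")
    print(f"by URI:          {uri_payload_bytes(uri):,} bytes")
    # Sanity check against the stdlib encoder on a small buffer.
    assert inline_payload_bytes(10) == len(base64.b64encode(b"x" * 10))
```

This ignores the separate cost of downloading the object into the client process first, which the by-reference path also avoids.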

What Needs to Change

To support Part.from_uri for arbitrary Google GenAI inputs:

  1. Modify Image.to_genai() to use Part.from_uri() for GCS URLs instead of downloading
  2. Extend to Audio and PDF classes with similar logic
  3. Add direct Part.from_uri support for users who want to pass URIs directly
  4. Preserve backward compatibility with current autodetect behavior

The architecture is already there; we just need to change the conversion strategy in the to_genai() methods from Part.from_bytes() to Part.from_uri() for supported URI schemes.
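One way the proposed change could look, written as a standalone function so the decision logic is visible. This is a hypothetical sketch, not Instructor's current code: `choose_genai_part` and its `source` argument are assumptions, `data`/`media_type` mirror the fields used in the PDF.to_genai traceback later in this thread, and the returned tuple stands in for the actual `types.Part.from_uri(...)` / `types.Part.from_bytes(...)` calls.

```python
import base64
from typing import Any, Optional

# URI prefixes assumed to be resolvable by reference, per the findings above.
BY_REFERENCE_PREFIXES = (
    "gs://",
    "https://generativelanguage.googleapis.com/v1beta/files/",
)


def choose_genai_part(
    source: Optional[str],
    data: Optional[str],
    media_type: str,
) -> tuple[str, dict[str, Any]]:
    """Return (Part constructor name, kwargs) for a google-genai Part.

    Prefers a by-reference Part.from_uri for supported URI schemes and only
    falls back to inline bytes when base64 data is already available.
    """
    if source and source.startswith(BY_REFERENCE_PREFIXES):
        return "from_uri", {"file_uri": source, "mime_type": media_type}
    if data:
        return "from_bytes", {"data": base64.b64decode(data), "mime_type": media_type}
    raise ValueError("Unsupported format: no by-reference URI and no inline data")
```

In a real to_genai() the tuple would become `getattr(types.Part, name)(**kwargs)`. Leaving the inline-bytes branch as it is today keeps non-GCS inputs behaving exactly as before.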

(claude[bot], Jul 03 '25 14:07)

Here's a short overview of what works and what doesn't:

Passing PDF URI directly with autodetect_images=True

import instructor
from pydantic import BaseModel
from instructor.multimodal import PDF
from google.genai import Client


class ImageDescription(BaseModel):
    items: list[str]

uri = "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                uri,
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)

The URI appears to be passed through as a plain string without conversion, since the response is:

ImageDescription(items=['gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf'])
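Because the unconverted string is silently sent as text, the model just echoes it back instead of raising an error. Until this is supported, a small pre-flight check can surface that case. A hypothetical helper (not part of Instructor) that scans OpenAI-style messages for bare gs:// strings; it deliberately flags all of them, including image URIs that autodetect_images=True may already convert:

```python
from typing import Any


def find_bare_gcs_strings(messages: list[dict[str, Any]]) -> list[tuple[int, int, str]]:
    """Return (message_index, item_index, uri) for every bare gs:// string.

    Such strings may reach the model as plain text, as in the PDF example above.
    """
    hits = []
    for m_idx, message in enumerate(messages):
        content = message.get("content")
        # Content may be a single string or a list of mixed items.
        items = content if isinstance(content, list) else [content]
        for i_idx, item in enumerate(items):
            if isinstance(item, str) and item.strip().startswith("gs://"):
                hits.append((m_idx, i_idx, item.strip()))
    return hits
```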

Pass the PDF URI through with PDF.from_url

import instructor
from pydantic import BaseModel
from instructor.multimodal import PDF
from google.genai import Client


class ImageDescription(BaseModel):
    items: list[str]

uri = "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                PDF.from_url(uri),
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)

Fails with

...
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/process_response.py:669, in handle_genai_tools(response_model, new_kwargs)
    657     system_message = None
    659 new_kwargs["config"] = types.GenerateContentConfig(
    660     system_instruction=system_message,
    661     tools=[types.Tool(function_declarations=[function_definition])],
   (...)
    666     ),
    667 )
--> 669 new_kwargs["contents"] = convert_to_genai_messages(new_kwargs["messages"])
    671 new_kwargs.pop("response_model", None)
    672 new_kwargs.pop("messages", None)

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/utils.py:1001, in convert_to_genai_messages(messages)
    999     content_parts.append(types.Part.from_text(text=content_item))
   1000 elif isinstance(content_item, (Image, Audio, PDF)):
-> 1001     content_parts.append(content_item.to_genai())
   1002 else:
   1003     raise ValueError(
   1004         f"Unsupported content item type: {type(content_item)}"
   1005     )

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/multimodal.py:601, in PDF.to_genai(self)
    595 if self.data:
    596     return types.Part.from_bytes(
    597         data=base64.b64decode(self.data),
    598         mime_type=self.media_type,
    599     )
--> 601 raise ValueError("Unsupported PDF format")

ValueError: Unsupported PDF format

Converting the URI to an HTTPS URL for use with PDF.from_url works (somewhat obviously, as this is like any other HTTP URL) ✅

import instructor
from pydantic import BaseModel
from instructor.multimodal import PDF
from google.genai import Client


class ImageDescription(BaseModel):
    items: list[str]

url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                PDF.from_url(url),
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)

Convert URI image with Image.from_url

import instructor
from pydantic import BaseModel
from instructor.multimodal import Image
from google.genai import Client


class ImageDescription(BaseModel):
    items: list[str]

uri = "gs://cloud-samples-data/generative-ai/image/cricket.jpeg"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                Image.from_url(uri),
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)

Fails with

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/retry.py:168, in retry_sync(func, response_model, args, kwargs, context, max_retries, strict, mode, hooks)
    167 hooks.emit_completion_arguments(*args, **kwargs)
--> 168 response = func(*args, **kwargs)
    169 hooks.emit_completion_response(response)

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/client_genai.py:66, in from_genai.<locals>.sync_wrapper(*args, **kwargs)
     64     return client.models.generate_content_stream(*args, **kwargs)  # type:ignore
---> 66 return client.models.generate_content(*args, **kwargs)

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/models.py:6058, in Models.generate_content(self, model, contents, config)
   6057 i += 1
-> 6058 response = self._generate_content(
   6059     model=model, contents=contents, config=parsed_config
   6060 )
   6061 logger.info(f'AFC remote call {i} is done.')

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/models.py:5007, in Models._generate_content(self, model, contents, config)
   5005 request_dict = _common.encode_unserializable_types(request_dict)
-> 5007 response_dict = self._api_client.request(
   5008     'post', path, request_dict, http_options
   5009 )
   5011 if self._api_client.vertexai:

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/_api_client.py:927, in BaseApiClient.request(self, http_method, path, request_dict, http_options)
    924 http_request = self._build_request(
    925     http_method, path, request_dict, http_options
    926 )
--> 927 response = self._request(http_request, stream=False)
    928 json_response = response.json

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/_api_client.py:793, in BaseApiClient._request(self, http_request, stream)
    786 response = self._httpx_client.request(
    787     method=http_request.method,
    788     url=http_request.url,
   (...)
    791     timeout=http_request.timeout,
    792 )
--> 793 errors.APIError.raise_for_response(response)
    794 return HttpResponse(
    795     response.headers, response if stream else [response.text]
    796 )

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/errors.py:104, in APIError.raise_for_response(cls, response)
    103 if 400 <= status_code < 500:
--> 104   raise ClientError(status_code, response_json, response)
    105 elif 500 <= status_code < 600:

ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Provided image is not valid.', 'status': 'INVALID_ARGUMENT'}}

The above exception was the direct cause of the following exception:

RetryError                                Traceback (most recent call last)
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/retry.py:163, in retry_sync(func, response_model, args, kwargs, context, max_retries, strict, mode, hooks)
    162 response = None
--> 163 for attempt in max_retries:
    164     with attempt:

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/tenacity/__init__.py:443, in BaseRetrying.__iter__(self)
    442 while True:
--> 443     do = self.iter(retry_state=retry_state)
    444     if isinstance(do, DoAttempt):

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/tenacity/__init__.py:376, in BaseRetrying.iter(self, retry_state)
    375 for action in self.iter_state.actions:
--> 376     result = action(retry_state)
    377 return result

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/tenacity/__init__.py:419, in BaseRetrying._post_stop_check_actions.<locals>.exc_check(rs)
    418     raise retry_exc.reraise()
--> 419 raise retry_exc from fut.exception()

RetryError: RetryError[<Future at 0x16c704ad0 state=finished raised ClientError>]

The above exception was the direct cause of the following exception:

InstructorRetryException                  Traceback (most recent call last)
Cell In[9], line 12
     10 uri = "gs://cloud-samples-data/generative-ai/image/cricket.jpeg"
     11 client = instructor.from_genai(Client(vertexai=True))
---> 12 response = client.chat.completions.create(
     13     model="gemini-2.5-pro",
     14     messages=[
     15         {
     16             "role": "user",
     17             "content": [
     18                 "Describe this PDF in detail",
     19                 Image.from_url(uri),
     20             ],  # type: ignore
     21         }
     22     ],
     23     autodetect_images=True,
     24     response_model=ImageDescription,
     25 )

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/client.py:366, in Instructor.create(self, response_model, messages, max_retries, validation_context, context, strict, **kwargs)
    354 def create(
    355     self,
    356     response_model: type[T] | None,
   (...)
    362     **kwargs: Any,
    363 ) -> T | Any | Awaitable[T] | Awaitable[Any]:
    364     kwargs = self.handle_kwargs(kwargs)
--> 366     return self.create_fn(
    367         response_model=response_model,
    368         messages=messages,
    369         max_retries=max_retries,
    370         validation_context=validation_context,
    371         context=context,
    372         strict=strict,
    373         hooks=self.hooks,
    374         **kwargs,
    375     )

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/patch.py:193, in patch.<locals>.new_create_sync(response_model, validation_context, context, max_retries, strict, hooks, *args, **kwargs)
    187 response_model, new_kwargs = handle_response_model(
    188     response_model=response_model, mode=mode, **kwargs
    189 )  # type: ignore
    191 new_kwargs = handle_templating(new_kwargs, mode=mode, context=context)
--> 193 response = retry_sync(
    194     func=func,  # type: ignore
    195     response_model=response_model,
    196     context=context,
    197     max_retries=max_retries,
    198     args=args,
    199     hooks=hooks,
    200     strict=strict,
    201     kwargs=new_kwargs,
    202     mode=mode,
    203 )
    204 return response

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/retry.py:194, in retry_sync(func, response_model, args, kwargs, context, max_retries, strict, mode, hooks)
    192 except RetryError as e:
    193     logger.debug(f"Retry error: {e}")
--> 194     raise InstructorRetryException(
    195         e.last_attempt._exception,
    196         last_completion=response,
    197         n_attempts=attempt.retry_state.attempt_number,
    198         #! deprecate messages soon
    199         messages=extract_messages(
    200             kwargs
    201         ),  # Use the optimized function instead of nested lookups
    202         create_kwargs=kwargs,
    203         total_usage=total_usage,
    204     ) from e

InstructorRetryException: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Provided image is not valid.', 'status': 'INVALID_ARGUMENT'}}

Pass image URI string in directly with autodetect_images=True

import instructor
from pydantic import BaseModel
from instructor.multimodal import Image
from google.genai import Client


class ImageDescription(BaseModel):
    items: list[str]

uri = "gs://cloud-samples-data/generative-ai/image/cricket.jpeg"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                uri,
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)

(kylegallatin, Jul 07 '25 13:07)

Mind making a PR? Let's try to get from_provider("google/") working.

(jxnl, Jul 11 '25 21:07)

Sorry for the delay @jxnl. What approach were you thinking of? Happy to try and pick this up.

(kylegallatin, Jul 29 '25 20:07)