Support Part.from_uri for arbitrary Google GenAI inputs
Is your feature request related to a problem? Please describe.
The latest version of the Google GenAI image/video/document understanding documentation supports Part.from_uri for passing in GCS objects in a number of different formats. From what I understand, I can currently only pass GCS objects implicitly, and only for images with autodetect_images=True, e.g. this example from the tests.
For PDFs I can get around this by using the HTTP version of a given object and using PDF.from_url, e.g. using this:
https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf
instead of this:
gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf
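For reference, the workaround above amounts to a small string rewrite. Below is a minimal sketch, assuming the object is publicly readable (the HTTPS form will not work for private objects without auth). gcs_uri_to_https_url is a hypothetical helper name, not part of Instructor or the GenAI SDK:

```python
from urllib.parse import quote


def gcs_uri_to_https_url(uri: str) -> str:
    """Rewrite gs://bucket/object as https://storage.googleapis.com/bucket/object.

    Hypothetical helper for illustration only.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a gs:// URI: {uri!r}")
    # Split "bucket/path/to/object" into the bucket and the object name.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"Expected gs://bucket/object, got: {uri!r}")
    # Percent-encode the object name but keep "/" separators as-is.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"


url = gcs_uri_to_https_url("gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf")
```

The resulting url can then be handed to PDF.from_url like any other HTTP URL, which is the workaround described here. It still downloads the file client-side, so it does not help with large videos.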
Describe the solution you'd like
Ideally, I'd like to find the best way to pass genAI's Part.from_uri directly through Instructor to the backend. We need to process images, PDFs, and videos that all live in GCS, and we don't want to have to upload the video files if GCS URIs are already supported by genAI.
Describe alternatives you've considered
We considered (1) creating a separate workflow in our codebase for multimodal Gemini content, or (2) using the HTTP version of these GCS URIs. However, it would be nice to use Instructor for everything in our codebase and not have to re-upload videos.
Oh, I thought we actually passed everything over and that autodetect was a utility that converts it back into 'parts'
@claude can you review the code to confirm this? I'll review afterwards
Claude finished @jxnl's task
Reviewing GenAI Part.from_uri Support
I've completed my analysis of the current GenAI integration. Here are my findings:
Todo List:
- [x] Examine GenAI client implementation files
- [x] Review multimodal content handling code
- [x] Check the test file mentioned in the issue
- [x] Analyze autodetect functionality
- [x] Confirm current Part.from_uri support status
- [x] Provide detailed findings
Key Findings
The good news: @jxnl is partially correct - we do pass through a lot of content, but there are some important gaps.
Current State of Part.from_uri Support
✅ Limited Part.from_uri support exists:
- Currently used in the PDFWithGenaiFile.to_genai() method (multimodal.py:689-692)
- Only works for Google GenAI file service URIs (https://generativelanguage.googleapis.com/v1beta/files/...)
❌ Missing Part.from_uri support:
- GCS URLs (gs://...): Currently converted to Part.from_bytes() after downloading the content (multimodal.py:271-275)
- Regular Image, Audio, and PDF classes: Don't use Part.from_uri at all
- Arbitrary URIs: No general Part.from_uri pathway
How GCS URLs Currently Work
The test at test_multimodal.py:83 shows GCS URLs do work, but not via Part.from_uri:
- Autodetect Flow: autodetect_images=True triggers Image.autodetect_safely() (multimodal.py:843)
- GCS Detection: Image.autodetect() detects gs:// URLs (multimodal.py:75-76)
- Download & Convert: Image.from_gs_url() downloads content and converts to base64 (multimodal.py:125-145)
- GenAI Format: Image.to_genai() calls Part.from_bytes() with downloaded data (multimodal.py:271-275)
The Gap
Issue: We're downloading and re-uploading GCS content instead of using Part.from_uri(), which would:
- Be more efficient (no download/upload cycle)
- Support private GCS objects (with proper auth)
- Work better for large video files
- Match Google's recommended approach
What Needs to Change
To support Part.from_uri for arbitrary Google GenAI inputs:
- Modify Image.to_genai() to use Part.from_uri() for GCS URLs instead of downloading
- Extend to Audio and PDF classes with similar logic
- Add direct Part.from_uri support for users who want to pass URIs directly
- Preserve backward compatibility with current autodetect behavior
The architecture is already there - we just need to change the conversion strategy in the to_genai() methods from Part.from_bytes() to Part.from_uri() for supported URI schemes.
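The proposed change in conversion strategy could look roughly like the sketch below. It is a self-contained illustration, not Instructor's actual code: MediaRef, PASS_THROUGH_PREFIXES, and choose_part_strategy are hypothetical names, and the returned keyword arguments assume that Part.from_uri accepts file_uri and mime_type and that Part.from_bytes accepts data and mime_type, as in the google-genai SDK.

```python
import base64
from dataclasses import dataclass
from typing import Any, Optional

# Assumption: URI prefixes the backend can fetch itself, so no download is needed.
PASS_THROUGH_PREFIXES = (
    "gs://",
    "https://generativelanguage.googleapis.com/",
)


@dataclass
class MediaRef:
    """Hypothetical stand-in for Instructor's Image/Audio/PDF objects."""

    source: str
    media_type: str
    data: Optional[str] = None  # base64 payload, if content was already downloaded


def choose_part_strategy(ref: MediaRef) -> tuple[str, dict[str, Any]]:
    """Pick which Part constructor to call and with which keyword arguments."""
    if ref.source.startswith(PASS_THROUGH_PREFIXES):
        # Hand the URI to the backend untouched instead of downloading it.
        return "from_uri", {"file_uri": ref.source, "mime_type": ref.media_type}
    if ref.data is not None:
        # Existing behaviour: send the decoded bytes inline.
        return "from_bytes", {
            "data": base64.b64decode(ref.data),
            "mime_type": ref.media_type,
        }
    raise ValueError(f"Unsupported media source: {ref.source!r}")


strategy, kwargs = choose_part_strategy(
    MediaRef("gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf", "application/pdf")
)
```

In a real to_genai() method the result would be dispatched as getattr(types.Part, strategy)(**kwargs). Keeping the from_bytes branch is what would preserve the current autodetect behaviour for sources that are not pass-through URIs.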
Here's a short overview of what works and what doesn't:
Passing PDF URI directly with autodetect_images=True ❌
import instructor
from pydantic import BaseModel
from instructor.multimodal import PDF
from google.genai import Client

class ImageDescription(BaseModel):
    items: list[str]

uri = "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                uri,
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)
Seems to pass it as a string without converting, as the response is:
ImageDescription(items=['gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf'])
Pass the PDF URI through with PDF.from_url ❌
import instructor
from pydantic import BaseModel
from instructor.multimodal import PDF
from google.genai import Client

class ImageDescription(BaseModel):
    items: list[str]

uri = "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                PDF.from_url(uri),
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)
Fails with
...
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/process_response.py:669, in handle_genai_tools(response_model, new_kwargs)
657 system_message = None
659 new_kwargs["config"] = types.GenerateContentConfig(
660 system_instruction=system_message,
661 tools=[types.Tool(function_declarations=[function_definition])],
(...)
666 ),
667 )
--> 669 new_kwargs["contents"] = convert_to_genai_messages(new_kwargs["messages"])
671 new_kwargs.pop("response_model", None)
672 new_kwargs.pop("messages", None)
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/utils.py:1001, in convert_to_genai_messages(messages)
999 content_parts.append(types.Part.from_text(text=content_item))
1000 elif isinstance(content_item, (Image, Audio, PDF)):
-> 1001 content_parts.append(content_item.to_genai())
1002 else:
1003 raise ValueError(
1004 f"Unsupported content item type: {type(content_item)}"
1005 )
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/multimodal.py:601, in PDF.to_genai(self)
595 if self.data:
596 return types.Part.from_bytes(
597 data=base64.b64decode(self.data),
598 mime_type=self.media_type,
599 )
--> 601 raise ValueError("Unsupported PDF format")
ValueError: Unsupported PDF format
Converting the URI to a URL for use with PDF.from_url works (somewhat obviously, as this is like any other HTTP URL) ✅
import instructor
from pydantic import BaseModel
from instructor.multimodal import PDF
from google.genai import Client

class ImageDescription(BaseModel):
    items: list[str]

url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                PDF.from_url(url),
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)
Convert URI image with Image.from_url ❌
import instructor
from pydantic import BaseModel
from instructor.multimodal import Image
from google.genai import Client

class ImageDescription(BaseModel):
    items: list[str]

uri = "gs://cloud-samples-data/generative-ai/image/cricket.jpeg"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                Image.from_url(uri),
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)
Fails with
---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/retry.py:168, in retry_sync(func, response_model, args, kwargs, context, max_retries, strict, mode, hooks)
167 hooks.emit_completion_arguments(*args, **kwargs)
--> 168 response = func(*args, **kwargs)
169 hooks.emit_completion_response(response)
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/client_genai.py:66, in from_genai.<locals>.sync_wrapper(*args, **kwargs)
64 return client.models.generate_content_stream(*args, **kwargs) # type:ignore
---> 66 return client.models.generate_content(*args, **kwargs)
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/models.py:6058, in Models.generate_content(self, model, contents, config)
6057 i += 1
-> 6058 response = self._generate_content(
6059 model=model, contents=contents, config=parsed_config
6060 )
6061 logger.info(f'AFC remote call {i} is done.')
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/models.py:5007, in Models._generate_content(self, model, contents, config)
5005 request_dict = _common.encode_unserializable_types(request_dict)
-> 5007 response_dict = self._api_client.request(
5008 'post', path, request_dict, http_options
5009 )
5011 if self._api_client.vertexai:
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/_api_client.py:927, in BaseApiClient.request(self, http_method, path, request_dict, http_options)
924 http_request = self._build_request(
925 http_method, path, request_dict, http_options
926 )
--> 927 response = self._request(http_request, stream=False)
928 json_response = response.json
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/_api_client.py:793, in BaseApiClient._request(self, http_request, stream)
786 response = self._httpx_client.request(
787 method=http_request.method,
788 url=http_request.url,
(...)
791 timeout=http_request.timeout,
792 )
--> 793 errors.APIError.raise_for_response(response)
794 return HttpResponse(
795 response.headers, response if stream else [response.text]
796 )
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/google/genai/errors.py:104, in APIError.raise_for_response(cls, response)
103 if 400 <= status_code < 500:
--> 104 raise ClientError(status_code, response_json, response)
105 elif 500 <= status_code < 600:
ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Provided image is not valid.', 'status': 'INVALID_ARGUMENT'}}
The above exception was the direct cause of the following exception:
RetryError Traceback (most recent call last)
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/retry.py:163, in retry_sync(func, response_model, args, kwargs, context, max_retries, strict, mode, hooks)
162 response = None
--> 163 for attempt in max_retries:
164 with attempt:
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/tenacity/__init__.py:443, in BaseRetrying.__iter__(self)
442 while True:
--> 443 do = self.iter(retry_state=retry_state)
444 if isinstance(do, DoAttempt):
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/tenacity/__init__.py:376, in BaseRetrying.iter(self, retry_state)
375 for action in self.iter_state.actions:
--> 376 result = action(retry_state)
377 return result
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/tenacity/__init__.py:419, in BaseRetrying._post_stop_check_actions.<locals>.exc_check(rs)
418 raise retry_exc.reraise()
--> 419 raise retry_exc from fut.exception()
RetryError: RetryError[<Future at 0x16c704ad0 state=finished raised ClientError>]
The above exception was the direct cause of the following exception:
InstructorRetryException Traceback (most recent call last)
Cell In[9], line 12
10 uri = "gs://cloud-samples-data/generative-ai/image/cricket.jpeg"
11 client = instructor.from_genai(Client(vertexai=True))
---> 12 response = client.chat.completions.create(
13 model="gemini-2.5-pro",
14 messages=[
15 {
16 "role": "user",
17 "content": [
18 "Describe this PDF in detail",
19 Image.from_url(uri),
20 ], # type: ignore
21 }
22 ],
23 autodetect_images=True,
24 response_model=ImageDescription,
25 )
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/client.py:366, in Instructor.create(self, response_model, messages, max_retries, validation_context, context, strict, **kwargs)
354 def create(
355 self,
356 response_model: type[T] | None,
(...)
362 **kwargs: Any,
363 ) -> T | Any | Awaitable[T] | Awaitable[Any]:
364 kwargs = self.handle_kwargs(kwargs)
--> 366 return self.create_fn(
367 response_model=response_model,
368 messages=messages,
369 max_retries=max_retries,
370 validation_context=validation_context,
371 context=context,
372 strict=strict,
373 hooks=self.hooks,
374 **kwargs,
375 )
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/patch.py:193, in patch.<locals>.new_create_sync(response_model, validation_context, context, max_retries, strict, hooks, *args, **kwargs)
187 response_model, new_kwargs = handle_response_model(
188 response_model=response_model, mode=mode, **kwargs
189 ) # type: ignore
191 new_kwargs = handle_templating(new_kwargs, mode=mode, context=context)
--> 193 response = retry_sync(
194 func=func, # type: ignore
195 response_model=response_model,
196 context=context,
197 max_retries=max_retries,
198 args=args,
199 hooks=hooks,
200 strict=strict,
201 kwargs=new_kwargs,
202 mode=mode,
203 )
204 return response
File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/instructor/retry.py:194, in retry_sync(func, response_model, args, kwargs, context, max_retries, strict, mode, hooks)
192 except RetryError as e:
193 logger.debug(f"Retry error: {e}")
--> 194 raise InstructorRetryException(
195 e.last_attempt._exception,
196 last_completion=response,
197 n_attempts=attempt.retry_state.attempt_number,
198 #! deprecate messages soon
199 messages=extract_messages(
200 kwargs
201 ), # Use the optimized function instead of nested lookups
202 create_kwargs=kwargs,
203 total_usage=total_usage,
204 ) from e
InstructorRetryException: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Provided image is not valid.', 'status': 'INVALID_ARGUMENT'}}
Pass image URI string in directly with autodetect_images=True ✅
import instructor
from pydantic import BaseModel
from instructor.multimodal import Image
from google.genai import Client

class ImageDescription(BaseModel):
    items: list[str]

uri = "gs://cloud-samples-data/generative-ai/image/cricket.jpeg"
client = instructor.from_genai(Client(vertexai=True))
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                "Describe this PDF in detail",
                uri,
            ],  # type: ignore
        }
    ],
    autodetect_images=True,
    response_model=ImageDescription,
)
mind making a pr? lets try to get from_provider("google/
Sorry for the delay @jxnl - what was the approach you're thinking? Happy to try and pick this up