pipecat icon indicating copy to clipboard operation
pipecat copied to clipboard

Add File API to GeminiMultimodalLive

Open getchannel opened this issue 8 months ago • 2 comments

*This still needs to be tested and would like input and review.

This PR adds support for Google's Gemini File API in the multimodal live service.

 Changes:
 - Added a new FileAPI class that handles file uploads, listing, and management
 - Updated events.py to include FileData model for file references
 - Modified GeminiMultimodalLiveContext to support file references in conversations
 - Updated __init__.py to expose the new GeminiFileAPI class
 

The File API allows referencing uploaded files in conversations, which is useful for document analysis, audio processing, and working with large media files.

You can reference files that have already been uploaded to the File API by providing their file_uri within a FileData object, which is placed inside a Part within a Content message sent via BidiGenerateContentClientContent.

The FileData message contains: string mime_type: The MIME type of the file at the URI. string file_uri: The URI pointing to the file.

Use Cases

  • Document Analysis: Upload and analyze PDFs, spreadsheets, and text documents
  • Audio Processing: Process audio files larger than streaming audio allows
  • Image Analysis: Work with high-resolution images or multiple images
  • Persistent References: Reference the same file across multiple sessions within the 48-hour window
  • Large File Support: Handle files up to 2GB that would be too large for direct inclusion in messages

Technical Notes

  • Files are stored on Google's servers for 48 hours
  • Maximum file size is 2GB
  • Total storage per project is 20GB
  • The implementation follows single responsibility principle with a dedicated file API class
  • File references work best when added to the context at the beginning of a conversation

getchannel avatar May 09 '25 15:05 getchannel

https://ai.google.dev/gemini-api/docs/files

getchannel avatar May 09 '25 17:05 getchannel

Very cool! Can you add an example, so this is easy to test?

markbackman avatar May 13 '25 23:05 markbackman

Added a Files API example. Let me know if it runs for you.

getchannel avatar May 30 '25 17:05 getchannel

hi @getchannel I rebased on main and updated the example here: https://github.com/pipecat-ai/pipecat/pull/2107 If it looks good, can you add the changes to your PR?

We may want to move toward using genai.client from the python sdk in the future (WIP PR here) but I think we can get this in as-is first 🎸

vipyne avatar Jul 01 '25 22:07 vipyne

Thanks @vipyne! I merged all changes from #2107 into this PR.

Thank you for the improvements to the example file and the cleanup work.

Ready for review when you are!

getchannel avatar Jul 02 '25 00:07 getchannel