Add File API to GeminiMultimodalLive
*This still needs to be tested and would like input and review.
This PR adds support for Google's Gemini File API in the multimodal live service.
Changes:
- Added a new FileAPI class that handles file uploads, listing, and management
- Updated events.py to include FileData model for file references
- Modified GeminiMultimodalLiveContext to support file references in conversations
- Updated __init__.py to expose the new GeminiFileAPI class
The File API allows referencing uploaded files in conversations, which is useful for document analysis, audio processing, and working with large media files.
You can reference files that have already been uploaded to the File API by providing their file_uri within a FileData object, which is placed inside a Part within a Content message sent via BidiGenerateContentClientContent.
The FileData message contains: string mime_type: The MIME type of the file at the URI. string file_uri: The URI pointing to the file.
Use Cases
- Document Analysis: Upload and analyze PDFs, spreadsheets, and text documents
- Audio Processing: Process audio files larger than streaming audio allows
- Image Analysis: Work with high-resolution images or multiple images
- Persistent References: Reference the same file across multiple sessions within the 48-hour window
- Large File Support: Handle files up to 2GB that would be too large for direct inclusion in messages
Technical Notes
- Files are stored on Google's servers for 48 hours
- Maximum file size is 2GB
- Total storage per project is 20GB
- The implementation follows single responsibility principle with a dedicated file API class
- File references work best when added to the context at the beginning of a conversation
https://ai.google.dev/gemini-api/docs/files
Very cool! Can you add an example, so this is easy to test?
Added a Files API example. Let me know if it runs for you.
hi @getchannel I rebased on main and updated the example here: https://github.com/pipecat-ai/pipecat/pull/2107 If it looks good, can you add the changes to your PR?
We may want to move toward using genai.client from the python sdk in the future (WIP PR here) but I think we can get this in as-is first 🎸
Thanks @vipyne! I merged all changes from #2107 into this PR.
Thank you for the improvements to the example file and the cleanup work.
Ready for review when you are!