Feature Request: Expose More Native Gemini Live API Parameters in `GeminiMultimodalLiveLLMService`
I'm requesting access to more of the Gemini Multimodal Live API's parameters for specific use cases. For example:
- Duplicate Responses with Tools: Without access to `tool_config.function_calling_config.mode`, it's difficult to prevent the model from sometimes generating two responses when a tool (like Google Search) is used: one initial response from training data, and a second grounded response after the tool runs. Setting `mode: ANY` in the native API can address this.
- Limited Control Over VAD: The native API now offers detailed VAD configuration via `realtimeInputConfig`, which isn't fully exposed.
- Session Management: Fine-grained control over `sessionResumption` and handling of `GoAway` messages isn't directly available.
- Context Management: Explicit configuration of `contextWindowCompression` for longer sessions isn't exposed.
- Media Configuration: Direct control over `mediaResolution` isn't exposed.
We request that the GeminiMultimodalLiveLLMService be updated to expose more of the configuration parameters available in the native Google Gemini Live API. Key parameters that would be valuable include:
- `tool_config`: Specifically `function_calling_config` with its `mode` parameter (`AUTO`, `ANY`, `NONE`).
- `realtimeInputConfig`: To allow finer control over Voice Activity Detection settings (e.g., disabling automatic VAD).
- `sessionResumption`: Exposing the `SessionResumptionConfig` options.
- `contextWindowCompression`: Exposing `ContextWindowCompressionConfig` options (like `sliding_window`).
- `mediaResolution`: Allowing explicit setting of media resolution (e.g., `MEDIA_RESOLUTION_LOW`, `MEDIUM`, `HIGH`).
- Access to Server Events: Potentially providing ways to hook into or be notified of server events like `GoAway` or `GenerationComplete`.
These could potentially be added to the existing InputParams or a new dedicated configuration object passed to the GeminiMultimodalLiveLLMService constructor.
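To make the "dedicated configuration object" idea concrete, here is a rough sketch of what such an object might look like. Everything in it is hypothetical: `NativeLiveConfig`, `FunctionCallingMode`, and `to_setup_fields` are illustrative names, not part of Pipecat, and the nested key names in the output dict (such as `automaticActivityDetection` and `slidingWindow`) are my assumptions about the native setup message and should be checked against the Gemini Live API reference.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class FunctionCallingMode(str, Enum):
    """Modes named in this request (AUTO, ANY, NONE)."""
    AUTO = "AUTO"
    ANY = "ANY"
    NONE = "NONE"


@dataclass
class NativeLiveConfig:
    """Hypothetical container for native options; not an existing Pipecat class."""
    function_calling_mode: Optional[FunctionCallingMode] = None
    disable_automatic_vad: Optional[bool] = None
    enable_session_resumption: bool = False
    session_resumption_handle: Optional[str] = None
    sliding_window_compression: bool = False
    media_resolution: Optional[str] = None  # e.g. "MEDIA_RESOLUTION_LOW"

    def to_setup_fields(self) -> Dict[str, Any]:
        """Build only the fields that were explicitly set.

        Key names below are assumptions about the native setup message.
        """
        out: Dict[str, Any] = {}
        if self.function_calling_mode is not None:
            out["tool_config"] = {
                "function_calling_config": {"mode": self.function_calling_mode.value}
            }
        if self.disable_automatic_vad is not None:
            out["realtimeInputConfig"] = {
                "automaticActivityDetection": {"disabled": self.disable_automatic_vad}
            }
        if self.enable_session_resumption:
            resumption: Dict[str, Any] = {}
            if self.session_resumption_handle is not None:
                resumption["handle"] = self.session_resumption_handle
            out["sessionResumption"] = resumption
        if self.sliding_window_compression:
            out["contextWindowCompression"] = {"slidingWindow": {}}
        if self.media_resolution is not None:
            out["mediaResolution"] = self.media_resolution
        return out


if __name__ == "__main__":
    config = NativeLiveConfig(
        function_calling_mode=FunctionCallingMode.ANY,
        disable_automatic_vad=True,
        media_resolution="MEDIA_RESOLUTION_LOW",
    )
    print(config.to_setup_fields())
```

The service could merge the returned dict into the setup message it already sends, so unset options keep the API defaults. Whether this lives in `InputParams` or in a separate object is a design choice for the maintainers.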
Currently, the primary alternative is to modify the system_instruction prompt to guide the model's behavior (e.g., asking it to wait for tool results). However, this is less direct and potentially less reliable than controlling the underlying API parameters.
Exposing these native parameters would significantly enhance the flexibility and power of the GeminiMultimodalLiveLLMService for Pipecat users. It would allow for more robust handling of tool calls, better management of long-running sessions, optimized media handling, and closer alignment with the full capabilities of the Gemini Live API. The relevant service implementation appears to be in src/pipecat/services/gemini_multimodal_live/gemini.py.
Thank you for considering this request to enhance the integration with the Gemini Live API.
We're working on this. We've added a few items already:
- RealtimeInputConfig
- ContextWindowCompression
- MediaResolution
We'll add more over time.
Can you also add Language params? Flash 2.0 001 supports various language codes that introduce subtle accents, which are great for deploying regional voice agents.
language is already a supported InputParam:
https://docs.pipecat.ai/server/services/s2s/gemini#param-language
Supported codes: https://docs.pipecat.ai/server/services/s2s/gemini#language-support
Hi, +1 to adding support for session resumption.