agents
Text modality not working for Gemini multimodal
When I add the text modality in addition to audio and try to generate a reply, I get this error:
CLOSE 1007 (invalid frame payload data) Request contains an invalid argument. [39 bytes]
It works fine with audio only. The google-genai version seems to be old; is that the cause of the problem?
I don't think it supports both audio and text at the same time right now.
See: https://github.com/google-gemini/cookbook/issues/386 and https://github.com/google-gemini/cookbook/issues/379
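If the API really does accept only one response modality per session, as the reply above suggests, a client-side check can fail early with a readable message instead of a 1007 close frame. This is only a minimal sketch: `check_response_modalities` and `ALLOWED_MODALITIES` are hypothetical names made up for illustration and are not part of google-genai or livekit-agents.

```python
from typing import Iterable, List

# Assumption: the only response modalities discussed in this thread.
ALLOWED_MODALITIES = {"AUDIO", "TEXT"}


def check_response_modalities(modalities: Iterable[str]) -> List[str]:
    """Hypothetical helper: validate modalities before opening a session.

    Assumes the realtime API rejects requests that ask for more than one
    response modality at once.
    """
    normalized = [m.upper() for m in modalities]

    # Reject names that are not known modalities at all.
    unknown = sorted(set(normalized) - ALLOWED_MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {unknown}")

    # Require exactly one distinct modality (audio OR text, not both).
    if len(set(normalized)) != 1:
        raise ValueError(
            "exactly one response modality is expected, "
            f"got {sorted(set(normalized))}"
        )
    return normalized
```

Calling `check_response_modalities(["audio", "text"])` raises `ValueError` locally, which is easier to diagnose than the "Request contains an invalid argument" close reason.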
Is this working? I'm struggling to get both modalities and to save the transcription when the interview finishes. I get the text in the playground, but the user input keeps overwriting the first message.
Text modality is supported in https://github.com/livekit/agents/pull/2628
Closing, since 1.2 has been released with this functionality. We now support half-duplex mode (audio -> realtime API -> text response -> TTS).
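For readers unfamiliar with the term, the half-duplex flow (audio -> realtime API -> text response -> TTS) can be sketched as below. This is a conceptual illustration only, not the livekit-agents implementation: `HalfDuplexPipeline`, `fake_realtime_model`, and `fake_tts` are hypothetical stand-ins, and the real model and TTS calls are replaced by plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class HalfDuplexPipeline:
    """Hypothetical sketch: the realtime model returns text only,
    and a separate TTS stage turns that text into audio."""

    realtime_to_text: Callable[[bytes], str]  # audio in -> text response
    text_to_speech: Callable[[str], bytes]    # text response -> audio out

    def reply(self, audio_in: bytes) -> Tuple[str, bytes]:
        # Stage 1: the realtime model is configured for the text modality.
        text = self.realtime_to_text(audio_in)
        # Stage 2: a separate TTS engine synthesizes the spoken reply.
        audio_out = self.text_to_speech(text)
        # Returning both makes the text available for a transcript.
        return text, audio_out


def fake_realtime_model(audio: bytes) -> str:
    """Stand-in for the realtime API; just reports the input size."""
    return f"heard {len(audio)} bytes"


def fake_tts(text: str) -> bytes:
    """Stand-in for a TTS engine; encodes the text as UTF-8 bytes."""
    return text.encode("utf-8")


pipeline = HalfDuplexPipeline(fake_realtime_model, fake_tts)
text, audio = pipeline.reply(b"\x00" * 4)
```

Because the text response exists before any audio is synthesized, it can be appended to a transcript at that point.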