
Text modality not working for Gemini multimodal

Open · kailashprem opened this issue 7 months ago · 1 comment

When I add the text modality in addition to audio and try to generate a reply, I get this error:
`CLOSE 1007 (invalid frame payload data) Request contains an invalid argument. [39 bytes]`
It works fine with just audio. The google-genai version seems to be old; is that the cause of the problem?

kailashprem · Apr 09 '25 06:04

I don't think it supports both audio and text at the same time right now.

See: https://github.com/google-gemini/cookbook/issues/386 and https://github.com/google-gemini/cookbook/issues/379

ChenghaoMou · Apr 29 '25 15:04
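If the limitation above holds (a Live session accepting only one response modality, AUDIO or TEXT, rather than both), a config that asks for both would be rejected by the server, which is consistent with the 1007 "invalid argument" close. In google-genai the list is passed as `response_modalities` on the Live connect config. The helper below is hypothetical (not part of google-genai or LiveKit); it is a minimal sketch that fails fast on the client side under that assumption.

```python
# Hypothetical client-side guard. Assumption: the Live API accepts exactly one
# response modality per session, so asking for both AUDIO and TEXT is invalid.
ALLOWED_MODALITIES = {"AUDIO", "TEXT"}


def check_response_modalities(modalities: list[str]) -> list[str]:
    """Normalize and validate a response-modality list before connecting."""
    normalized = [m.upper() for m in modalities]
    unknown = set(normalized) - ALLOWED_MODALITIES
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    if len(set(normalized)) != 1:
        # Covers both the empty list and the AUDIO + TEXT combination.
        raise ValueError("request exactly one response modality (AUDIO or TEXT), not both")
    return normalized[:1]


print(check_response_modalities(["audio"]))  # ['AUDIO']
```

Calling it with `["AUDIO", "TEXT"]` raises a `ValueError` locally instead of surfacing as a WebSocket close code later.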

Is this working? I'm struggling to get both and to save the transcription when the interview finishes. I get the text in the playground, but the user input keeps overwriting the first message.

MatheusRDG · Jun 09 '25 18:06
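The code behind the overwriting isn't shown, so this is only an assumption: one possible cause is writing each new transcription into the same slot (for example, a single variable or a fixed index) instead of appending it. A minimal sketch, not tied to any LiveKit API, that keeps every finalized turn and serializes the whole transcript once the interview ends; `Turn` and `Transcript` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class Transcript:
    turns: list[Turn] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        # Append each finalized turn; earlier entries are never replaced.
        if text.strip():
            self.turns.append(Turn(role, text.strip()))

    def to_json(self) -> str:
        # Serialize the full conversation, e.g. to save when the session closes.
        return json.dumps([asdict(t) for t in self.turns], indent=2)


t = Transcript()
t.add("user", "Hi")
t.add("assistant", "Hello, shall we start?")
t.add("user", "Yes")
print(len(t.turns))  # 3
```

Hooking `add` to whatever event fires when a user or agent transcription is final, and calling `to_json` on shutdown, would keep the first message intact.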

Text modality is supported in https://github.com/livekit/agents/pull/2628

longcw · Jun 19 '25 03:06

Closing, since 1.2 has been released with this functionality. We now support half-duplex mode (audio -> realtime API -> text response -> TTS)!

davidzhao · Jul 18 '25 07:07
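The half-duplex flow described above (audio -> realtime API -> text response -> TTS) can be sketched as one turn of a pipeline. This is not LiveKit's implementation; the function and stage signatures are hypothetical, and the realtime model and TTS engine are plain callables so the data flow is easy to follow and test.

```python
from typing import Callable


def half_duplex_turn(
    audio_in: bytes,
    realtime_text: Callable[[bytes], str],
    tts: Callable[[str], bytes],
) -> tuple[str, bytes]:
    """One turn: user audio -> realtime model (text-only output) -> TTS audio."""
    # The realtime model hears the audio but is asked for a TEXT response only,
    # so only one response modality is requested.
    reply_text = realtime_text(audio_in)
    # A separate TTS engine turns that text into speech.
    reply_audio = tts(reply_text)
    return reply_text, reply_audio


# Usage with stand-in stages:
text, audio = half_duplex_turn(
    b"\x00\x01",
    realtime_text=lambda a: f"heard {len(a)} bytes",
    tts=lambda s: s.encode(),
)
print(text)  # heard 2 bytes
```

A side effect of this split is that the reply text is available directly, which also helps with saving transcripts.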