Get_started_LiveAPI.py example stops understanding images

Open rezacopol opened this issue 8 months ago • 5 comments

Description of the bug:

I was using Get_started_LiveAPI.py to experiment with the Live API, but this week I noticed that it no longer understands images. To check that frames were being captured properly from my webcam, I added code that writes each frame to disk before it is put in the queue to be sent to the model. It still doesn't work, and the model says it cannot see anything.

Prompt: do you see my camera?

Response: As a large language model, I don't have a physical body or the ability to interact with the physical world. Therefore, I cannot see your camera. I exist only as computer code.

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

rezacopol avatar Apr 14 '25 18:04 rezacopol

OK, I found one important piece of information: the code works when I use CONFIG = {"response_modalities": ["AUDIO"]}, but it stops working (the model says it doesn't see anything) after I change response_modalities to TEXT. Why is that?

rezacopol avatar Apr 15 '25 19:04 rezacopol

I don't understand anything so please tell me what is going on

Asjleyam69 avatar Apr 15 '25 19:04 Asjleyam69

To replicate:

1. In Get_started_LiveAPI.py, change CONFIG = {"response_modalities": ["AUDIO"]} to CONFIG = {"response_modalities": ["TEXT"]}.
2. Run the code and ask "what do you see?" or "describe the scene".

The model often replies: "As a large language model, I don't have a physical body or the ability to interact with the physical world. Therefore, I cannot see your camera. I exist only as computer code."

rezacopol avatar Apr 15 '25 20:04 rezacopol

Also experiencing this, any updates? This is blocking for the use case we're interested in (image + audio input, text output)

kdzapp-botco avatar May 14 '25 20:05 kdzapp-botco

I think it relates to rate limiting since it is working sometimes and not other times.

rezacopol avatar May 14 '25 23:05 rezacopol

Hey @rezacopol, by default the model only "sees" video frames that are paired with speech/audio. To have it see video frames alongside text input, set turn_coverage to TURN_INCLUDES_ALL_INPUT and send video frames from the client only when text is sent.

Note: this will fill up context much faster and increase costs but should allow the model to "see" video frames with text.
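
A minimal sketch of the suggestion above, not an official example. It assumes the dict-style CONFIG that Get_started_LiveAPI.py already uses, and it assumes the google-genai SDK accepts turn_coverage under a realtime_input_config key; please check both against your installed SDK version. FrameGate is a hypothetical helper (it is not part of the script or the SDK) that keeps only the newest webcam frame so a frame is forwarded only when a text message is sent.

```python
# Assumption: these key names mirror the google-genai SDK's live connect
# config fields; verify them against your installed SDK version.
CONFIG = {
    "response_modalities": ["TEXT"],
    "realtime_input_config": {
        # Ask the model to include all input in the turn, not only the
        # input received while speech/audio activity is detected.
        "turn_coverage": "TURN_INCLUDES_ALL_INPUT",
    },
}


class FrameGate:
    """Hypothetical helper: hold the newest frame until a text turn is sent."""

    def __init__(self):
        self._latest = None

    def update(self, frame):
        # Overwrite instead of queueing, so older frames are dropped
        # and do not fill up the context.
        self._latest = frame

    def take(self):
        # Return the pending frame (or None) and clear it so that the
        # same frame is not sent twice.
        frame, self._latest = self._latest, None
        return frame
```

In the capture loop you would call gate.update(frame) for every captured frame; in the text-sending path you would call gate.take() just before sending the message and forward the returned frame, if there is one, together with the text.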

Gunand3043 avatar May 26 '25 16:05 Gunand3043

Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.

github-actions[bot] avatar Jun 09 '25 22:06 github-actions[bot]

This issue was closed because it has been inactive for 27 days. Please post a new issue if you need further assistance. Thanks!

github-actions[bot] avatar Jun 23 '25 22:06 github-actions[bot]

Same issue

Sofianel5 avatar Aug 05 '25 21:08 Sofianel5