
Live Streaming - Avoid regurgitating previously spoken details

Open matbeedotcom opened this issue 11 months ago • 4 comments

Are there any strategies you're aware of to reduce the amount of repetition in the responses? I'm running a streaming audio+video prompt with streaming output. It tends to say the same thing again even if that adds nothing of value. Is there a way to limit the responses to the "changes" since the last inference?
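One post-processing idea (a sketch, not part of the MiniCPM-V API): diff each new response against the previous one and only surface the sentences that are genuinely new. The `novel_parts` helper and its threshold are hypothetical names, assuming plain `str` responses:

```python
import difflib


def novel_parts(previous: str, current: str, threshold: float = 0.9) -> str:
    """Keep only the sentences of `current` that are not near-duplicates
    of a sentence already present in `previous`."""
    prev_sentences = [s.strip() for s in previous.split(".") if s.strip()]
    kept = []
    for sentence in (s.strip() for s in current.split(".") if s.strip()):
        # A sentence is a repeat if it closely matches any earlier sentence.
        is_repeat = any(
            difflib.SequenceMatcher(None, sentence, old).ratio() >= threshold
            for old in prev_sentences
        )
        if not is_repeat:
            kept.append(sentence)
    return ". ".join(kept)


print(novel_parts(
    "The man wears a red hat. He is walking.",
    "The man wears a red hat. He sits down.",
))  # -> "He sits down"
```

This doesn't stop the model from regenerating old content, but it keeps the downstream stream limited to changes between inferences.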

matbeedotcom avatar Jan 26 '25 18:01 matbeedotcom

Could you provide an example to illustrate what specific type of repetition you are referring to?

YuzaChongyi avatar Jan 27 '25 04:01 YuzaChongyi

@matbee-eth what code / framework are you using?

franz101 avatar Jan 28 '25 20:01 franz101

I'm using Transformers with my own framework

matbeedotcom avatar Jan 29 '25 02:01 matbeedotcom

> Could you provide an example to illustrate what specific type of repetition you are referring to?

For instance, given the following code block: if I call `model.streaming_prefill` with new audio/image/etc. chunks, the next `streaming_generate` can end up just repeating the previous response.

Do I need to include an 'assistant'-role turn in the `msgs` array history? I assume I don't, given that there is a `session_id` system.

```python
# 1. prefill system prompt
res = model.streaming_prefill(
    session_id=session_id,
    msgs=[sys_msg],
    tokenizer=tokenizer
)

# 2. prefill video/audio chunks
for content in contents:
    msgs = [{"role": "user", "content": content}]
    res = model.streaming_prefill(
        session_id=session_id,
        msgs=msgs,
        tokenizer=tokenizer
    )

# 3. generate
res = model.streaming_generate(
    session_id=session_id,
    tokenizer=tokenizer,
    temperature=0.5,
    generate_audio=generate_audio
)

# 4. prefill the next round of video/audio chunks
for content in contents:
    msgs = [{"role": "user", "content": content}]
    res = model.streaming_prefill(
        session_id=session_id,
        msgs=msgs,
        tokenizer=tokenizer
    )

# 5. generate again
res = model.streaming_generate(
    session_id=session_id,
    tokenizer=tokenizer,
    temperature=0.5,
    generate_audio=generate_audio
)
```

The final `res` here may end up repeating the first response.
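If the session does not carry the assistant turn forward automatically (an assumption; the thread doesn't confirm how `session_id` handles history), one workaround is to prefill the previous reply as an explicit assistant message before the next round of user chunks. A minimal bookkeeping sketch, with the actual `model.streaming_prefill` calls elided and `next_round_msgs` a hypothetical helper:

```python
def next_round_msgs(last_reply: str, contents: list) -> list:
    """Build the sequence of msgs lists for the next round of
    streaming_prefill calls. The previous assistant reply goes first,
    so the model can see what it has already said (hypothetical
    workaround, not confirmed MiniCPM-V API usage)."""
    rounds = []
    if last_reply:
        rounds.append([{"role": "assistant", "content": last_reply}])
    for content in contents:
        rounds.append([{"role": "user", "content": content}])
    return rounds


# Each element of `rounds` would then be passed as msgs=... to
# model.streaming_prefill(session_id=session_id, msgs=msgs, tokenizer=tokenizer)
rounds = next_round_msgs("A man enters the room.", ["chunk_a", "chunk_b"])
```

Whether this helps depends on whether `streaming_generate`'s output is already appended to the session's KV cache; if it is, duplicating it this way could make repetition worse, so it is worth testing both ways.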

matbeedotcom avatar Feb 11 '25 23:02 matbeedotcom