Live Streaming - Avoid regurgitating previously spoken details
Are there any strategies you're aware of to reduce the amount of repetition in the responses? I'm doing a streaming audio+video prompt with streaming output, and it tends to say the same thing again even when that adds nothing of value. Is there a way to limit the responses to the "changes" since the last inference?
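One thing I've considered (not sure it's the right approach) is post-processing on my side: diff each new response against the previous one and only surface it when it actually adds something. A rough sketch of that idea, where the 0.85 similarity threshold is an arbitrary guess and the usage part is just pseudo-flow:

```python
from difflib import SequenceMatcher

def is_mostly_repeat(prev_response: str, new_response: str, threshold: float = 0.85) -> bool:
    """Return True when the new response is nearly identical to the previous one."""
    if not prev_response:
        return False
    # Character-level similarity; the 0.85 cutoff is an arbitrary guess on my part.
    return SequenceMatcher(None, prev_response, new_response).ratio() >= threshold

# Usage idea: only surface a generation when it adds something new.
# last_response = ""
# if not is_mostly_repeat(last_response, response_text):
#     handle(response_text)        # whatever the app does with the output
#     last_response = response_text
```

But that feels like a band-aid, so I'm wondering whether there's a prompting- or session-level way to do it.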
Could you provide an example to illustrate what specific type of repetition you are referring to?
@matbee-eth what code / framework are you using?
I'm using Transformers with my own framework
> Could you provide an example to illustrate what specific type of repetition you are referring to?
For instance, given the following code block: if I call model.streaming_prefill with new audio/image/etc. after streaming_generate completes, the next generation can end up just repeating itself.
Do I need to include 'assistant'-role history in the msgs [...] array?
I assume I don't need to, given that there is a "session_id" system.
```python
# 1. prefill the system prompt
res = model.streaming_prefill(
    session_id=session_id,
    msgs=[sys_msg],
    tokenizer=tokenizer
)

# 2. prefill video/audio chunks
for content in contents:
    msgs = [{"role": "user", "content": content}]
    res = model.streaming_prefill(
        session_id=session_id,
        msgs=msgs,
        tokenizer=tokenizer
    )

# 3. generate
res = model.streaming_generate(
    session_id=session_id,
    tokenizer=tokenizer,
    temperature=0.5,
    generate_audio=generate_audio
)

# 4. prefill the next batch of video/audio chunks
for content in contents:
    msgs = [{"role": "user", "content": content}]
    res = model.streaming_prefill(
        session_id=session_id,
        msgs=msgs,
        tokenizer=tokenizer
    )

# 5. generate again
res = model.streaming_generate(
    session_id=session_id,
    tokenizer=tokenizer,
    temperature=0.5,
    generate_audio=generate_audio
)
```

The second `res` here may end up just repeating what the first `streaming_generate` call already said.
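For what it's worth, if the assistant history does need to be fed back in, this is roughly what I'd try between steps 3 and 4. I'm assuming here that streaming_prefill accepts an assistant-role message the same way it takes user content, which I haven't verified:

```python
# Sketch only: push the text produced by step 3 back into the session before
# the next round of prefills, so the model "remembers" what it already said.
# Assumes streaming_prefill accepts an assistant-role message (unverified) and
# that previous_text holds the text collected from the previous generation.
model.streaming_prefill(
    session_id=session_id,
    msgs=[{"role": "assistant", "content": [previous_text]}],
    tokenizer=tokenizer
)
```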