
Interruptions at the beginning of the agent's response

Open CyprienRicqueB2L opened this issue 3 weeks ago • 3 comments

Feature Type

Would make my life easier

Feature Description

Sometimes the agent interrupts the user. This happens when the user finishes speaking, then goes on to say another sentence. In that case, the first audio frames emitted by the agent interrupt the user: both start speaking at roughly the same time.

To prevent this, I am thinking we could check the VAD right before emitting the first audio frame, then wait for the STT response. If the user said something, call session.interrupt(); otherwise, continue speaking by playing the delayed frames.
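To make the proposal concrete, here is a minimal, framework-agnostic sketch of the "hold the first frames until STT answers" idea. `FirstFrameGate`, `GateState`, and the method names are hypothetical and are not part of the library; a real integration would wire these callbacks to the actual VAD and STT events and call session.interrupt() where the gate cancels.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class GateState(Enum):
    OPEN = auto()       # frames pass straight through
    HOLDING = auto()    # VAD fired; frames are buffered until STT answers
    CANCELLED = auto()  # the user really spoke; the held reply is dropped


@dataclass
class FirstFrameGate:
    """Hypothetical helper: delays agent frames while a VAD trigger is unresolved."""

    state: GateState = GateState.OPEN
    held: list = field(default_factory=list)

    def on_vad_speech_start(self) -> None:
        # VAD alone is not proof of speech, so only start holding frames.
        if self.state is GateState.OPEN:
            self.state = GateState.HOLDING

    def push(self, frame: bytes) -> list:
        """Return the frames that may be played right now."""
        if self.state is GateState.OPEN:
            return [frame]
        if self.state is GateState.HOLDING:
            self.held.append(frame)
        return []

    def on_stt_result(self, transcript: str) -> list:
        """Resolve a pending hold; return the delayed frames to play, if any."""
        if self.state is not GateState.HOLDING:
            return []
        if transcript.strip():
            # Real speech: drop the reply (this is where an interrupt would go).
            self.held.clear()
            self.state = GateState.CANCELLED
            return []
        # False alarm: release the delayed frames in their original order.
        released, self.held = self.held, []
        self.state = GateState.OPEN
        return released
```

The gate only decides what to play; how long to wait for STT before giving up is left out and would need a timeout in practice.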

Note: after reading the documentation, it seems that the current implementation of interruptions already works this way: the interruption is triggered by VAD only, and the agent then continues if it turns out nothing was said.

If this is correct, maybe the feature would be interruption based on how many words the agent has said. For example, if the agent has said fewer than 5 words, we want any word from the user to interrupt it. In our case, once the agent has really started speaking (roughly more than 5 words), we don't want it to be interrupted anymore.
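The word-count rule could look roughly like the sketch below. `count_words`, `should_interrupt`, and the `lock_after_words` parameter are hypothetical names used for illustration, not existing options; the threshold of 5 comes from the example above, and how to treat exactly 5 words is an assumption.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words in what the agent has spoken so far."""
    return len(text.split())


def should_interrupt(
    agent_text_so_far: str,
    user_transcript: str,
    lock_after_words: int = 5,  # assumption: the agent is "locked in" from 5 words on
) -> bool:
    """Decide whether user speech should interrupt the agent.

    Any recognized user word interrupts while the agent has said fewer than
    `lock_after_words` words; after that, the agent is no longer interruptible.
    """
    if not user_transcript.strip():
        # VAD fired but STT found nothing: not a real interruption.
        return False
    return count_words(agent_text_so_far) < lock_after_words
```

For example, `should_interrupt("Sure, let me", "and also")` is true because the agent has said only three words, while the same user input after a full sentence from the agent would be ignored.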

Also, while we are trying to figure out whether the interruption detected by the VAD is valid, we would like to play a sound such as "mmh" to let the user know we were about to speak. However, it seems there is no event like agent_maybe_interrupted (triggered by VAD only).

Workarounds / Alternatives

No response

Additional Context

No response

CyprienRicqueB2L avatar Nov 12 '25 14:11 CyprienRicqueB2L