agents
Interruptions at the beginning of the agent's response
Feature Type
Would make my life easier
Feature Description
Sometimes the agent interrupts the user. This happens when the user finishes speaking a first time and then goes on to say another sentence. In that case, the first audio frames emitted by the agent interrupt the user, and both mostly start speaking at the same time.
To prevent this, I am thinking we could check the VAD right before emitting the first audio frame and, if it is triggered, wait for the STT response.
If the user said something: session.interrupt()
Otherwise, we continue speaking by playing the delayed frames.
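To make the proposal concrete, here is a minimal sketch of the gating logic in plain Python. It does not use the real agents API: `user_speech_detected`, `wait_for_transcript`, `play`, and `interrupt` are hypothetical callables standing in for the VAD check, the STT result, audio playback, and `session.interrupt()`.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def gate_first_frames(
    frames: Iterable[bytes],
    user_speech_detected: Callable[[], bool],           # hypothetical VAD probe
    wait_for_transcript: Callable[[], Awaitable[str]],  # hypothetical STT wait
    play: Callable[[bytes], None],                      # hypothetical playback
    interrupt: Callable[[], None],                      # stand-in for session.interrupt()
) -> bool:
    """Return True if the agent spoke, False if it yielded to the user."""
    # Check the VAD once, right before the first frame would be emitted.
    if user_speech_detected():
        # Hold the frames back until STT tells us whether anything was said.
        transcript = await wait_for_transcript()
        if transcript.strip():
            interrupt()
            return False
    # No speech, or the VAD trigger turned out to be empty: play the delayed frames.
    for frame in frames:
        play(frame)
    return True
```

In this sketch the agent's frames are only delayed while the STT verdict is pending, so a false VAD trigger costs some latency but no audio is dropped.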
Note: after reading the documentation, it seems that the current implementation of interruptions already works this way: the interruption is based on VAD only, and the agent continues if it turns out nothing has been said.
If this is correct, maybe the feature would be interruption based on how many words the agent has said. For example, if the agent has said fewer than 5 words, we want any word coming from the user to interrupt it. In our case, once the agent has really started speaking (roughly more than 5 words), we don't want it to be interrupted anymore.
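The word-count rule could look something like the sketch below. The function name, its parameters, and the default threshold of 5 are illustrative assumptions, not existing options of the library.

```python
def should_interrupt(
    agent_words_spoken: int,
    user_transcript: str,
    max_interruptible_words: int = 5,  # hypothetical threshold from the example above
) -> bool:
    """Decide whether user speech should interrupt the agent.

    The agent is interruptible only while it has spoken fewer than
    `max_interruptible_words` words, and only if the user actually said something.
    """
    if not user_transcript.strip():
        # VAD fired but STT returned nothing: not a real interruption.
        return False
    return agent_words_spoken < max_interruptible_words
```

For example, `should_interrupt(2, "wait")` is true, while `should_interrupt(12, "wait")` is false because the agent is already well into its sentence.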
Also, while we are trying to figure out whether the interruption detected by the VAD is valid, we would like to play a sound such as "mmh" to inform the user that we were about to speak.
However, it seems there is no event like agent_maybe_interrupted (triggered by VAD only).
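As an illustration of how such an event might be used, here is a self-contained sketch with a toy event emitter. `SessionEvents` is a hypothetical stand-in, and `agent_maybe_interrupted` is the proposed event name, not something the library emits today.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class SessionEvents:
    """Toy event emitter standing in for the real session (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[], None]) -> None:
        # Register a handler for the given event name.
        self._handlers[event].append(handler)

    def emit(self, event: str) -> None:
        # Call every handler registered for this event, in registration order.
        for handler in list(self._handlers[event]):
            handler()


played: List[str] = []
events = SessionEvents()

# While the VAD-only interruption is being validated by STT, play a filler sound.
events.on("agent_maybe_interrupted", lambda: played.append("mmh"))

# The session would emit this as soon as the VAD fires, before STT confirms anything.
events.emit("agent_maybe_interrupted")
```

Here the filler is just appended to a list; in a real agent the handler would queue a short audio clip instead.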
Workarounds / Alternatives
No response
Additional Context
No response