When the user interrupts the bot while it is speaking, you hear a very short burst of audio (a few letters/phonemes) before the TTS actually stops.
pipecat version
0.93
Python version
3.11
Operating System
Ubuntu 20.04.6 LTS
Use Case Description
When the user interrupts the bot while it is speaking, you hear a very short burst of audio (a few letters/phonemes) before the TTS actually stops.
This sounds like:
half a syllable
a tiny click / micro-word
the beginning of the next token from TTS
stale audio queued in transport
Current Approach
The current pipeline flow is:
pipeline = Pipeline(
    [
        ws_transport.input(),  # Transport user input
        rtvi,
        stt,  # STT
        idle_intent_rag,
        context_aggregator.user(),  # User responses
        llm,  # LLM
        custom_frame_processor,
        tts,  # TTS
        # resampler,
        ws_transport.output(),  # Transport bot output
        context_aggregator.assistant(),  # Assistant spoken responses
    ]
)
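Because `custom_frame_processor` sits between the LLM and the TTS, one thing worth ruling out is whether it passes interruption signals downstream. The toy model below is plain Python with hypothetical names (`ToyStage`, `propagate_interruption`) and is not Pipecat's frame system. Whether this explains the symptom in Pipecat is an assumption to verify. The model shows that a stage that swallows the signal leaves every later stage, including the output buffer, holding stale audio.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStage:
    """Hypothetical pipeline stage with its own buffer of not-yet-sent output."""

    name: str
    forwards_interruptions: bool = True
    pending: list[str] = field(default_factory=list)


def propagate_interruption(stages: list[ToyStage]) -> list[str]:
    """Walk an interruption downstream; return the stages that flushed their buffers."""
    flushed = []
    for stage in stages:
        stage.pending.clear()  # each stage that sees the signal drops stale output
        flushed.append(stage.name)
        if not stage.forwards_interruptions:
            break  # a swallowing stage hides the signal from everything after it
    return flushed


stages = [
    ToyStage("llm"),
    ToyStage("custom_frame_processor", forwards_interruptions=False),
    ToyStage("tts", pending=["audio chunk"]),
    ToyStage("output", pending=["audio chunk"]),
]
print(propagate_interruption(stages))  # ['llm', 'custom_frame_processor']
print(stages[3].pending)  # ['audio chunk'] (stale audio still queued)
```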
Errors or Unexpected Behavior
Most of the time, we hear this short clip of audio when an interruption (user barge-in) happens.
Which TTS service are you using?
Hi, we are using Azure OpenAI for the STT, LLM, and TTS services.
Sorry, which service class in particular for TTS?
Hi Mark, here is the constructor call:
import os

from pipecat.services.azure.tts import AzureTTSService
from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter

md_filter = MarkdownTextFilter(
    params=MarkdownTextFilter.InputParams(filter_code=True, filter_tables=True)
)

tts = AzureTTSService(
    api_key=azure_platform['tts_key'],
    region=os.getenv("AZURE_SPEECH_REGION"),
    voice=channel_config['tts']['voice'],
    format=audio_profile['format'],
    text_filter=md_filter,
)
Here is the pipeline task:
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        audio_out_encoding=audio_profile["out_encoding"],
        audio_out_sample_rate_hz=audio_profile["out_sample_rate"],
        audio_out_channels=1,
        enable_metrics=True,
        enable_usage_metrics=True,
        allow_interruptions=True,
        interruption_strategies=[MinWordsInterruptionStrategy(min_words=3)],
    ),
    observers=[RTVIObserver(rtvi)],
)
This seems similar to what we are facing, but we are not using Azure TTS, so it may not be a TTS issue. Which telephony service are you using, @bobysp?
I've tested the 07f foundational example, which uses the AzureTTSService, and I'm not seeing the issue. @bobysp can you say more about your configuration?
Also, I see you have an arg called `format`, which is not an arg for `AzureTTSService`. What are you trying to set? If it's the sample rate, it should be set in the `PipelineParams` via `audio_out_sample_rate`.
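The sample rate determines how long a given amount of buffered audio lasts, so a quick calculation helps estimate how many bytes of queued audio would produce a burst of "a few phonemes". Here is a minimal sketch that assumes uncompressed PCM. The issue does not show what `audio_profile` contains, so the 16-bit default is an assumption; 8-bit encodings such as μ-law would use `sample_width_bytes=1`. Both helper names are hypothetical.

```python
def pcm_duration_ms(
    num_bytes: int,
    sample_rate_hz: int,
    channels: int = 1,
    sample_width_bytes: int = 2,
) -> float:
    """Playback time of raw audio in ms; defaults assume 16-bit mono PCM."""
    bytes_per_second = sample_rate_hz * channels * sample_width_bytes
    return num_bytes * 1000.0 / bytes_per_second


def pcm_bytes_for_ms(
    duration_ms: float,
    sample_rate_hz: int,
    channels: int = 1,
    sample_width_bytes: int = 2,
) -> int:
    """Inverse: number of bytes needed to hold duration_ms of audio."""
    return round(duration_ms * sample_rate_hz * channels * sample_width_bytes / 1000)


print(pcm_duration_ms(4800, 24000))  # 100.0
print(pcm_duration_ms(4800, 8000))  # 300.0
print(pcm_bytes_for_ms(20, 16000))  # 640
```

The same 4,800-byte buffer lasts three times longer at 8 kHz than at 24 kHz. This is one reason the output rate should be declared in a single place rather than assumed separately by different components.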
Additionally, I see a number of `PipelineParams` fields that don't exist. Perhaps an LLM hallucinated them while writing the code?