
When the user interrupts the bot while it is speaking, you hear a very short burst of audio (a few letters/phonemes) before the TTS actually stops.

Open bobysp opened this issue 1 month ago • 7 comments

pipecat version

3.11

Python version

0.93

Operating System

Ubuntu 20.04.6 LTS

Use Case Description

When the user interrupts the bot while it is speaking, you hear a very short burst of audio (a few letters/phonemes) before the TTS actually stops.

This sounds like:

half a syllable

a tiny click / micro-word

the beginning of the next token from TTS

stale audio queued in transport
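To illustrate the last possibility (stale audio queued in the transport), here is a toy sketch. It is not pipecat code: `OutputBuffer` and `simulate` are hypothetical names made up for the example, and the chunk counts are arbitrary. It only shows that chunks already queued downstream of the TTS keep playing after a barge-in unless the queue is flushed at the moment of interruption.

```python
from collections import deque


class OutputBuffer:
    """Toy stand-in for a transport's outbound audio queue (hypothetical, not pipecat)."""

    def __init__(self):
        self._chunks = deque()

    def push(self, chunk: bytes) -> None:
        # Audio produced by the TTS is queued here until it is played.
        self._chunks.append(chunk)

    def flush(self) -> int:
        """Drop everything still queued and return how many chunks were discarded."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped

    def play_next(self):
        """Return the next chunk to play, or None if the queue is empty."""
        return self._chunks.popleft() if self._chunks else None


def simulate(flush_on_interrupt: bool) -> list:
    """Queue five chunks, play two, then interrupt; return everything that was played."""
    buf = OutputBuffer()
    for i in range(5):
        buf.push(f"chunk-{i}".encode())

    # Two chunks are heard before the user barges in.
    played = [buf.play_next(), buf.play_next()]

    # Interruption: the TTS stops producing new audio at this point.
    if flush_on_interrupt:
        buf.flush()

    # The player keeps draining whatever is still queued.
    while (chunk := buf.play_next()) is not None:
        played.append(chunk)
    return played


print(len(simulate(flush_on_interrupt=False)))  # all queued chunks are still heard
print(len(simulate(flush_on_interrupt=True)))   # only the chunks played before barge-in
```

In the unflushed case the listener still hears the three leftover chunks, which is the kind of short trailing burst described above. Whether this is what actually happens in our pipeline is unconfirmed.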

Current Approach

The current pipeline flow is:

pipeline = Pipeline(
    [
        ws_transport.input(),  # Transport user input
        rtvi,
        stt,  # STT
        idle_intent_rag,
        context_aggregator.user(),  # User responses
        llm,  # LLM
        custom_frame_processor,
        tts,  # TTS
        # resampler,
        ws_transport.output(),  # Transport bot output
        context_aggregator.assistant(),  # Assistant spoken responses
    ]
)

Errors or Unexpected Behavior

Most of the time, we hear a short clip of audio when an interruption (user barge-in) happens.

Additional Context

No response

bobysp avatar Nov 18 '25 16:11 bobysp

Which TTS service are you using?

markbackman avatar Nov 18 '25 16:11 markbackman

Hi, we are using Azure OpenAI for the STT, LLM, and TTS services.

bobysp avatar Nov 19 '25 00:11 bobysp

Sorry, which service class in particular for TTS?

markbackman avatar Nov 19 '25 01:11 markbackman

Hi Mark, here is the constructor call:

from pipecat.services.azure.tts import AzureTTSService
from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter

md_filter = MarkdownTextFilter(
    params=MarkdownTextFilter.InputParams(
        filter_code=True,
        filter_tables=True,
    )
)

tts = AzureTTSService(
    api_key=azure_platform['tts_key'],
    region=os.getenv("AZURE_SPEECH_REGION"),
    voice=channel_config['tts']['voice'],
    format=audio_profile['format'],
    text_filter=md_filter,
)

bobysp avatar Nov 19 '25 03:11 bobysp

Here is the pipeline task:

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        audio_out_encoding=audio_profile["out_encoding"],
        audio_out_sample_rate_hz=audio_profile["out_sample_rate"],
        audio_out_channels=1,
        enable_metrics=True,
        enable_usage_metrics=True,
        allow_interruptions=True,
        interruption_strategies=[MinWordsInterruptionStrategy(min_words=3)],
    ),
    observers=[RTVIObserver(rtvi)],
)

bobysp avatar Nov 19 '25 04:11 bobysp

This seems similar to what we are facing, but we are not using Azure TTS, so it looks like it's not a TTS issue. Which telephony service are you using, @bobysp?

omensky avatar Nov 22 '25 10:11 omensky

I've tested the 07f foundational example, which uses the AzureTTSService, and I'm not seeing the issue. @bobysp can you say more about your configuration?

Also, I see you have an arg called `format`, which is not an arg for AzureTTSService. What are you trying to set? If it's the sample rate, it should be set in the PipelineParams via `audio_out_sample_rate`.

Additionally, I see a number of PipelineParams which don't exist. Perhaps this was an LLM hallucinating in writing the code?

markbackman avatar Nov 23 '25 15:11 markbackman
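One way to catch arguments like these before running the bot is to compare the keyword names against the constructor's signature. The sketch below is a generic helper and not part of pipecat: `unknown_kwargs` and `ExampleParams` are hypothetical names, and `ExampleParams` is only a stand-in whose fields are chosen for illustration. It does not reproduce the real PipelineParams definition.

```python
import inspect
from dataclasses import dataclass


def unknown_kwargs(factory, kwargs: dict) -> set:
    """Return the keyword names in `kwargs` that `factory` does not declare.

    If `factory` accepts **kwargs, nothing can be concluded from the
    signature alone, so an empty set is returned.
    """
    params = inspect.signature(factory).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return set()
    accepted = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return set(kwargs) - accepted


@dataclass
class ExampleParams:
    """Hypothetical stand-in for a params class; NOT the real PipelineParams."""

    audio_out_sample_rate: int = 24000
    enable_metrics: bool = False


# Argument names taken from the snippet earlier in the thread.
candidate = {
    "audio_out_sample_rate_hz": 8000,
    "audio_out_channels": 1,
    "enable_metrics": True,
}

print(sorted(unknown_kwargs(ExampleParams, candidate)))
# prints ['audio_out_channels', 'audio_out_sample_rate_hz']
```

To use it against the real classes, pass the class itself (for example `unknown_kwargs(PipelineParams, {...})`). Classes that forward `**kwargs` to a parent will always return an empty set, so in that case the library's documentation or source is the only reliable check.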