Performance Issue: Initial Greeting Latency in Telephony Agent Pipeline
pipecat version
0.0.85
Python version
3.13.2
Operating System
Arch Linux
Question
- Initial Response Optimization: Are there known patterns for optimizing the very first LLM→TTS→Audio delivery cycle in telephony applications? (See the sketch after this list for the kind of shortcut we have in mind.)
- Frame Queue Initialization: What could cause delay specifically for the first LLMRunFrame? Is there initialization overhead we can preload?
- TTS Cold Start: Are there streaming/chunking optimizations for initial responses with ElevenLabs?
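To make the first bullet concrete, here is the kind of shortcut we have in mind but have not validated: since the greeting text is fixed, skip the LLM on the very first turn and push a TTSSpeakFrame directly, so only TTS sits on the first-response critical path. The transport/task names below refer to the usual pipecat transport and PipelineTask objects, and the greeting text is just an example.

# Unvalidated sketch: speak a canned greeting immediately on connect instead of
# waiting for an LLM round trip, so only the TTS request is on the critical path.
from pipecat.frames.frames import TTSSpeakFrame

@transport.event_handler("on_client_connected")
async def on_client_connected(_transport, _client):
    # Fixed greeting for turn one; later turns use the normal STT -> LLM -> TTS path.
    await task.queue_frames([TTSSpeakFrame("Hi, this is Tasha. How can I help you today?")])
    # A real bot would probably also want this greeting reflected in the conversation context.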
What I've tried
- Service warm-up (pre-initializing OpenAI LLM, ElevenLabs TTS, OpenAI STT services; a rough sketch of what we mean follows this list)
- Switched to gpt-4o-mini for faster LLM responses
- Optimized ElevenLabs parameters (model, stability, speed settings)
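For reference, this is roughly what the warm-up bullet means. The helper name warm_up_openai is ours, not a pipecat API, and the throwaway request does not share a connection pool with pipecat's own service clients, so it mostly pays the Silero model load up front and verifies credentials/DNS before a caller is waiting. The same idea applies to the ElevenLabs client.

# Sketch of the warm-up idea: pay one-time costs at process startup, not on the
# first call of the first conversation.
import os
from openai import AsyncOpenAI
from pipecat.audio.vad.silero import SileroVADAnalyzer

# Loading the Silero VAD model is a genuine one-time cost; doing it at import time
# keeps it off the first call's critical path (whether one analyzer can be shared
# across concurrent calls is something to verify separately).
vad_analyzer = SileroVADAnalyzer()

async def warm_up_openai() -> None:
    """Tiny throwaway completion run at startup; the result is discarded."""
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=1,
    )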
Context
- Application: Real-time telephony agent
- Stack: FastAPI WebSocket + Twilio + OpenAI LLM + ElevenLabs TTS
- Audio: G.711 μ-law (8kHz sample rate)
We are experiencing significant latency (4-5 seconds) specifically for the initial greeting in the telephony pipeline. Mid-conversation latency is acceptable (1-2 s), but the first response has a substantial delay that impacts the user experience when the call is answered.
Pipeline Configuration:
pipeline = Pipeline([
    transport.input(),              # Twilio WebSocket input
    stt,                            # OpenAI STT (gpt-4o-transcribe)
    transcript.user(),
    context_aggregator.user(),
    llm,                            # OpenAI LLM (gpt-4o-mini)
    tts,                            # ElevenLabs TTS (eleven_flash_v2_5)
    transport.output(),             # Twilio WebSocket output
    transcript.assistant(),
    context_aggregator.assistant(),
])
Through detailed logging with custom frame processors, we identified three major gaps during first-response processing: frame queue delay (347 ms), TTS processing (2.5 s), and transport-to-transcript delay (1.8 s).
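For anyone who wants to reproduce those measurements, this is roughly the kind of processor we used (the class name TimingLogger is ours): a pass-through FrameProcessor that logs a timestamp and the frame type as each frame crosses a point in the pipeline.

import time
from loguru import logger
from pipecat.frames.frames import Frame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class TimingLogger(FrameProcessor):
    """Pass-through processor that logs when each frame crosses this point."""

    def __init__(self, label: str, **kwargs):
        super().__init__(**kwargs)
        self._label = label

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        logger.debug(f"[{self._label}] {time.monotonic():.3f} {type(frame).__name__}")
        # Forward the frame unchanged so pipeline behavior is unaffected.
        await self.push_frame(frame, direction)

# Usage: place instances around the stage you want to time, e.g.
# Pipeline([..., TimingLogger("pre-tts"), tts, TimingLogger("post-tts"), ...])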
Honestly same problemo man
Yes, same problem
A few questions:
- Are you running locally or deployed?
- Is this a dial-in or dial-out use case? (I'm assuming that you're dialing in to the bot based on the question.)
I just ran this example deployed to Pipecat Cloud and I get a response time of ~1 second after picking up: https://github.com/pipecat-ai/pipecat-examples/tree/main/twilio-chatbot/inbound
My example on Pipecat Cloud runs with min_agents: 1, which ensures that I have a single warm reserve agent available to respond immediately when I dial in. We strongly recommend running with a warm reserve to avoid pod/process startup times.
Hi Mark! Thanks for getting back to us. I am running the agent on localhost using the ngrok webhook, and yes, it is inbound. Would deploying on the cloud speed up the response time? In my pipeline I am using model="gpt-4o-transcribe", model="gpt-4o-mini", and model="tts-1" with FastAPIWebsocketTransport. My initial response takes about 5 seconds, and mid-conversation responses are 6 to 10 seconds too. If it's okay, can I put the Python file somewhere so you can look at it?
# bot.py
#
# Copyright (c) 2025
# SPDX-License-Identifier: BSD 2-Clause License
import datetime
import io
import os
import sys
import wave
from typing import Optional
import aiofiles
from dotenv import load_dotenv
from fastapi import WebSocket
from loguru import logger
from pipecat.observers.loggers.user_bot_latency_log_observer import UserBotLatencyLogObserver
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.serializers.twilio import TwilioFrameSerializer
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transports.websocket.fastapi import (
    FastAPIWebsocketParams,
    FastAPIWebsocketTransport,
)
from pipecat.frames.frames import LLMRunFrame
load_dotenv(override=True)
logger.remove()
logger.add(sys.stderr, level="DEBUG")
async def save_audio(server_name: str, audio: bytes, sample_rate: int, num_channels: int):
    if len(audio) == 0:
        logger.info("No audio data to save")
        return
    filename = f"{server_name}_recording_{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.wav"
    with io.BytesIO() as buffer:
        with wave.open(buffer, "wb") as wf:
            wf.setsampwidth(2)
            wf.setnchannels(num_channels)
            wf.setframerate(sample_rate)
            wf.writeframes(audio)
        async with aiofiles.open(filename, "wb") as file:
            await file.write(buffer.getvalue())
    logger.info(f"Merged audio saved to {filename}")


async def run_bot(
    websocket_client: WebSocket,
    stream_sid: Optional[str],
    call_sid: Optional[str],
    account_sid: Optional[str],
    testing: bool,
):
    """Build and run the Pipecat pipeline for a single WebSocket call."""
    # Bi-directional WebSocket transport + Twilio serializer
    transport = FastAPIWebsocketTransport(
        websocket=websocket_client,
        params=FastAPIWebsocketParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            add_wav_header=False,
            vad_analyzer=SileroVADAnalyzer(),
            serializer=TwilioFrameSerializer(
                stream_sid=stream_sid,
                call_sid=call_sid,
                account_sid=account_sid,
                auth_token=os.getenv("TWILIO_AUTH_TOKEN"),
            ),
        ),
    )

    # Explicit handles
    input_proc = transport.input()
    output_proc = transport.output()

    # Services
    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini",
        generation_params={
            "max_response_tokens": 60,
            "temperature": 0.6,
            "frequency_penalty": 0.0,
            "presence_penalty": 0.0,
        },
    )
    stt = OpenAISTTService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-transcribe",
        audio_passthrough=False,
        enable_interim_results=True,
        endpointing_silence_ms=200,
    )
    tts = OpenAITTSService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="tts-1",
        voice="nova",
    )

    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant named Tasha. "
                "Your output will be converted to audio so don't include special characters in your answers. "
                "Respond with a short short sentence."
            ),
        }
    ]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    # Record AFTER output so recording never delays playback
    audiobuffer = AudioBufferProcessor()

    # Build pipeline
    pipeline = Pipeline(
        [
            input_proc,                     # Websocket input from client
            stt,                            # Speech-To-Text
            context_aggregator.user(),      # push user messages into context
            llm,                            # LLM
            tts,                            # Text-To-Speech
            output_proc,                    # Websocket output to client
            audiobuffer,                    # record after output
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(
        pipeline,
        observers=[UserBotLatencyLogObserver()],
        params=PipelineParams(
            audio_in_sample_rate=8000,
            audio_out_sample_rate=24000,
            allow_interruptions=True,
        ),
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(_transport, _client):
        logger.info("🔌 WebSocket connection established")
        await audiobuffer.start_recording()
        # Seed a one-line intro into context
        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        # IMPORTANT: trigger the LLM with a run frame
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(_transport, _client):
        logger.info("🔌 WebSocket connection closed by client")
        await task.cancel()

    @audiobuffer.event_handler("on_audio_data")
    async def on_audio_data(_buffer, audio, sample_rate, num_channels):
        server_name = f"server_{websocket_client.client.port}"
        await save_audio(server_name, audio, sample_rate, num_channels)

    runner = PipelineRunner(handle_sigint=False, force_gc=True)
    await runner.run(task)
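For completeness, run_bot is invoked from a FastAPI WebSocket endpoint roughly like the following. This is a hypothetical companion server.py sketch, not the exact file; Twilio Media Streams sends a "connected" event and then a "start" event carrying the stream/call/account SIDs that TwilioFrameSerializer needs.

# server.py (hypothetical sketch): accept Twilio's Media Streams WebSocket,
# read the "start" event for the SIDs, then hand the call off to run_bot.
import json

from fastapi import FastAPI, WebSocket

from bot import run_bot

app = FastAPI()


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    start_data = None
    # Twilio sends a "connected" event first, then a "start" event with the SIDs.
    while start_data is None:
        message = json.loads(await websocket.receive_text())
        if message.get("event") == "start":
            start_data = message["start"]
    await run_bot(
        websocket_client=websocket,
        stream_sid=start_data["streamSid"],
        call_sid=start_data.get("callSid"),
        account_sid=start_data.get("accountSid"),
        testing=False,
    )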