AudioBufferProcessor miscomputing silence?
pipecat version
0.0.90
Python version
3.11.11
Operating System
macOS 15.6.1
Issue description
I was noticing what looked like misalignment of the user and bot audio clips in the merged output from AudioBufferProcessor, and after reading the code I suspect the computed silence is incorrect. It currently uses properties to track the last frame times and subtracts them from the current time to determine how much silence to add to the output streams. However, it also appends the actual audio clip to the stream, so shouldn't the silence calculation take the duration of the audio clip into account as well? I.e. for the user audio stream:
```python
frame_time = frame.num_frames / frame.sample_rate  # this should be added to the last timestamp?
self._last_user_frame_at = time.time() + frame_time
```
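To make the proposal concrete, here is a minimal, self-contained sketch (a hypothetical `UserTrack` class, not pipecat's actual implementation) of a track buffer that pads gaps with silence, advancing the watermark past the end of each appended clip rather than leaving it at the clip's arrival time:

```python
import time

SAMPLE_RATE = 16000   # assumed: mono, 16-bit PCM
BYTES_PER_SAMPLE = 2

class UserTrack:
    """Illustrative track buffer that inserts silence between clips."""

    def __init__(self):
        self._buffer = bytearray()
        self._last_frame_at = time.time()

    def append(self, pcm, now=None):
        now = time.time() if now is None else now
        # Pad the gap since the END of the previous clip with silence.
        silence_secs = max(0.0, now - self._last_frame_at)
        self._buffer += b"\x00" * (int(silence_secs * SAMPLE_RATE) * BYTES_PER_SAMPLE)
        self._buffer += pcm
        # The proposed fix: advance the watermark past the clip we just
        # appended, so the next gap is measured from where this clip ends.
        clip_secs = len(pcm) / (SAMPLE_RATE * BYTES_PER_SAMPLE)
        self._last_frame_at = now + clip_secs
```

With this bookkeeping, a clip that arrives while the previous clip is still "playing out" in the merged stream gets no spurious padding in front of it.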
Reproduction steps
Save merged audio output.
Expected behavior
User and bot audio should be properly aligned.
Actual behavior
Audio isn't aligned.
Logs
Can you share a simple repro case demonstrating the issue?
We recently fixed an alignment issue and our testing confirms all is well. It's possible that you're using the AudioBufferProcessor in a way that we haven't tested, so sharing a repro case will help.
@markbackman Perhaps I wasn't using the latest code, but doesn't my calculation still stand? I made a diagram showing what I believe is the currently calculated silence time versus what I'd expect. Currently, it doesn't take the duration of the audio clip into account, so this would be accurate, right? (The diagram may be wrong in using the audio start times as the event triggers rather than the audio stop times, which I believe is actually correct, but it should still work for explanatory purposes.)
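To put numbers on the diagram, here is the discrepancy with hypothetical timestamps: a 2.0 s clip whose first frame arrives at t = 10.0, followed by the next clip at t = 13.0:

```python
# Hypothetical numbers illustrating the drift described above.
last_frame_at = 10.0   # timestamp when the previous clip arrived
clip_duration = 2.0    # seconds of audio appended to the stream at that time
now = 13.0             # next clip arrives

current_silence = now - last_frame_at                     # what the code computes
expected_silence = now - (last_frame_at + clip_duration)  # accounting for the clip

print(current_silence)   # 3.0 -- the stream gains 2.0 s of extra padding
print(expected_silence)  # 1.0 -- the actual gap between the two clips
```

Each turn would add padding equal to the previous clip's duration, so the misalignment compounds over the call.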
@glennpow I've also been seeing this behavior for the last 4 to 5 days: the user's audio is delayed in the recording.
In the actual call, I answered every bot question immediately after the bot stopped speaking.
But in the recordings, the user's audio is badly delayed.
I'm attaching my audio recording below; please give it a listen.
@ParthShindenovus Yes, and unfortunately it seems the solution I proposed above doesn't fully fix the issue. @markbackman Can you confirm that when you make a recording now, the merged audio is always properly aligned?
@glennpow yes, my audio is fully aligned in all of the scenarios that I've tested. If you have a single file repro that shows misalignment, that would be very helpful. @ParthShindenovus shared one in Discord, but I have no misalignment in running it.
@markbackman I just posted an audio clip to Discord.
Thanks for the clip, but what I really need is a simple, single file repro of the issue. Ideally, something that takes the 34-audio-recording.py example and modifies it in a way that makes this issue reproducible.
I'm also facing the same issue when I integrate MCPClient with the agent; it works fine when I remove MCPClient.
I know it's super weird, but I took 34-audio-recording.py, which works fine, and kept adding code from my agent, testing after each addition. The issue only appears when I add the MCPClient tools; I'm not sure why.
This issue also happens in the latest version.
@rohitkhatri thanks for sharing a repro for this! We'll take a look this week.
@ayubSubhaniya are you also using the MCPClient?
We’re currently live in production and rely on the audio-merging functionality of the library for our sentiment module. Because of this silence-computation bug, we’re unable to put our audio sentiment module on top.
Could you let us know if there is a temporary workaround we could apply (for example, manual silence insertion, adjusting timestamps, or using an alternate processing path) until the fix is rolled out? Additionally, do you have an estimated timeframe for when this bug might be resolved in a stable release (or a nightly build)?
Thanks again for your support!
@piyushjain0106 are you using the MCPClient?
@markbackman OP here. I've never used the MCPClient, so I'm fairly certain this has nothing to do with it.
@glennpow can you please share a minimal repro (e.g. code that I can run that hits the issue) so we can investigate and fix? I can confirm that the 34-audio-recording example works as expected. @rohitkhatri confirmed that as well.
Without a repro, it will be difficult to isolate the issue as this is working correctly in example 34. It's possible that other frame processors are interfering. Do you have any custom frame processors in your Pipeline?
@markbackman Just a thought: shouldn't the recording be raw? From reading the current code, it's captured after the audio filters are applied, so any noise or other filtering will also be present in the recording.
Also, the recording overlap gets worse under heavy load, e.g. 20-30 concurrent calls.
> Just a thought: shouldn't the recording be raw? From reading the current code, it's captured after the audio filters are applied, so any noise or other filtering will also be present in the recording.
It's a processor in the Pipeline, so it receives InputAudioRawFrames according to its position in the Pipeline. To capture user and bot audio aligned with the timing of the audio transmitted to the user, you place it after the transport output processor. This means the user audio is processed (augmented by filters, if present) and the bot audio is raw.
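For reference, that placement looks roughly like this. This is a sketch modeled on the 34-audio-recording example; the import paths are assumed from recent pipecat releases, and `transport`, `stt`, `llm`, and `tts` stand in for whatever services your app uses:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor

audiobuffer = AudioBufferProcessor()

pipeline = Pipeline([
    transport.input(),    # receives InputAudioRawFrame from the user
    stt,
    llm,
    tts,
    transport.output(),   # sends bot audio to the user
    audiobuffer,          # after transport output, so it records user + bot
])                        # audio aligned with what was actually transmitted
```

Placing the buffer before `transport.output()` would capture bot audio before it is paced out to the user, which is one way recordings can drift out of alignment.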
You can get raw audio, but you'd have to get it directly from the transport provider.
> Also, the recording overlap gets worse under heavy load, e.g. 20-30 concurrent calls.
This sounds like a resourcing issue on your infrastructure. We recommend that voice bots run in their own process with 0.5 vCPU for each instance. You may require more depending on what your application does (video, video avatar, etc.). Essentially, you need to ensure that each bot has an equal and sufficient amount of resources allocated.
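As a sketch of the one-process-per-bot layout (illustrative only; `run_bot` is a placeholder for the real pipeline runner, and the names are hypothetical):

```python
import multiprocessing as mp

def run_bot(call_id, result_queue):
    # Placeholder for the real bot: in practice this would create its own
    # event loop and run the pipecat pipeline for this single call.
    result_queue.put(f"bot {call_id} finished")

def start_call(call_id, result_queue):
    """Spawn one OS process per call so each bot gets its own event loop
    and a dedicated share of CPU, instead of contending inside one pod."""
    process = mp.Process(target=run_bot, args=(call_id, result_queue), daemon=True)
    process.start()
    return process
```

Running many asyncio pipelines in a single process means one CPU-hungry call can delay frame processing (and thus recording timestamps) for every other call on that loop; separate processes avoid that contention.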
Okay, thanks a lot for the reply @markbackman. On the resource side, do you recommend one process per call, e.g. something like a process pool executor?
Or just detaching the voice call from the main IO loop with multiprocessing?
At the moment I run around 10-20 concurrent calls from the same pod using FastAPI.
I am pretty sure I am also seeing this. I can't produce a code sample just yet because of the way my setup is structured (I'll work on getting one). I checked Twilio's recording, and the silence and turns were all correct, but the AudioBufferProcessor spit out a bunch of incorrect audio segment timing.
Can everyone who has experienced this issue list the following in a post:
- your pipecat version
- python version
- the transport type you are using
- where the app is hosted / where you are seeing this behavior (ie pipecat cloud, self hosted, local development)
- whether or not you use MCP client
for example:
- 0.0.92
- 3.12.9
- FastAPIWebsocketTransport
- pipecat cloud
- False
@markbackman
> You can get raw audio, but you'd have to get it directly from the transport provider.

I even tried this: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#event-handlers, but the recording it generates comes out jumbled; the order of sentences is wrong.
@vipyne
- 0.0.87
- 3.11
- FastAPIWebsocketTransport
- self hosted
- no
For me:
- 0.0.91
- 3.12.9
- FastAPIWebsocketTransport
- self hosted
- False
I actually found the issue in my case: I had a couple of sneaky time.sleep(X) calls in my pipeline startup that were causing issues with the AudioBufferProcessor.
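For anyone hitting the same thing: a blocking `time.sleep()` stalls the entire asyncio event loop, so every frame processor on it (including the AudioBufferProcessor and its timestamps) freezes for that duration. A small self-contained demonstration, with a hypothetical `heartbeat` coroutine standing in for a frame processor:

```python
import asyncio
import time

async def heartbeat(ticks):
    # Stands in for a frame processor that must run on a regular cadence.
    for _ in range(5):
        ticks.append(asyncio.get_running_loop().time())
        await asyncio.sleep(0.01)

async def bad_startup():
    time.sleep(0.3)  # blocks the event loop; use "await asyncio.sleep(0.3)" instead

async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat start ticking
    await bad_startup()     # everything on the loop stalls for 0.3 s
    await hb
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return max(gaps)

max_gap = asyncio.run(main())
# max_gap comes out around 0.3 s instead of the expected ~0.01 s
```

Replacing the blocking sleep with `await asyncio.sleep(...)` (or pushing blocking work into `loop.run_in_executor`) keeps the frame timing intact.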
- 0.0.92
- 3.13.2
- FastAPIWebsocketTransport
- Self hosted
- True
- 0.0.92
- 3.11.11
- FastAPIWebsocketTransport
- Self hosted
- False
- 0.0.96
- 3.13
- FastAPIWebsocketTransport
- Self hosted
- False
Damn, so no one has figured it out after 2 months? I even used the 34-audio-recording example, and it still had the same issues.
- 0.0.93
- 3.12
- FastAPIWebsocketTransport
- Self hosted
- False