pipecat Fixing TavusTransport with some TTS services.

May 26 '25 21:05 filipi87

Codecov Report

:x: Patch coverage is 0% with 46 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pipecat/services/tavus/video.py | 0.00% | 26 Missing :warning: |
| src/pipecat/transports/services/tavus.py | 0.00% | 20 Missing :warning: |

| Files with missing lines | Coverage Δ |
|---|---|
| src/pipecat/transports/services/tavus.py | `0.00% <0.00%> (ø)` |
| src/pipecat/services/tavus/video.py | `0.00% <0.00%> (ø)` |

May 26 '25 21:05 codecov[bot]

@filipi87 Can you describe the issue we are fixing?

May 27 '25 17:05 aconchillo

Hi @aconchillo,

The issue was that we used the TTSStartedFrame to create the inference ID sent to Tavus, which is stored as _current_idx_str in both TavusTransport and TavusVideoService.

The audio frames, TTSStartedFrame, and TTSStoppedFrame were handled in different queues, so _current_idx_str was sometimes updated before all of the audio had been processed by Tavus. As a result, only part of the audio was spoken, typically the beginning of each sentence.
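
To make the ordering problem concrete, here is a minimal sketch of the race in plain asyncio. It does not use the pipecat or Tavus APIs: the `FRAMES` tuples, `count_mislabeled`, and the queue layout are all hypothetical stand-ins, and `current_idx` only plays the role of `_current_idx_str`. Keeping every frame in one ordered queue is shown as one possible remedy, not necessarily the approach taken in this PR.

```python
import asyncio

# Hypothetical frames: (kind, utterance number). Not real pipecat frame classes.
FRAMES = [
    ("started", 1), ("audio", 1), ("audio", 1), ("audio", 1), ("stopped", 1),
    ("started", 2), ("audio", 2), ("audio", 2), ("audio", 2), ("stopped", 2),
]


async def count_mislabeled(separate_queues: bool) -> int:
    """Return how many audio chunks were forwarded under the wrong ID."""
    current_idx = 0  # stand-in for _current_idx_str
    sent = []        # (ID in effect when forwarded, utterance that produced the chunk)

    async def consume(queue: asyncio.Queue) -> None:
        nonlocal current_idx
        # None marks the end of the stream.
        while (frame := await queue.get()) is not None:
            kind, utterance = frame
            if kind == "started":
                current_idx = utterance  # a new inference ID begins
            elif kind == "audio":
                await asyncio.sleep(0)   # forwarding audio yields to other tasks
                sent.append((current_idx, utterance))

    if separate_queues:
        # Control frames and audio frames are drained by independent tasks,
        # so nothing stops the ID from advancing past pending audio.
        control_q, audio_q = asyncio.Queue(), asyncio.Queue()
        for frame in FRAMES:
            (audio_q if frame[0] == "audio" else control_q).put_nowait(frame)
        control_q.put_nowait(None)
        audio_q.put_nowait(None)
        await asyncio.gather(consume(control_q), consume(audio_q))
    else:
        # One ordered queue: the ID changes only after earlier audio is consumed.
        queue = asyncio.Queue()
        for frame in FRAMES:
            queue.put_nowait(frame)
        queue.put_nowait(None)
        await consume(queue)

    return sum(1 for idx, utterance in sent if idx != utterance)


if __name__ == "__main__":
    print(asyncio.run(count_mislabeled(separate_queues=True)))   # 3
    print(asyncio.run(count_mislabeled(separate_queues=False)))  # 0
```

With separate queues, the control task finishes first and all three chunks of the first utterance are forwarded under the second utterance's ID. With a single queue, every chunk keeps the ID that was current when it was produced.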

Another issue was in how we calculated the wait time, which sometimes caused the replica to speak the first utterance and then stay muted for an extended period.

Both issues are easily reproducible when using DeepgramTTS or OpenAITTS.

May 27 '25 17:05 filipi87

OK! Thank you!

May 27 '25 19:05 aconchillo