Fixing TavusTransport with some TTS services.

Open filipi87 opened this issue 7 months ago • 3 comments

Fixing TavusTransport with some TTS services.

filipi87, May 26 '25 21:05

Codecov Report

:x: Patch coverage is 0% with 46 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pipecat/services/tavus/video.py | 0.00% | 26 Missing :warning: |
| src/pipecat/transports/services/tavus.py | 0.00% | 20 Missing :warning: |

| Files with missing lines | Coverage Δ |
|---|---|
| src/pipecat/transports/services/tavus.py | 0.00% <0.00%> (ø) |
| src/pipecat/services/tavus/video.py | 0.00% <0.00%> (ø) |

codecov[bot], May 26 '25 21:05

@filipi87 Can you describe the issue we are fixing?

aconchillo, May 27 '25 17:05

Hi @aconchillo ,

The issue was that we used the TTSStartedFrame to create the inference ID sent to Tavus, which is stored as _current_idx_str in both TavusTransport and TavusVideoService.

The audio frames, TTSStartedFrame, and TTSStoppedFrame were handled in different queues, so _current_idx_str was sometimes updated before Tavus had processed all of the audio. As a result, only part of the audio was spoken, typically the beginning of each sentence.
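
To make the ordering problem concrete, here is a minimal Python sketch. It is not pipecat code and not necessarily how this PR fixes the problem: `Started`, `Audio`, and both tagging functions are hypothetical stand-ins, and the first function models only the worst-case interleaving, where every control frame is drained before any audio. It compares that case with a single ordered queue, where the ID can only change between utterances.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Started:
    """Hypothetical stand-in for TTSStartedFrame."""
    utterance: int


@dataclass
class Audio:
    """Hypothetical stand-in for an audio frame."""
    utterance: int
    chunk: int


def tag_with_separate_queues(control, audio):
    """Worst case: all control frames are drained before any audio is sent."""
    current_id = None
    for frame in control:
        current_id = f"inference-{frame.utterance}"
    # Every audio chunk is now tagged with the latest ID, whatever its utterance.
    return [(current_id, a.utterance, a.chunk) for a in audio]


def tag_with_single_queue(frames):
    """Frames are consumed in arrival order, so the ID changes between utterances only."""
    current_id = None
    tagged = []
    queue = deque(frames)
    while queue:
        frame = queue.popleft()
        if isinstance(frame, Started):
            current_id = f"inference-{frame.utterance}"
        else:
            tagged.append((current_id, frame.utterance, frame.chunk))
    return tagged


frames = [Started(1), Audio(1, 0), Audio(1, 1), Started(2), Audio(2, 0)]
control = [f for f in frames if isinstance(f, Started)]
audio = [f for f in frames if isinstance(f, Audio)]

racy = tag_with_separate_queues(control, audio)
ordered = tag_with_single_queue(frames)

# Chunks whose ID does not belong to their own utterance.
mismatched = [t for t in racy if t[0] != f"inference-{t[1]}"]
```

In this toy run, both chunks of the first utterance end up tagged with the second utterance's ID under the separate-queue model, while the single-queue version tags every chunk correctly.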

Another issue was the way we calculated the wait time, which sometimes caused the replica to speak the first utterance and then stay muted for an extended period.

Both issues are easily reproducible when using DeepgramTTS or OpenAITTS.

filipi87, May 27 '25 17:05

OK! Thank you!

aconchillo, May 27 '25 19:05