
Cartesia SSML tags with decimal attributes (e.g., <speed ratio="1.05"/>) get split by TTS text aggregator; controls dropped

Open ayubSubhaniya opened this issue 2 months ago • 1 comments

pipecat version

0.0.92

Python version

3.11

Operating System

macOS 12.5

Issue description

When sending SSML to Cartesia (Sonic-3) with decimal attributes (e.g., `<speed ratio="1.05"/>` or a volume tag with a decimal ratio), Pipecat’s TTS text aggregation splits on the dot, and the tag is truncated or ignored by the time it reaches the provider. The emotion tag (first token) survives; subsequent tags with dotted numbers do not apply reliably.

Reproduction steps

Assistant message sent to TTS: Hello, thanks for waiting. I’m really sorry to hear that.

Expected behavior

Cartesia should receive the SSML intact and apply speed/volume as per the docs.

Actual behavior

Cartesia receives text where the speed and/or volume tag is missing or cut; only emotion applies. Using integers (ratio="1") sometimes passes; decimals (ratio="1.05", "0.9") often break.

Logs

## Root Cause Analysis

**Code pointers (likely cause: sentence aggregation + EOS detection on "."):**

1. **TTS Service processes text through aggregator:**
   
   # pipecat/src/pipecat/services/tts_service.py:454-462
   async def _process_text_frame(self, frame: TextFrame):
       text: Optional[str] = None
       if not self._aggregate_sentences:
           text = frame.text
       else:
           text = await self._text_aggregator.aggregate(frame.text)
   

2. **SkipTagsAggregator uses NLTK sentence tokenizer which splits on dots:**
   
   # pipecat/src/pipecat/utils/text/skip_tags_aggregator.py:69-86
   if not self._current_tag:
       eos_marker = match_endofsentence(self._text)
       if eos_marker:
           result = self._text[:eos_marker]
           self._text = self._text[eos_marker:]
           return result
   

3. **match_endofsentence uses NLTK sent_tokenize which treats dots as sentence boundaries:**
   
   # pipecat/src/pipecat/utils/string.py:131-151
   sentences = sent_tokenize(text)
   if len(sentences) > 1:
       return len(first_sentence)
   

4. **Cartesia integration defaults to only skipping `<spell>` pairs, so self-closing tags aren't protected:**
   
   # pipecat/src/pipecat/services/cartesia/tts.py:186-187
   text_aggregator=text_aggregator or SkipTagsAggregator([("<spell>", "</spell>")]),
   

**The issue:** When text like `<speed ratio="1.05"/>` is processed, NLTK's sentence tokenizer sees the dot in "1.05" as a sentence boundary and splits the text, truncating the SSML tag before it reaches Cartesia.
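The failure mode described above can be reproduced without Pipecat or NLTK. The following is a minimal sketch, not Pipecat's implementation: `naive_eos` is a deliberately simplified stand-in for `match_endofsentence` (it treats any `.`, `!` or `?` as a boundary, which NLTK does not do), and `tag_aware_eos` and `aggregate` are hypothetical helpers introduced only for illustration. It shows how flushing at a dot inside a streamed tag truncates the tag, and how ignoring punctuation inside `<...>` avoids that.

```python
import re
from typing import Callable, Iterable, List, Optional

# Simplified stand-in for sentence-boundary detection: any ".", "!" or "?"
# ends a sentence. This is NOT NLTK's algorithm, only an illustration.
_EOS = re.compile(r"[.!?]")
# A complete tag such as <speed ratio="1.05"/> or </spell>.
_TAG = re.compile(r"<[^<>]*>")


def naive_eos(text: str) -> Optional[int]:
    """Return the index just past the first sentence-ending character."""
    match = _EOS.search(text)
    return match.end() if match else None


def tag_aware_eos(text: str) -> Optional[int]:
    """Like naive_eos, but never report a boundary inside a tag."""
    # Blank out complete tags with same-length padding so indices stay valid.
    masked = _TAG.sub(lambda m: " " * len(m.group()), text)
    # A remaining "<" is a tag still being streamed; only search before it.
    open_tag = masked.find("<")
    searchable = masked if open_tag == -1 else masked[:open_tag]
    return naive_eos(searchable)


def aggregate(
    tokens: Iterable[str], find_eos: Callable[[str], Optional[int]]
) -> List[str]:
    """Buffer streamed tokens and flush a chunk whenever find_eos fires."""
    chunks: List[str] = []
    buffer = ""
    for token in tokens:
        buffer += token
        end = find_eos(buffer)
        if end is not None:
            chunks.append(buffer[:end])
            buffer = buffer[end:]
    if buffer:
        chunks.append(buffer)  # flush whatever is left at end of stream
    return chunks


# Tokens as an LLM might stream them; the tag is split across two tokens.
tokens = ["Hello ", '<speed ratio="1', '.05"/>', " there.", " Bye"]
naive_chunks = aggregate(tokens, naive_eos)
safe_chunks = aggregate(tokens, tag_aware_eos)
print(naive_chunks)
print(safe_chunks)
```

With the naive detector the first chunk ends in the middle of the tag, so the provider never sees a well-formed `<speed .../>`. The tag-aware variant holds the buffer until the tag is complete. One limitation of this sketch: a literal `<` in ordinary text (for example "3 < 5.") would also delay flushing until a `>` arrives or the stream ends.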

ayubSubhaniya • Nov 03 '25 19:11

Interesting... I am currently using pipecat-ai==0.0.90 and this has not been an issue for me. However, I did have to modify my text accumulator, since by default it removes SSML tags for transcription purposes, though not before the text is sent to the provider (Cartesia in my case). I am wondering if there is a config issue in the code on your side?

noahvandal • Nov 12 '25 23:11

We've recently added some very powerful text processing capabilities to Pipecat and updated the docs to describe what's possible (https://docs.pipecat.ai/guides/learn/text-to-speech#text-processing-and-filtering).

A more targeted answer is that you can now easily handle speed tags for Cartesia. Check out the docs: https://docs.pipecat.ai/server/services/tts/cartesia#speed-tag-speed:-float-%3E-str:

To use this, you need to update to pipecat-ai 0.0.96.

markbackman • Dec 02 '25 00:12