
How to Add Laughter in Text-to-Speech (TTS) Output

Open raibove opened this issue 5 months ago • 2 comments

Thank you for open-sourcing this model! I’m trying to include laughter in the Text-to-Speech (TTS) output. I tried putting variations like "haha" and "(laugh)" in the sentence, but none of them produced a natural or audible laugh; the model just reads them out as plain text. Is there a way to get laughter or natural-sounding laugh effects in the generated speech?

raibove avatar May 28 '25 17:05 raibove

This specific model was not trained with special tags for laughter, so it will likely always try to pronounce the laughter as words. Some things you can play around with though:

  1. try to give your text some context that sets up the laugh, like "This is hilarious! I'm laughing so hard! hahaha"
  2. supply a speaker reference of a "bubbly" voice, or maybe even a reference with laughter
  3. set the exaggeration value to something between 0.5 (default) and 1.5 (extreme)

Here is a snippet:

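import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# Load the pretrained model onto the GPU
model = ChatterboxTTS.from_pretrained(device="cuda")
# Surround the laugh with context so the delivery sounds amused
text = "This is so funny hahaha I can't stop laughing!"
# Raise exaggeration above the 0.5 default and pass a "bubbly" reference clip
wav = model.generate(text, exaggeration=0.9, audio_prompt_path="funny_voice.wav")
ta.save("test-2.wav", wav, model.sr)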

The default voice sounds quite sarcastic when you do this 😆
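To compare several exaggeration values on the same sentence, something like the following might help. It is only a rough sketch: `exaggeration_sweep` and `render_sweep` are hypothetical helper names, not part of chatterbox, and `model` is assumed to be the `ChatterboxTTS` instance loaded in the snippet above.

```python
def exaggeration_sweep(start=0.5, stop=1.5, steps=5):
    """Return `steps` evenly spaced exaggeration values from start to stop, inclusive."""
    if steps < 2:
        return [start]
    step = (stop - start) / (steps - 1)
    return [round(start + i * step, 2) for i in range(steps)]


def render_sweep(model, text, values, audio_prompt_path=None):
    """Yield (filename, wav) pairs, one per exaggeration value.

    Assumption: `model` exposes generate(text, exaggeration=...,
    audio_prompt_path=...) as in the snippet above. The filename
    pattern is made up for this example.
    """
    for value in values:
        wav = model.generate(
            text,
            exaggeration=value,
            audio_prompt_path=audio_prompt_path,
        )
        yield f"laugh-exag-{value:.2f}.wav", wav
```

With the model and text from the snippet above, each result could then be saved with `for name, wav in render_sweep(model, text, exaggeration_sweep()): ta.save(name, wav, model.sr)`, which makes it easier to listen to the clips side by side.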

I'd be curious to hear any interesting outputs you create!

johnmeade avatar May 28 '25 19:05 johnmeade

I'm also very interested in emotions (and other languages). Here's my result with exaggeration=1.4, cfg=0.5, temp=0.8

(I can't post flac as is, zipped) ComfyUI_temp_rnigc_00009_.flac.zip

As you can hear, it's still "reading the book". Please let us know when laughter support is implemented.

j2l avatar Jun 18 '25 11:06 j2l