chatterbox
How to Add Laughter in Text-to-Speech (TTS) Output
Thank you for open-sourcing this model! I’m trying to include laughter in the output of a Text-to-Speech (TTS) system. I tried variations like "haha" and "(laugh)" within the sentence, but none of them produced a natural or audible laugh in the TTS output. Instead, the system reads them as plain text. I'm looking for guidance on how to include laughter or natural-sounding laugh effects in TTS-generated speech.
This specific model was not trained with special tags for laughter, so it will likely always try to pronounce the laughter as words. Some things you can play around with though:
- try to prompt your text, like "This is hilarious! I'm laughing so hard! hahaha"
- supply a speaker reference of a "bubbly" voice, or maybe even a reference with laughter
- set the exaggeration value to something between 0.5 (default) and 1.5 (extreme)
Here is a snippet:
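import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# Load the pretrained model onto the GPU
model = ChatterboxTTS.from_pretrained(device="cuda")

# Surround the laugh with text that sets a laughing context
text = "This is so funny hahaha I can't stop laughing!"

# Raise exaggeration above the 0.5 default and pass a reference voice clip
wav = model.generate(text, exaggeration=0.9, audio_prompt_path="funny_voice.wav")
ta.save("test-2.wav", wav, model.sr)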
The default voice sounds quite sarcastic when you do this 😆
I'd be curious to hear any interesting outputs you create!
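If you want to compare settings by ear, the snippet above could be wrapped in a small sweep over exaggeration values. This is only a rough sketch: `sweep_settings`, `render_sweep`, and the output file names are made up for illustration, and the only model calls it relies on are the `generate` and `sr` usages already shown in the snippet.

```python
def sweep_settings(exaggerations=(0.5, 0.9, 1.4)):
    """Pair each exaggeration value with an output file name (hypothetical naming)."""
    return [
        {"exaggeration": value, "path": f"laugh_ex{value:.1f}.wav"}
        for value in exaggerations
    ]


def render_sweep(model, text, audio_prompt_path=None,
                 exaggerations=(0.5, 0.9, 1.4), save=None):
    """Generate one file per exaggeration value and return the written paths.

    `model` is assumed to behave like the ChatterboxTTS instance in the
    snippet above (a `generate` method and an `sr` attribute).
    """
    if save is None:
        # Imported lazily so sweep_settings works without torchaudio installed
        import torchaudio as ta
        save = ta.save

    paths = []
    for setting in sweep_settings(exaggerations):
        wav = model.generate(
            text,
            exaggeration=setting["exaggeration"],
            audio_prompt_path=audio_prompt_path,
        )
        save(setting["path"], wav, model.sr)
        paths.append(setting["path"])
    return paths


# Usage (needs chatterbox and a GPU, as in the snippet above):
#   from chatterbox.tts import ChatterboxTTS
#   model = ChatterboxTTS.from_pretrained(device="cuda")
#   render_sweep(model, "This is so funny hahaha I can't stop laughing!",
#                audio_prompt_path="funny_voice.wav")
```

Listening to the files side by side should make it easier to hear where the voice starts to sound more animated and where it just gets unstable.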
I'm also very interested in emotions (and other languages). Here's my result with exaggeration=1.4, cfg=0.5, temp=0.8:
(I can't post FLAC as is, so it's zipped) ComfyUI_temp_rnigc_00009_.flac.zip
As you can hear, it's still "reading the book". Please let us know when it's implemented.