custom pronunciation

Open sankar-mukherjee opened this issue 6 months ago • 9 comments

This is a character-based TTS. Does it support custom pronunciation?

sankar-mukherjee avatar Jun 03 '25 07:06 sankar-mukherjee

In the screen reader app that I built as a Chrome extension, I have a tab for custom dictionaries. For example, John=Jon (in case the H was throwing off the TTS system). I'm thinking about implementing that in my custom fork, but I haven't had a chance to do it yet. It's on the roadmap, though.
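A minimal sketch of that dictionary idea, applied as a text preprocessing step before the string is handed to any TTS engine. This is an assumption about how such a feature could work, not how the extension or Chatterbox implements it. The `PRONUNCIATIONS` table and the `apply_pronunciations` helper are hypothetical names, and the `GIF` entry is invented for illustration; only `John=Jon` comes from the example above.

```python
import re

# Hypothetical dictionary: written form -> respelling the TTS reads correctly.
# "John" -> "Jon" is the example from this thread; "GIF" is an invented entry.
PRONUNCIATIONS = {
    "John": "Jon",
    "GIF": "jif",
}


def apply_pronunciations(text, table):
    """Replace whole words in `text` using `table` (case-sensitive)."""
    if not table:
        return text
    # Longest keys first so overlapping entries prefer the longer match.
    keys = sorted(table, key=len, reverse=True)
    # \b anchors keep "John" from matching inside "Johnny".
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


print(apply_pronunciations("John sent Johnny a GIF.", PRONUNCIATIONS))
# prints: Jon sent Johnny a jif.
```

The rewritten string is what would then be passed to the synthesizer. Matching is case-sensitive and word-bounded on purpose, so entries only fire on the exact spelling that the model gets wrong.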

psdwizzard avatar Jun 03 '25 12:06 psdwizzard

> This is a character-based TTS. Does it support custom pronunciation?

I've found that the TTS pronounces things quite differently depending on the reference audio. With a sample of David Attenborough's voice, the output sounds like a British gentleman; with a sample of Donald Trump, the output sounds like an American dumbass.

AznamirWoW avatar Jun 03 '25 21:06 AznamirWoW

@AznamirWoW While you're 100% right about accents, I have noticed that, at least for XTTS2, most accents will still pronounce certain things wrong, which is why my Chrome extension has a custom dictionary of pronunciations.

psdwizzard avatar Jun 03 '25 22:06 psdwizzard

Image

psdwizzard avatar Jun 03 '25 22:06 psdwizzard

> @AznamirWoW While you're 100% right about accents, I have noticed that, at least for XTTS2, most accents will still pronounce certain things wrong, which is why my Chrome extension has a custom dictionary of pronunciations.

XTTS v2 has very good English, but the pronunciation does not really change when you change the speaker. Other languages perform somewhere between good and terrible.

Chatterbox, as mentioned, shows noticeable differences between speakers. Example: chatterbox.zip

AznamirWoW avatar Jun 03 '25 23:06 AznamirWoW

> This is a character-based TTS. Does it support custom pronunciation?

> I've found that the TTS pronounces things quite differently depending on the reference audio. With a sample of David Attenborough's voice, the output sounds like a British gentleman; with a sample of Donald Trump, the output sounds like an American dumbass.

So the output is accurate, nice.

MercyfulKing avatar Jun 05 '25 09:06 MercyfulKing

I've noticed in my own tests on the Hugging Face Space that cloning with a UK English accent works well. I used Stephen Fry reading Harry Potter and it came out well. However, when I used audio samples of Australian news reporters or samples of my own voice (I have an Australian accent) for cloning, the output had a strong UK English accent.

Are there parameters to set or other tweaks to get an Australian accent or other English accents?

tim-basic avatar Jun 17 '25 00:06 tim-basic

Actually, I've just done Australian accent tests with audio from here and it worked very well ... hmm

tim-basic avatar Jun 17 '25 01:06 tim-basic

Hmm ... I've done some more tests using audio from here. It's a bit hit and miss, and it seems to perform worse with female voices. It matches pitch and timbre okay, but the accent is lost.

tim-basic avatar Jun 17 '25 01:06 tim-basic