Feature request - add multiple speakers to repository

Open BBC-Esq opened this issue 1 year ago • 2 comments

The following website has a bunch of voices for Bark:

https://rsxdalv.github.io/bark-speaker-directory/

I was wondering if anyone is interested in doing something similar for WhisperSpeech? Currently, to use anything other than the default voice, you have to obtain an audio file and pass it as a parameter in custom code so the speaker embedding can be extracted...only then is the voice used.

The pipeline.py script currently hardcodes the default voice here:

[image: screenshot of the hardcoded default voice in pipeline.py]

Perhaps we can obtain tensors for several high-quality voices (male, female, etc.) and offer them as options? I'm willing to contribute but still haven't been able to accurately extract speaker embeddings and get the tensors...spent about 3 hours trying different ways.

Let's say we get a dozen high-quality voices (i.e. tensors). We could include them in a configuration file or constants.py and let people choose among them, without removing the ability to create your own, of course!

People could even post their voices in the tensor format in the "examples" folder, just brainstorming.
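To make the "choose among presets, but still allow your own" idea concrete, here is a minimal sketch of what such a lookup might look like. Everything in it is hypothetical: the preset names, the file paths, and the `resolve_voice` helper are placeholders for illustration, not anything that exists in the repository.

```python
from pathlib import Path

# Hypothetical preset table, e.g. the contents of a constants.py.
# The names and paths are placeholders, not files that exist in the repository.
PRESET_VOICES = {
    "female_1": "voices/female_1.pt",
    "male_1": "voices/male_1.pt",
}
DEFAULT_VOICE = "female_1"


def resolve_voice(choice=None):
    """Map None, a preset name, or a custom file path to an embedding file path."""
    if choice is None:
        return Path(PRESET_VOICES[DEFAULT_VOICE])
    if choice in PRESET_VOICES:
        return Path(PRESET_VOICES[choice])
    # Anything else is treated as a user-supplied embedding file.
    custom = Path(choice)
    if custom.suffix in {".pt", ".pth"}:
        return custom
    raise ValueError(f"Unknown voice {choice!r}; expected a preset name or a .pt/.pth path")
```

With this shape, `resolve_voice("male_1")` picks a bundled voice and `resolve_voice("my_voices/narrator.pth")` keeps the custom route open.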

BBC-Esq, Feb 05 '24 19:02

This is actually quite easy to add – one needs to run a voice sample through the speechbrain model (example code is in pipeline.py) and save the resulting embedding to a file.
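A rough sketch of that step, for anyone stuck on it. It assumes the speaker encoder is the `speechbrain/spkrec-ecapa-voxceleb` checkpoint fed with 16 kHz mono audio; check pipeline.py for what the project actually uses. The function name and the output path are made up for illustration.

```python
from pathlib import Path

# Assumption: the speaker encoder expects 16 kHz mono audio.
TARGET_SAMPLE_RATE = 16000
# Assumption: this is the speechbrain checkpoint used for speaker embeddings.
ENCODER_SOURCE = "speechbrain/spkrec-ecapa-voxceleb"


def extract_speaker_embedding(audio_path, out_path, device="cpu"):
    """Run one voice sample through the encoder and save the embedding tensor."""
    # Heavy dependencies are imported lazily so this file loads without them.
    import torch
    import torchaudio
    try:
        from speechbrain.inference.speaker import EncoderClassifier  # speechbrain >= 1.0
    except ImportError:
        from speechbrain.pretrained import EncoderClassifier  # older releases

    encoder = EncoderClassifier.from_hparams(
        source=ENCODER_SOURCE, run_opts={"device": device}
    )
    samples, sample_rate = torchaudio.load(str(audio_path))
    # Downmix to mono, keeping a leading batch dimension: shape [1, time].
    samples = samples.mean(dim=0, keepdim=True)
    if sample_rate != TARGET_SAMPLE_RATE:
        samples = torchaudio.functional.resample(samples, sample_rate, TARGET_SAMPLE_RATE)
    with torch.no_grad():
        # encode_batch returns [batch, 1, dim]; take the single embedding vector.
        embedding = encoder.encode_batch(samples)[0, 0].cpu()
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    torch.save(embedding, str(out_path))
    return embedding
```

The saved tensor can later be read back with `torch.load` and passed wherever the pipeline accepts a speaker embedding.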

If we want to add some voices by default we could probably save all the vectors to huggingface in a single pth file (instead of pasting them into the source code). The tricky part is to find reference voices that are properly licensed. Maybe use a few samples from LibriTTS-R?

jpc, Feb 13 '24 10:02

Yep, that was my only concern, the licensing issue. One idea would be to use a file named constants.py and just keep adding voices that we've verified as high quality and free of licensing issues?

BBC-Esq, Feb 13 '24 12:02