
multilanguage support

Open lucasjinreal opened this issue 1 year ago • 12 comments

Will it support Mandarin?

lucasjinreal avatar Apr 17 '23 11:04 lucasjinreal

Hey, great question. Does Whisper work for Mandarin? I found https://github.com/openai/whisper/discussions/25 but it seems inconclusive to me.

I'll test today how Whisper semantic tokens from an English-only model behave when cloning speech in a different language.
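
For anyone who wants to run a quick check like this themselves, here is a minimal sketch (not the project's actual pipeline) that pulls encoder activations from an English-only Whisper checkpoint for a non-English clip; `mandarin_sample.wav` is a hypothetical file name:

```python
# Minimal sketch: inspect what an English-only Whisper encoder produces for
# non-English speech. Assumes the openai-whisper package is installed and
# "mandarin_sample.wav" (hypothetical) exists locally.
import torch
import whisper

model = whisper.load_model("base.en")                 # English-only checkpoint

audio = whisper.load_audio("mandarin_sample.wav")
audio = whisper.pad_or_trim(audio)                    # Whisper expects a 30 s window
mel = whisper.log_mel_spectrogram(audio).to(model.device)

with torch.no_grad():
    # Encoder activations are roughly the raw material the semantic tokens are derived from.
    embeddings = model.encoder(mel.unsqueeze(0))      # shape: (1, 1500, d_model)

print(embeddings.shape)
```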

jpc avatar Apr 17 '23 11:04 jpc

Whisper STT supports Mandarin; I don't know about TTS.

I think TTS would be a bit harder.

lucasjinreal avatar Apr 17 '23 11:04 lucasjinreal

We plan to soon train another quantized semantic token model based on the multilingual Whisper medium model. Medium seems like a good quality/speed tradeoff and should improve quality a lot.

If we had good (speech-only) datasets for other languages, we could add them in to get multilingual semantic tokens. This would open up a path to training full multilingual TTS models.
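
To make the idea of "quantized semantic tokens" concrete, here is a rough sketch that discretizes Whisper encoder embeddings with plain k-means; the actual WhisperSpeech quantizer is trained differently, so treat this only as an illustration:

```python
# Rough illustration (not WhisperSpeech's actual quantizer): turn continuous
# Whisper encoder frames into discrete "semantic tokens" via k-means.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def fit_codebook(frames: np.ndarray, n_tokens: int = 1024) -> MiniBatchKMeans:
    """frames: (n_frames, d_model) encoder outputs stacked from many utterances."""
    km = MiniBatchKMeans(n_clusters=n_tokens, batch_size=4096)
    km.fit(frames)
    return km

def to_semantic_tokens(km: MiniBatchKMeans, utterance_frames: np.ndarray) -> np.ndarray:
    """Map each encoder frame of one utterance to its nearest codebook id."""
    return km.predict(utterance_frames)               # (n_frames,) integer token ids
```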

jpc avatar Apr 20 '23 09:04 jpc

@jpc Multilingual TTS is an ambitious goal. For Mandarin, TBH, there is no very good open dataset; Biaobei (Baker) could be used as an experiment.

lucasjinreal avatar Apr 20 '23 09:04 lucasjinreal

The demos in the readme are all trained on around 1000 hours, so we may be able to get something usable with that amount of data (and multiple languages may benefit from each other, as in Whisper).

To add a language we mainly have to make sure Whisper works well on it, since that is what we use for back-translation. I can do this verification for English and Polish; for other languages we need some help.
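
One way to do that verification (an assumed workflow, not an official script) is to transcribe a few held-out clips with the multilingual checkpoint and compute the word error rate against reference transcripts:

```python
# Assumed verification workflow: check Whisper quality on a candidate language
# by transcribing a few held-out clips and measuring word error rate.
import whisper
from jiwer import wer                                  # pip install jiwer

model = whisper.load_model("medium")                   # multilingual checkpoint

# Hypothetical (audio_path, reference_transcript) validation pairs.
samples = [
    ("fr_0001.wav", "bonjour tout le monde"),
    ("fr_0002.wav", "merci beaucoup"),
]

references, hypotheses = [], []
for path, reference in samples:
    result = model.transcribe(path, language="fr")
    references.append(reference)
    hypotheses.append(result["text"].strip().lower())

print("WER:", wer(references, hypotheses))
```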

jpc avatar Apr 20 '23 09:04 jpc

If you're looking for help validating French, I'd be glad to make this very small contribution.

olup avatar May 28 '23 05:05 olup

Hey, we now have an English + Polish model, so the architecture is validated for other languages. Right now it looks like we need a few hundred hours of speech to fully support a new language, although that number will probably drop the more languages we add.

We'll make a plan to help people contribute support for other languages.
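
If you want to check whether your corpus is in the right ballpark, a small helper like the one below (assumed tooling, not part of the repo) sums the durations of all audio files under a directory:

```python
# Small helper (assumed tooling, not part of the repo): total hours of speech
# under a directory, for WAV/FLAC files readable by soundfile.
from pathlib import Path
import soundfile as sf                                 # pip install soundfile

def total_hours(root: str, patterns=("*.wav", "*.flac")) -> float:
    seconds = 0.0
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            info = sf.info(path)
            seconds += info.frames / info.samplerate
    return seconds / 3600

print(f"{total_hours('my_corpus/'):.1f} hours")        # 'my_corpus/' is a placeholder
```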

jpc avatar Dec 20 '23 09:12 jpc

https://github.com/microsoft/SynapseML

novohool avatar Jan 18 '24 07:01 novohool

A multi-part question regarding training a language dialect:

  • Is there a way to skip the OpenAI Whisper encoder step for generating embeddings?
  • I have plenty of transcribed text, but for dialects that are obviously not present in the Whisper model.
  • Is there a way to still train your model for another language dialect?
  • Is a 4090 enough to train the model?
  • Is there a recommended dataset size to achieve good results with a new language?

aleksas avatar Jan 18 '24 16:01 aleksas

I currently have some voice data. How should I start training a new language model? I've read some documents in the readme and the nbs directory, but couldn't find the training steps.

faceair avatar Jan 18 '24 16:01 faceair

This project looks really promising and I like the quality of the generated speech samples 😄

Looking forward to this one!

Binozo avatar Jan 18 '24 21:01 Binozo

I would like to add support for Hebrew too. The OpenAI Whisper API already supports TTS for Hebrew. The only problem is that the speaker's accent is American instead of a Hebrew accent, but that's still usable!

  1. Can I use recordings of different speakers as training data and get another, better final voice, as you already did?
  2. How many hours of recordings should I gather?
  3. Do you have some information about how I should train it and eventually create a PR for this repo?

thewh1teagle avatar Jan 20 '24 13:01 thewh1teagle