WhisperSpeech
Long-Form Generation
Hi, do you know if it's possible to smoothly generate longer audio with WhisperSpeech? And dialogue with multiple characters? Thanks!
Both these features should be possible if we implement #52
Nice! What about generating dialogue? Should that be done separately (i.e. quote extraction + attribution -> WhisperSpeech w/ voice cloning)?
Yes, you're right. This would give you the most control over the style and voice of each speaker.
One could probably find or train a traditional NLP preprocessing model to do this automatically.
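For illustration, here is a rough sketch of what that separate pipeline could look like. It assumes attribution has already been done and the script is in a made-up "Speaker: text" format, one turn per line. The names `parse_turns`, `split_sentences`, `render_dialogue`, and the `synthesize` callback are all hypothetical, not part of WhisperSpeech; `synthesize` stands in for whatever voice-cloned generation call you end up using.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

# Assumed input format (not a WhisperSpeech convention): "Speaker: text" per line.
TURN_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+)$")


def parse_turns(script: str) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs; lines without a 'Speaker:' prefix are skipped."""
    turns = []
    for line in script.splitlines():
        match = TURN_RE.match(line)
        if match:
            turns.append((match.group(1), match.group(2).strip()))
    return turns


def split_sentences(text: str) -> List[str]:
    """Naive split after ., ! or ? so each synthesis call gets a short chunk."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def render_dialogue(
    script: str,
    voices: Dict[str, Any],
    synthesize: Callable[[str, Any], Any],
) -> List[Any]:
    """Call synthesize(sentence, voice_reference) for every sentence, in order.

    `voices` maps a speaker name to a reference for that speaker's voice
    (for example a path to a short audio sample). `synthesize` is a
    placeholder for the actual TTS call and returns one clip per sentence.
    """
    clips = []
    for speaker, text in parse_turns(script):
        reference = voices[speaker]  # raises KeyError for an unknown speaker
        for sentence in split_sentences(text):
            clips.append(synthesize(sentence, reference))
    return clips
```

The returned clips would still need to be concatenated (possibly with short pauses between turns) to get one continuous recording. The sentence splitter is deliberately simple and will mis-split abbreviations like "Dr."; a proper NLP tokenizer would do better.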
Thanks! Should I close this issue in favor of #52?
There is also #58, which would also be useful.
Maybe this can be the umbrella task. :)