Long-Form Generation

Open fakerybakery opened this issue 1 year ago • 5 comments

Hi, do you know if it's possible to smoothly generate longer audio with WhisperSpeech? And what about dialogue with multiple characters? Thanks!

fakerybakery avatar Jan 19 '24 02:01 fakerybakery

Both these features should be possible if we implement #52

jpc avatar Jan 19 '24 10:01 jpc

Nice! What about generating dialogue? Should that be done separately (i.e., quote extraction + attribution -> WhisperSpeech with voice cloning)?

fakerybakery avatar Jan 19 '24 16:01 fakerybakery

Yes, you're right. This would give you the most control over the style and voice of each speaker.

One could probably find or train a traditional NLP preprocessing model to do this automatically.
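
As a rough illustration of that preprocessing step, here is a minimal sketch that splits text into narration and quoted lines and guesses each speaker from simple "said X" tags. Everything in it is an assumption for illustration: the regex heuristics stand in for a real NLP attribution model, and `split_dialogue`, `render`, and the `synthesize` callable are hypothetical names, not part of WhisperSpeech.

```python
import re
from typing import Callable, Dict, List, Tuple

# Straight double quotes only; a real pipeline would also handle curly quotes.
QUOTE_RE = re.compile(r'"([^"]+)"')
# Hypothetical attribution heuristic: 'said Alice' or 'Alice said'.
SPEAKER_RE = re.compile(
    r'\b(?:said|asked|replied)\s+([A-Z][a-z]+)'
    r'|\b([A-Z][a-z]+)\s+(?:said|asked|replied)\b'
)

def split_dialogue(text: str, narrator: str = "narrator") -> List[Tuple[str, str]]:
    """Return (speaker, line) pairs in reading order."""
    segments: List[Tuple[str, str]] = []
    matches = list(QUOTE_RE.finditer(text))
    pos = 0
    for i, m in enumerate(matches):
        before = text[pos:m.start()]
        if before.strip():
            segments.append((narrator, before.strip()))
        # Look for a speaker tag after the quote first, then before it.
        next_start = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        tag = SPEAKER_RE.search(text[m.end():next_start]) or SPEAKER_RE.search(before)
        speaker = (tag.group(1) or tag.group(2)) if tag else "unknown"
        segments.append((speaker, m.group(1).strip()))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((narrator, tail))
    return segments

def render(segments: List[Tuple[str, str]],
           voices: Dict[str, str],
           synthesize: Callable[[str, str], object]) -> list:
    """Call a user-supplied TTS function once per segment.

    `voices` maps speaker names to reference recordings for voice cloning;
    unknown speakers fall back to the narrator's voice.
    """
    return [synthesize(line, voices.get(speaker, voices["narrator"]))
            for speaker, line in segments]
```

The `synthesize` argument is where a voice-cloning TTS call would go; the resulting clips would then need to be concatenated in order.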

jpc avatar Jan 19 '24 16:01 jpc

Thanks! Should I close this issue in favor of #52?

fakerybakery avatar Jan 19 '24 16:01 fakerybakery

There is also #58, which would also be useful.

Maybe this can be the umbrella task. :)

jpc avatar Jan 22 '24 14:01 jpc