WhisperSpeech Long-Form Generation

Open fakerybakery opened this issue 1 year ago • 5 comments

Hi, do you know if it's possible to smoothly generate longer audio with WhisperSpeech? And what about dialogue with multiple characters? Thanks!

Jan 19 '24 02:01 fakerybakery

Both of these features should be possible if we implement #52.

Jan 19 '24 10:01 jpc

Nice! What about generating dialogue? Should that be done separately (i.e., quote extraction + attribution -> WhisperSpeech with voice cloning)?

Jan 19 '24 16:01 fakerybakery

Yes, you're right. This would give you the most control over the style and voice of each speaker.

One could probably find or train a traditional NLP preprocessing model to do this automatically.
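
For illustration, here is a minimal sketch of the quote extraction + attribution step, assuming plain double-quoted dialogue and simple "said X" / "X said" tags. The regex heuristic stands in for the NLP preprocessing model mentioned above; `split_dialogue`, `render_dialogue`, the `voices` mapping, and the `synthesize` callback are all hypothetical names, not part of WhisperSpeech. The callback is a placeholder for whatever voice-cloning call you use.

```python
import re
from typing import Callable, Dict, List, Tuple

# Text inside straight double quotes is treated as spoken dialogue.
QUOTE_RE = re.compile(r'"([^"]+)"')
# Heuristic attribution tag directly after a quote: `said Alice` or `Alice said`.
ATTRIB_RE = re.compile(
    r'^\W*(?:(?:said|asked|replied)\s+(\w+)|(\w+)\s+(?:said|asked|replied))'
)

def split_dialogue(text: str, narrator: str = "narrator") -> List[Tuple[str, str]]:
    """Split text into (speaker, line) segments; unquoted text goes to the narrator."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for m in QUOTE_RE.finditer(text):
        before = text[pos:m.start()].strip()
        if before:
            segments.append((narrator, before))
        # Look at a short window after the quote for an attribution tag.
        tag = ATTRIB_RE.match(text[m.end():m.end() + 40])
        speaker = (tag.group(1) or tag.group(2)) if tag else narrator
        segments.append((speaker, m.group(1)))
        pos = m.end()
    rest = text[pos:].strip()
    if rest:
        segments.append((narrator, rest))
    return segments

def render_dialogue(
    segments: List[Tuple[str, str]],
    voices: Dict[str, str],
    synthesize: Callable[[str, str], object],
) -> List[object]:
    """Call the (hypothetical) synthesize(line, voice_ref) once per segment.

    Speakers without an entry in `voices` fall back to the narrator voice.
    """
    return [
        synthesize(line, voices.get(speaker, voices["narrator"]))
        for speaker, line in segments
    ]
```

Each segment is synthesized separately with its own reference voice, which is what gives per-speaker control. A real attribution model would also need to handle pronouns ("he said") and untagged turns, which this heuristic does not.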

Jan 19 '24 16:01 jpc

Thanks! Should I close this issue in favor of #52?

Jan 19 '24 16:01 fakerybakery

There is also #58, which would be useful too.

Maybe this can be the umbrella task. :)

Jan 22 '24 14:01 jpc