WhisperSpeech

Output length for file generation

Open oariasmx opened this issue 1 year ago • 2 comments

Hello, is the script designed to convert only 30 seconds of audio? My output files are cut off at 30 seconds. Is there any way to change this limit?

Thanks

oariasmx, Jan 20 '24 01:01

I noticed the same and didn't find a way to change it. I wonder if it comes from the model (Whisper, I think, was trained on 30-second clips). I guess one can generate shorter clips and fuse them together, at least if there's no better way.

odeemi, Jan 20 '24 15:01

Yeah, right now the longest single generation can be 30 seconds. We are looking into allowing “speech continuations” where you feed the last 10 seconds or so to seamlessly generate another 20 (and so on).

Until this is implemented, you can synthesize 30-second chunks and concatenate the audio.

If someone would like to automate this I’d love to merge a PR.

jpc, Jan 20 '24 17:01
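
The workaround described above (synthesize chunks of at most 30 seconds, then concatenate the audio) could be automated roughly as follows. This is only a sketch and not the project's implementation. `split_text` and `synthesize_long` are hypothetical helper names, and `synthesize` is a placeholder for whatever function you use to turn one piece of text into a sequence of audio samples. The `max_chars` limit is an assumed, untested proxy for "short enough to fit in 30 seconds", and the `sample_rate` and `gap_seconds` defaults are likewise assumptions that you would need to match to your setup.

```python
import re
from typing import Callable, List, Sequence


def split_text(text: str, max_chars: int = 300) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own (oversized) chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(
    text: str,
    synthesize: Callable[[str], Sequence[float]],  # placeholder TTS call
    max_chars: int = 300,       # assumed proxy for the 30-second limit
    sample_rate: int = 24000,   # assumed output sample rate
    gap_seconds: float = 0.2,   # assumed pause inserted between chunks
) -> List[float]:
    """Synthesize each chunk separately and concatenate the samples."""
    silence = [0.0] * int(sample_rate * gap_seconds)
    audio: List[float] = []
    for i, chunk in enumerate(split_text(text, max_chars)):
        if i > 0:
            audio.extend(silence)  # short pause so chunks do not run together
        audio.extend(synthesize(chunk))
    return audio
```

Splitting on sentence boundaries keeps each chunk's prosody self-contained, but the joins will not be seamless in the way the planned "speech continuations" would be. If your synthesis function returns tensors or arrays rather than plain sequences, the same loop applies with that library's own concatenation.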