
Shorter segments?

Open ronyfadel opened this issue 2 years ago • 5 comments

Would it be possible to produce shorter segments? (some are way too long)

ronyfadel avatar Feb 22 '23 07:02 ronyfadel

There is no option that can effectively prevent this. The parameter length_penalty can help to some extent but it will not force the model to predict a shorter segment.

Do you get a different output with openai/whisper? If yes, it would be great if you can provide a way to reproduce the output.

guillaumekln avatar Feb 22 '23 09:02 guillaumekln

There have been discussions in openai/whisper about skewing the model toward shorter segments by tweaking max_text_token_logprob: https://github.com/openai/whisper/discussions/435#discussioncomment-4010615

Is something similar possible with the faster-whisper codebase?

ronyfadel avatar Feb 22 '23 17:02 ronyfadel

I just saw the addition of length_penalty today. How should it be used? Its default value is set to 1.

ronyfadel avatar Feb 22 '23 19:02 ronyfadel

@guillaumekln from my testing, I've also had great results using the token_timestamps flag here

Tbh, I don't know what CTranslate2 does to the underlying model, or whether such capabilities are lost because the model was transformed.

ronyfadel avatar Feb 23 '23 07:02 ronyfadel

At this time, we have not implemented any features or parameters that are not available in the reference implementation from openai/whisper. So there is currently no easy way for users to tweak max_text_token_logprob or enable token-level timestamps; either would require changes to the C++ implementation in CTranslate2.

Regarding word-level timestamps, I'm following this development in the openai/whisper repo. If it is merged, I will look to support it here as well.

Also, you can ignore my comment regarding length_penalty. It is not relevant to your issue since you want the model to output more timestamps and not make the generated sequences shorter.

guillaumekln avatar Feb 23 '23 09:02 guillaumekln

I just merged the word-level timestamps branch so the segments can now be as short as you want.

guillaumekln avatar Mar 15 '23 14:03 guillaumekln
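
For readers wondering how word-level timestamps translate into shorter segments: one possible approach is to flatten the words from all segments and regroup them under your own limits. The sketch below is only an illustration, not a feature of the library. The `Word` dataclass is a stand-in for the word objects returned when `word_timestamps=True` is set, and `regroup_words`, `max_chars`, and `max_duration` are hypothetical names chosen for this example. It assumes each word carries `start`, `end`, and `word` fields and that word text includes its own leading space.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Word:
    """Stand-in for a word with timestamps (assumed fields: start, end, word)."""
    start: float
    end: float
    word: str


def _close(words: List[Word]) -> dict:
    # Build one short segment from a run of consecutive words.
    return {
        "start": words[0].start,
        "end": words[-1].end,
        "text": "".join(w.word for w in words).strip(),
    }


def regroup_words(
    words: Iterable[Word],
    max_chars: int = 40,        # hypothetical limit on segment text length
    max_duration: float = 3.0,  # hypothetical limit on segment duration, in seconds
) -> List[dict]:
    """Greedily pack words into segments that respect both limits.

    A single word that exceeds a limit on its own still gets its own segment.
    """
    segments: List[dict] = []
    current: List[Word] = []
    for w in words:
        if current:
            text_len = sum(len(x.word) for x in current) + len(w.word)
            duration = w.end - current[0].start
            if text_len > max_chars or duration > max_duration:
                segments.append(_close(current))
                current = []
        current.append(w)
    if current:
        segments.append(_close(current))
    return segments


# Possible usage with faster-whisper ("audio.wav" is a placeholder path):
#
#   from faster_whisper import WhisperModel
#   model = WhisperModel("small")
#   segments, _ = model.transcribe("audio.wav", word_timestamps=True)
#   words = [w for s in segments for w in s.words]
#   short_segments = regroup_words(words, max_chars=40, max_duration=3.0)
```

The greedy split ignores punctuation and pauses; breaking preferentially after sentence-ending punctuation or at long gaps between words would likely give more natural subtitles.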

Hi @guillaumekln, do you mind explaining what you mean by "I just merged the word-level timestamps branch so the segments can now be as short as you want."?

How do we control their length now?

And why, a couple of months after this reply, did you say here https://github.com/SYSTRAN/faster-whisper/issues/452#issuecomment-1704859269 that "There is no option to control the segment length."?

stephanedebove avatar Jun 20 '24 22:06 stephanedebove