faster-whisper

beam_size not working when using a converted model

Open · ewwink opened this issue 1 year ago · 3 comments

I want to use cahya/whisper-large-id

ct2-transformers-converter --model "cahya/whisper-large-id" \
--output_dir "cahya-whisper-large-id-ct2" --quantization float16

but changing beam_size has no effect; it always returns 30-second segments, and I want them under 5 seconds.

from faster_whisper import WhisperModel

model = WhisperModel("cahya-whisper-large-id-ct2", device="cuda", compute_type="float16")
segments, _ = model.transcribe("voice.wav", beam_size=1, language="id")
for segment in segments:
    print("[%.2f -> %.2f] %s" % (segment.start, segment.end, segment.text))

ewwink · Mar 12 '24 14:03

beam_size is not related to segment duration; it's the width of the beam search.

Purfview · Mar 12 '24 16:03
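
Segment length here is governed by Whisper's 30-second processing window and by where faster-whisper decides to split, not by the decoding strategy. For readers with the same goal, one way to force shorter segments is the built-in VAD filter. This is a minimal sketch assuming a recent faster-whisper where transcribe() accepts vad_filter and vad_parameters (VadOptions fields such as max_speech_duration_s; verify the names against your installed version):

from faster_whisper import WhisperModel

model = WhisperModel("cahya-whisper-large-id-ct2", device="cuda", compute_type="float16")
segments, _ = model.transcribe(
    "voice.wav",
    language="id",
    beam_size=5,                # beam search width: affects decoding quality, not timing
    vad_filter=True,            # split the audio on silence with the bundled Silero VAD
    vad_parameters=dict(
        max_speech_duration_s=5,      # cap each speech chunk at roughly 5 seconds
        min_silence_duration_ms=500,  # treat >= 0.5 s of silence as a boundary
    ),
)
for segment in segments:
    print("[%.2f -> %.2f] %s" % (segment.start, segment.end, segment.text))

Because the VAD cuts at detected silences, segments end near pauses rather than at exact 5-second marks.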

I don't know, but it works with the large-v3 model.

[screenshot: output with default settings]

[screenshot: output with beam_size=1]

ewwink · Mar 12 '24 17:03

On some other audio you can observe the opposite effect.

Purfview · Mar 12 '24 18:03
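
If the audio contains little silence to split on, another option (again assuming a recent faster-whisper version that supports word_timestamps=True) is to take the per-word timings from each segment's .words list and regroup them into chunks of at most ~5 seconds:

from faster_whisper import WhisperModel

MAX_LEN = 5.0  # target maximum chunk duration in seconds

model = WhisperModel("cahya-whisper-large-id-ct2", device="cuda", compute_type="float16")
segments, _ = model.transcribe("voice.wav", language="id", word_timestamps=True)

chunk, chunk_start = [], None
for segment in segments:
    for word in segment.words:
        if chunk_start is None:
            chunk_start = word.start
        # flush the current chunk once adding this word would exceed MAX_LEN
        if chunk and word.end - chunk_start > MAX_LEN:
            print("[%.2f -> %.2f] %s" % (chunk_start, chunk[-1].end,
                                         "".join(w.word for w in chunk)))
            chunk, chunk_start = [], word.start
        chunk.append(word)
if chunk:  # flush whatever is left at the end
    print("[%.2f -> %.2f] %s" % (chunk_start, chunk[-1].end,
                                 "".join(w.word for w in chunk)))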