
Segments are too long compared to faster-whisper with large-v2

Open yangxiaomin08 opened this issue 1 year ago • 1 comment

hi,

I run into two problems when using whisperX.

  1. whisperX (`model.transcribe`) generates segments that are much longer than faster-whisper's. Is there a parameter that controls segment length?

  2. The text from the first 5 seconds of audio is missing in the whisperX output.

Thanks a lot.

Below is my code:

```python
import time

import whisperx

device = "cuda"
audio_file = "audio.mp3"
batch_size = 16  # reduce if low on GPU mem
compute_type = "float16"  # change to "int8" if low on GPU mem (may reduce accuracy)

model = whisperx.load_model("large-v2", device, compute_type=compute_type)

audio = whisperx.load_audio(audio_file)

result = model.transcribe(audio, batch_size=batch_size)
t3 = time.time_ns() / 1000000
print(result["segments"])  # before alignment
```
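One workaround, sketched below, is to re-split long segments yourself after running `whisperx.align`, using the per-word timestamps it produces. This is not part of the whisperX API; the helper is hypothetical and assumes aligned segments shaped like `{"start", "end", "text", "words": [{"word", "start", "end"}]}` (check the shape your whisperX version actually returns). Depending on the version, `model.transcribe` may also accept a `chunk_size` argument (seconds of audio per VAD chunk), which can indirectly shorten segments.

```python
def split_long_segments(segments, max_dur=10.0):
    """Split aligned segments longer than max_dur seconds at word boundaries.

    Hypothetical helper: assumes each segment carries a "words" list with
    per-word "start"/"end" timestamps, as produced by alignment.
    """
    out = []
    for seg in segments:
        words = seg.get("words") or []
        # Keep short segments (or segments without word timing) untouched.
        if not words or seg["end"] - seg["start"] <= max_dur:
            out.append(seg)
            continue
        chunk = []
        chunk_start = words[0]["start"]
        for w in words:
            # Close the current chunk once adding this word would exceed max_dur.
            if chunk and w["end"] - chunk_start > max_dur:
                out.append({
                    "start": chunk_start,
                    "end": chunk[-1]["end"],
                    "text": " ".join(x["word"] for x in chunk),
                    "words": chunk,
                })
                chunk = []
                chunk_start = w["start"]
            chunk.append(w)
        if chunk:
            out.append({
                "start": chunk_start,
                "end": chunk[-1]["end"],
                "text": " ".join(x["word"] for x in chunk),
                "words": chunk,
            })
    return out
```

With `max_dur=10.0`, a 20-second aligned segment would come back as two roughly 10-second pieces, each with consistent start/end times and its own word list.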

yangxiaomin08 avatar Feb 02 '24 11:02 yangxiaomin08

Similar problem here. Have you solved it?

MengHao666 avatar Mar 08 '24 10:03 MengHao666