
Segments are too long compared to faster-whisper with large-v2

Open yangxiaomin08 opened this issue 1 year ago • 1 comment

hi,

I run into two problems when using whisperX.

  1. whisperX (`model.transcribe`) generates segments that are much longer than faster-whisper's. Is there a parameter that controls segment length?

  2. The text from the first 5 seconds of audio is missing in the whisperX output.

Thanks a lot.

Below is my code:

```python
import time

import whisperx

device = "cuda"
audio_file = "audio.mp3"
batch_size = 16  # reduce if low on GPU mem
compute_type = "float16"  # change to "int8" if low on GPU mem (may reduce accuracy)

model = whisperx.load_model("large-v2", device, compute_type=compute_type)

audio = whisperx.load_audio(audio_file)

result = model.transcribe(audio, batch_size=batch_size)
t3 = time.time_ns() / 1000000
print(result["segments"])  # before alignment
```
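One workaround, sketched below, is to re-split long segments yourself after running `whisperx.align`, using the per-word timestamps it produces. This is not part of the whisperX API; the helper is hypothetical and assumes aligned segments shaped like `{"start", "end", "text", "words": [{"word", "start", "end"}]}` (check the shape your whisperX version actually returns). Depending on the version, `model.transcribe` may also accept a `chunk_size` argument (seconds of audio per VAD chunk), which can indirectly shorten segments.

```python
def split_long_segments(segments, max_dur=10.0):
    """Split aligned segments longer than max_dur seconds at word boundaries.

    Hypothetical helper: assumes each segment carries a "words" list with
    per-word "start"/"end" timestamps, as produced by alignment.
    """
    out = []
    for seg in segments:
        words = seg.get("words") or []
        # Keep short segments (or segments without word timing) untouched.
        if not words or seg["end"] - seg["start"] <= max_dur:
            out.append(seg)
            continue
        chunk = []
        chunk_start = words[0]["start"]
        for w in words:
            # Close the current chunk once adding this word would exceed max_dur.
            if chunk and w["end"] - chunk_start > max_dur:
                out.append({
                    "start": chunk_start,
                    "end": chunk[-1]["end"],
                    "text": " ".join(x["word"] for x in chunk),
                    "words": chunk,
                })
                chunk = []
                chunk_start = w["start"]
            chunk.append(w)
        if chunk:
            out.append({
                "start": chunk_start,
                "end": chunk[-1]["end"],
                "text": " ".join(x["word"] for x in chunk),
                "words": chunk,
            })
    return out
```

With `max_dur=10.0`, a 20-second aligned segment would come back as two roughly 10-second pieces, each with consistent start/end times and its own word list.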

yangxiaomin08 avatar Feb 02 '24 11:02 yangxiaomin08

Similar problem here. Have you solved it?

MengHao666 avatar Mar 08 '24 10:03 MengHao666