Max Bain
Hey, so you don't need `--word_timestamps True`; that flag is for the original whisper model. You could play with it to see how it compares, but currently the code doesn't parse...
TODO (for self): add documentation explaining `--word_timestamps` for OG whisper; parse OG whisper word_timestamps for outputting to ~~word.srt~~ "srt-word" for comparison
@pdahiya `--highlight_words True` outputs an srt with highlighted words
You can also use the JSON output to generate your own; I will add .ass output back at some point
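As a sketch of what "generate your own" from the JSON could look like — the field names (`segments`, `words`, `start`, `end`) are an assumption about the output schema here, so check your actual JSON file — one cue per word, with the active word underlined:

```python
# Hypothetical sketch: build a karaoke-style SRT (one cue per word,
# current word wrapped in <u> tags) from whisperx-style JSON output.
# The JSON schema used here is an assumption -- check your own output.

def fmt_ts(t: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    ms = int(round((t - int(t)) * 1000))
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def highlighted_srt(result: dict) -> str:
    """Emit SRT text where each cue underlines one word of its segment."""
    cues = []
    idx = 1
    for seg in result["segments"]:
        words = seg.get("words", [])
        for i, w in enumerate(words):
            # Underline the active word, leave the rest plain.
            text = " ".join(
                f"<u>{x['word']}</u>" if j == i else x["word"]
                for j, x in enumerate(words)
            )
            cues.append(
                f"{idx}\n{fmt_ts(w['start'])} --> {fmt_ts(w['end'])}\n{text}\n"
            )
            idx += 1
    return "\n".join(cues)

# Tiny fabricated example, not real transcription output:
sample = {"segments": [{"words": [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.4, "end": 0.9},
]}]}
print(highlighted_srt(sample))
```

Swapping the `<u>` tags for ASS override tags would get you most of the way back to .ass output as well.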
For ease of use, we decided to just import OpenAI's whisper implementation for the transcription stage, which doesn't support batching. The one in the previous commit has some accuracy issues which...
@mezaros although it may be a disappointment to you, this repo is intended for research purposes, and all the algorithms and pipelines in the paper have been open-sourced. But thank...
Thanks, have you tried the faster-whisper drop-in mentioned above? This should give you a ~4-5x speed-up. > Also, did you end up publishing an updated or final version of...
I see, I can look into adding faster-whisper as an optional import when I have some time (I just don't want to force it, since it needs specific CUDA/cuDNN...
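A minimal sketch of the optional-import idea — names here are illustrative, not actual whisperx internals — so the package still works when faster-whisper (and its CUDA/cuDNN requirements) isn't installed:

```python
# Minimal sketch of an optional backend import: prefer faster-whisper
# if it is installed, otherwise fall back to the default openai-whisper
# path. Names are illustrative, not actual whisperx code.
try:
    import faster_whisper  # needs specific CUDA/cuDNN versions on GPU
    HAS_FASTER_WHISPER = True
except ImportError:
    HAS_FASTER_WHISPER = False

def pick_backend() -> str:
    """Return which transcription backend to use."""
    return "faster-whisper" if HAS_FASTER_WHISPER else "openai-whisper"

print(pick_backend())
```

This keeps faster-whisper out of the hard dependencies while letting users who have it installed get the speed-up automatically.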
Update: I did some speed benchmarking on GPU; faster-whisper is good it seems, and pretty fast all things considered.

**Model details**
- whisper_arch: large-v2
- beam_size: 5

**Speed benchmark:**
- File name: DanielKahneman_2010.wav...