
High inference time when using chunk size 15


Hi @sanchit-gandhi!

I'm in the process of integrating multiple whisper backends into a unified package that includes VAD-based chunking. During testing, I observed significantly higher inference times with the HuggingFace pipeline running distil-whisper than with the other backends. You can find the details here: https://github.com/shashikg/WhisperS2T/releases/tag/v1.1.0 [A30 GPU]

Could you please review the benchmarking script I'm using? It's available at: https://github.com/shashikg/WhisperS2T/blob/main/scripts/benchmark_huggingface_distil.py
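For context, the benchmark boils down to timing the transformers ASR pipeline on long-form audio. A minimal sketch of the setup (the model name, audio file, and batch size here are placeholders, not the exact values from the linked script):

```python
import time

import torch
from transformers import pipeline

# Placeholder model and audio file; the actual script may differ.
pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",  # e.g. the A30 mentioned above
)

start = time.perf_counter()
result = pipe("audio.wav", chunk_length_s=15, batch_size=16)
print(f"inference time: {time.perf_counter() - start:.2f}s")
print(result["text"][:100])
```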

Thanks for your assistance!

Shashi

shashikg avatar Dec 19 '23 01:12 shashikg

Hey @shashikg! Thanks for sharing these benchmarks! I've had a look through the code, and there are two variables we could adjust (a sketch of both changes follows after this list):

  1. num_workers: is there a reason we pin the data loader's num_workers to 1 here? We could pre-process the data faster if we left it at the default (8).
  2. chunk_length_s: worth setting this to 15 in all instances, e.g. here.
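A minimal sketch of both suggestions, assuming the standard transformers pipeline API (model name, audio file, and batch size are placeholders rather than the benchmark script's values):

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",
)

result = pipe(
    "audio.wav",        # placeholder input
    chunk_length_s=15,  # suggestion 2: 15 s chunks in all instances
    batch_size=16,
    num_workers=8,      # suggestion 1: keep the default instead of pinning to 1
)
print(result["text"])
```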

sanchit-gandhi avatar Dec 19 '23 14:12 sanchit-gandhi

  1. Hey, I think the HF ChunkPipeline forces num_workers back to 1 whenever a larger value is passed. See here (and the snippet after this list). Though I'll run the benchmark once more after setting it to a higher number.
  2. That shouldn't be an issue: for distil-whisper I only ran the benchmark on KINCAID WAV. See this.
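For reference, a quick way to check the num_workers behaviour described in point 1, assuming a standard transformers install (the exact warning text, and which versions emit it, may vary):

```python
from transformers import pipeline
from transformers.utils import logging

logging.set_verbosity_warning()

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
)

# The ASR pipeline is a ChunkPipeline; the claim above is that a
# num_workers value this high gets forced back to 1 internally, with a
# warning logged rather than an error raised.
result = pipe("audio.wav", chunk_length_s=15, num_workers=8)
```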

shashikg avatar Dec 19 '23 15:12 shashikg