distil-whisper

Can we use distil whisper for 50+ concurrent requests on one T4 machine without compromising on latency for each request?

Open moksh-samespace opened this issue 1 year ago • 1 comment

moksh-samespace, Nov 27 '23 12:11

For high batch sizes, it is recommended to use newer hardware with more VRAM (e.g. an A100). The performance of T4 GPUs saturates quickly as you increase the batch size, giving lower throughput at higher batch sizes. For details, see Section D.5 of the Distil-Whisper paper (pages 29 and 30).
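One way to check where a given GPU saturates is to time the same workload at several batch sizes and compare requests per second. The sketch below is a minimal, model-agnostic harness, not anything taken from the Distil-Whisper repository: `sweep_batch_sizes`, `saturation_point`, `run_batch`, and the `min_gain` threshold are all hypothetical names chosen for illustration, and `fake_model` is a stand-in for whatever call actually runs a batch through the model on your hardware.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def sweep_batch_sizes(
    run_batch: Callable[[List[str]], object],
    batch_sizes: Iterable[int],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, Tuple[float, float]]:
    """Time one call of run_batch per batch size.

    Returns {batch_size: (latency_seconds, requests_per_second)}.
    run_batch is a hypothetical callable that processes a whole batch.
    """
    results = {}
    for size in batch_sizes:
        # Placeholder inputs; in practice these would be audio clips.
        batch = [f"request-{i}" for i in range(size)]
        start = clock()
        run_batch(batch)
        latency = clock() - start
        results[size] = (latency, size / latency)
    return results


def saturation_point(
    results: Dict[int, Tuple[float, float]], min_gain: float = 1.1
) -> int:
    """Return the last batch size before throughput stops improving.

    A step counts as an improvement only if throughput grows by at
    least the factor min_gain (an arbitrary threshold, assumed here).
    """
    sizes = sorted(results)
    for prev, nxt in zip(sizes, sizes[1:]):
        if results[nxt][1] < results[prev][1] * min_gain:
            return prev
    return sizes[-1]


if __name__ == "__main__":
    # Stand-in for a real model call: fixed overhead plus per-item cost.
    def fake_model(batch: List[str]) -> None:
        time.sleep(0.01 + 0.002 * len(batch))

    measured = sweep_batch_sizes(fake_model, [1, 2, 4, 8, 16])
    for size, (latency, throughput) in sorted(measured.items()):
        print(f"batch={size:3d}  latency={latency:.3f}s  req/s={throughput:.1f}")
    print("throughput stops improving after batch size", saturation_point(measured))
```

For a real measurement, `run_batch` would wrap the actual inference call, with a few warm-up runs and GPU synchronisation before each clock read. Note that per-request latency in a batch is the latency of the whole batch, so the latency column matters as much as throughput for the 50+ concurrent request case.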

[Image: Screenshot 2023-11-27 at 19 46 27]

sanchit-gandhi, Nov 28 '23 00:11