distil-whisper
Can we use Distil-Whisper to serve 50+ concurrent requests on a single T4 machine without compromising the latency of each request?
For high batch sizes, it is recommended to use newer hardware with more VRAM (e.g. an A100). The performance of T4 GPUs saturates quickly as you increase the batch size, giving lower throughput at higher batch sizes. For details, see section D.5 of the Distil-Whisper paper (pages 29 and 30).
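If you want to check where your own GPU saturates, one option is to time a full batch at several batch sizes and compare throughput. The snippet below is only a rough sketch: `measure_throughput` and `fake_run_batch` are hypothetical names, and `fake_run_batch` is a stand-in that just sleeps, so the numbers it produces say nothing about a T4 or about Distil-Whisper. To get real numbers, replace it with a function that transcribes `batch_size` audio clips with your model.

```python
import time
from typing import Callable, Dict, List


def measure_throughput(
    run_batch: Callable[[int], None],
    batch_sizes: List[int],
    repeats: int = 3,
) -> Dict[int, Dict[str, float]]:
    """Time run_batch(batch_size) and report latency and throughput per batch size."""
    results: Dict[int, Dict[str, float]] = {}
    for batch_size in batch_sizes:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_batch(batch_size)
            timings.append(time.perf_counter() - start)
        # Use the fastest run to reduce noise from warm-up and scheduling.
        batch_latency = min(timings)
        results[batch_size] = {
            "batch_latency_s": batch_latency,
            "throughput_per_s": batch_size / batch_latency,
        }
    return results


def fake_run_batch(batch_size: int) -> None:
    """Hypothetical stand-in for a real transcription call (assumption, not a model)."""
    # Fixed overhead plus a cost that grows with the batch size.
    time.sleep(0.001 + 0.0005 * batch_size)


if __name__ == "__main__":
    for size, stats in measure_throughput(fake_run_batch, [1, 8, 32]).items():
        print(size, round(stats["batch_latency_s"], 4), round(stats["throughput_per_s"], 1))
```

Every request in a batch waits for the whole batch to finish, so `batch_latency_s` is the figure to watch for per-request latency. Once `throughput_per_s` stops growing as the batch size goes up, larger batches only add latency.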