distil-whisper: Can we use Distil-Whisper for 50+ concurrent requests on one T4 machine without compromising on latency for each request?

Can we use Distil-Whisper for 50+ concurrent requests on one T4 machine without compromising on latency for each request?

Open • moksh-samespace opened this issue 1 year ago • 1 comment

Nov 27 '23 12:11 moksh-samespace

For high batch sizes, it is recommended to use newer hardware with more VRAM (e.g. an A100). The performance of T4 GPUs saturates quickly as you increase the batch size, giving lower throughput at higher batch sizes. For details, see section D.5 of the Distil-Whisper paper (pages 29 and 30).

Nov 28 '23 00:11 sanchit-gandhi
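The reply concerns how throughput scales with batch size. One common way to serve many concurrent requests on a single GPU is to collect them into micro-batches capped at a fixed size, so the GPU runs several moderately sized batches rather than one batch as large as the number of concurrent requests. This is a minimal sketch under stated assumptions: `MicroBatcher`, `max_batch_size`, `max_wait_s` and the `fake_transcribe_batch` stub are hypothetical names, not part of the distil-whisper or transformers APIs. The stub stands in for a real Distil-Whisper call, and the cap of 8 is arbitrary rather than a measured optimum for a T4.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature for a batched transcription function: a list of audio
# inputs in, one transcript per input out (same order).
BatchFn = Callable[[list[str]], Awaitable[list[str]]]


class MicroBatcher:
    """Collects concurrent requests into batches of at most max_batch_size."""

    def __init__(self, batch_fn: BatchFn, max_batch_size: int = 8, max_wait_s: float = 0.01):
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size  # cap on GPU batch size (illustrative)
        self.max_wait_s = max_wait_s  # how long to wait for a batch to fill
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: list[int] = []  # sizes of the batches actually run

    async def submit(self, item: str) -> str:
        # Each request gets a future that the worker resolves with its result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request is waiting.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            # Keep pulling requests until the batch is full or the window closes.
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            # One call per batch; requests in later batches wait for this one.
            outputs = await self.batch_fn([item for item, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


async def fake_transcribe_batch(items: list[str]) -> list[str]:
    # Hypothetical stand-in for a real model call; sleeps to mimic GPU time per batch.
    await asyncio.sleep(0.005)
    return [f"transcript of {x}" for x in items]


async def main(n_requests: int = 50) -> tuple[list[str], list[int]]:
    batcher = MicroBatcher(fake_transcribe_batch, max_batch_size=8)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(f"clip{i}") for i in range(n_requests)))
    worker.cancel()
    return list(results), batcher.batch_sizes


results, sizes = asyncio.run(main())
print(len(results), sizes)
```

With 50 simultaneous requests and a cap of 8, the worker runs several small batches one after another instead of a single batch of 50. Each request's latency then includes the time spent waiting for earlier batches to finish, which is the latency trade-off the question asks about.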
