
[Benchmarking] Thorough benchmarking for Transformers!

Open · Vaibhavs10 opened this issue 1 year ago · 1 comment

I am starting this issue to do more thorough benchmarking than what the notebooks in the repo currently provide.

What should we measure (see the measurement sketch after this list):

  1. Generation time
  2. Peak GPU VRAM usage
  3. Accuracy (e.g., word error rate)
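For the first two, here is a minimal sketch using `torch.cuda`'s peak-memory counters. It assumes a CUDA device and a placeholder `sample.wav`; accuracy is sketched under the models list below.

```python
import time

import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Reset the peak-VRAM counter, then time a single transcription.
torch.cuda.reset_peak_memory_stats()
start = time.perf_counter()
result = pipe("sample.wav")  # placeholder audio file
elapsed = time.perf_counter() - start

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"generation time: {elapsed:.2f}s, peak VRAM: {peak_gb:.2f} GB")
```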

Hardware (this would give the best of both worlds IMO):

  1. Consumer (T4)
  2. A100s

Tricks that we should measure (a sketch of how to enable each follows the list):

  1. scaled_dot_product_attention via the BetterTransformer API in Optimum
  2. Flash Attention 2
  3. Chunked batching via the pipeline API in Transformers
  4. Speculative Decoding
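As a rough sketch of how each trick can be switched on in Transformers — the exact kwargs (`attn_implementation`, `generate_kwargs`, etc.) are from memory and worth double-checking against the installed Transformers/Optimum versions, and `sample.wav` is a placeholder input:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "openai/whisper-large-v2"  # shares a tokenizer with distil-large-v2 (trick 4)
processor = AutoProcessor.from_pretrained(model_id)

# Tricks 1/2 — attention backend: "sdpa" uses torch's
# scaled_dot_product_attention; "flash_attention_2" needs the flash-attn
# package and an Ampere-or-newer GPU. SDPA can also be enabled via Optimum's
# BetterTransformer API: model = model.to_bettertransformer().
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, attn_implementation="sdpa"
)

# Trick 3 — chunked batching via the pipeline API.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=16,
    device="cuda:0",
)

# Trick 4 — speculative decoding: a distilled checkpoint drafts tokens for
# the main model. Assisted generation runs with batch size 1, so drop
# batch_size above when benchmarking this one.
assistant = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-large-v2", torch_dtype=torch.float16
).to("cuda:0")
out = pipe("sample.wav", generate_kwargs={"assistant_model": assistant})
print(out["text"])
```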

Models that we should test (a WER comparison sketch follows the list):

  1. openai/whisper-large-v3
  2. distil-whisper/distil-large-v2
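For the accuracy comparison across both checkpoints, a sketch using the `evaluate` library's WER metric; the dataset (`librispeech_asr` test-clean), the sample count, and the crude upper-casing normalisation are placeholder choices — a real run would want proper Whisper text normalisation before scoring.

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import pipeline

wer = evaluate.load("wer")
ds = load_dataset("librispeech_asr", "clean", split="test", streaming=True)

for model_id in ("openai/whisper-large-v3", "distil-whisper/distil-large-v2"):
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=torch.float16,
        device="cuda:0",
    )
    preds, refs = [], []
    for sample in ds.take(100):  # small slice; scale up for a real run
        audio = {
            "array": sample["audio"]["array"],
            "sampling_rate": sample["audio"]["sampling_rate"],
        }
        # Upper-case as a crude normalisation: librispeech refs are uppercase.
        preds.append(pipe(audio)["text"].strip().upper())
        refs.append(sample["text"])
    print(model_id, wer.compute(predictions=preds, references=refs))
```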

Vaibhavs10 · Dec 01 '23

Has this been finalized yet, just out of curiosity?

BBC-Esq · Dec 28 '23