[Benchmarking] Thorough benchmarking for Transformers!
I am starting this issue to do more thorough benchmarking than what's in the notebooks used in this repo.
What should we measure (a timing/VRAM sketch follows this list):
- Time for generation
- Max GPU VRAM
- Accuracy
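For the first two metrics, here is a minimal sketch of a harness using PyTorch's CUDA utilities; the `generate_fn` callable is a placeholder for whatever generation call we end up benchmarking:

```python
import time

import torch

def benchmark(generate_fn):
    """Run one generation call; return (output, seconds, peak VRAM in GB)."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    output = generate_fn()
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    peak_vram_gb = torch.cuda.max_memory_allocated() / 1024**3
    return output, elapsed, peak_vram_gb

# e.g. output, seconds, vram = benchmark(lambda: pipe("audio.mp3"))
```

Accuracy could then be measured as WER against reference transcripts, e.g. with the jiwer package.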
Hardware (this would give the best of both worlds IMO):
- Consumer (T4)
- A100s
Tricks that we should benchmark (see the sketches after this list):
- `scaled_dot_product_attention` via the BetterTransformer API in Optimum
- Flash Attention 2
- Chunked batching via the pipeline API in Transformers
- Speculative Decoding
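A hedged sketch of how chunked batching and Flash Attention 2 might be wired up through the Transformers pipeline API; the model ID, audio path, chunk length, and batch size below are illustrative choices, not finalized benchmark settings:

```python
import torch
from transformers import pipeline

# Chunked batching via the pipeline API, with Flash Attention 2 enabled
# through attn_implementation (needs the flash-attn package and a supported
# GPU). The model ID here is an illustrative choice.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},
)

# Alternative to Flash Attention 2: SDPA via Optimum's BetterTransformer,
# i.e. pipe.model = pipe.model.to_bettertransformer()

result = pipe(
    "audio.mp3",        # placeholder path to a long audio file
    chunk_length_s=30,  # split long audio into 30 s chunks
    batch_size=16,      # decode chunks in batches of 16
    return_timestamps=True,
)
print(result["text"])
```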
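And a sketch of speculative decoding via Transformers' assisted generation; the large/draft model pairing (whisper-large-v2 + distil-large-v2) and the dummy audio are assumptions for illustration only:

```python
import numpy as np
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

device = "cuda:0"
dtype = torch.float16

# Main model plus a smaller draft model; the draft proposes tokens that the
# main model verifies in one forward pass. The pairing below is illustrative
# and assumes both models share the same tokenizer/vocabulary.
processor = AutoProcessor.from_pretrained("openai/whisper-large-v2")
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v2", torch_dtype=dtype
).to(device)
assistant = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-large-v2", torch_dtype=dtype
).to(device)

raw_audio = np.zeros(16_000, dtype=np.float32)  # placeholder: 1 s of silence
inputs = processor(raw_audio, sampling_rate=16_000, return_tensors="pt")
input_features = inputs.input_features.to(device, dtype=dtype)

# Passing assistant_model switches generate() to assisted (speculative)
# generation; note it expects batch size 1.
generated = model.generate(input_features=input_features, assistant_model=assistant)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```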
Models that we should test:
Has this been finalized yet, just out of curiosity?