[README] Add section on 🤗 Transformers

Open sanchit-gandhi opened this issue 1 year ago • 3 comments

First of all, thank you for your amazing work on the Whisper project and for open-sourcing the family of pre-trained checkpoints - these are of tremendous benefit to both the open-source/open-science communities!

Whisper was added to Hugging Face Transformers in v4.23.1 in PyTorch and TensorFlow, making it more accessible to the research community. Since then, usage has been steadily increasing, with model downloads now at ~30k / month. Adding a section on the 🤗 Transformers implementation of Whisper to the README would help highlight how one can evaluate Whisper in ~10 lines of code! We're excited to see how the community adopts Whisper for ASR research and production!
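For illustration only, here is a rough sketch of what such a snippet might look like. It is not taken from the issue or from the Whisper README: the `openai/whisper-tiny` checkpoint name, the `transcribe` and `word_error_rate` helper names, and the simple lowercase/whitespace WER normalisation are all assumptions made for this example.

```python
from typing import Sequence


def transcribe(audio: Sequence[float], sampling_rate: int = 16_000,
               checkpoint: str = "openai/whisper-tiny") -> str:
    """Transcribe one mono waveform with the Transformers Whisper classes.

    `checkpoint` is an assumed Hub model id, chosen only because it is small.
    """
    # Imported lazily so the WER helper below can be used without
    # transformers/torch installed.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(checkpoint)
    model = WhisperForConditionalGeneration.from_pretrained(checkpoint)
    # Convert the raw waveform into log-mel input features (PyTorch tensors).
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Single-row dynamic programme for the Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Calling `transcribe(waveform)` on a 16 kHz mono waveform downloads the checkpoint on first use, and `word_error_rate(reference_text, transcribe(waveform))` then gives a per-utterance score. A real evaluation would normally apply a more careful text normaliser before scoring.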

sanchit-gandhi, Nov 04 '22 15:11

This is so cool! Can I ask some questions here about the model you're hosting?

  • The real-time speech isn't working on the site. Is that a limitation of your Whisper (just like this Whisper) or just the website?
  • Was this model automatically converted through TorchScript/ONNX, or was the architecture re-defined in TF so the weights could be transferred?
  • Have you benchmarked the latency of the TensorFlow model against the original?
  • Does the TF model support dynamic inputs, or are the inputs a fixed size with padding when needed?
  • Are there any plans to release a TFLite model?

ameenba, Nov 05 '22 20:11

Hey @ameenba!

> the real time speech isn't working on the site

We're first going to add chunked ASR, after which we can start to look into real-time transcription (https://github.com/huggingface/transformers/issues/19887)

> was this model automatically converted through torchscript/onnx, or was the architecture re-defined in TF

Re-defined, see https://github.com/huggingface/transformers/pull/19378

> have you benchmarked latency of the tensorflow model against the original?

Unsure if this has been done! cc @amyeroberts

> does the TF model support dynamic inputs or are the inputs a fixed size with padding when needed?

Again, cc the TF expert @amyeroberts!

sanchit-gandhi, Nov 07 '22 11:11