Patrick von Platen
Please read: https://github.com/huggingface/distil-whisper/issues/26#issuecomment-1805643512
Would be cool to start a new distillation run for Whisper-large-v3 indeed! Let's see if we find some compute
We mainly trained on TPU v4s here. @sanchit-gandhi will know best what hardware is needed, I believe :-)
The cross-attention head dimensions should be **exactly** the same as those of the corresponding teacher models (whisper-large-v2 for distil-whisper-32-2 and whisper-medium.en for distil-whisper-24-2).
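A rough way to sanity-check this is to compare the number of decoder attention heads and the per-head dimension between student and teacher. The sketch below is only illustrative: the helper names `cross_attention_shape` and `same_cross_attention_shape` are hypothetical, the attribute names `d_model` and `decoder_attention_heads` are assumed to match the `transformers` `WhisperConfig`, and the config objects are stand-ins with example values rather than configs loaded from the Hub.

```python
from types import SimpleNamespace


def cross_attention_shape(config):
    """Return (num_heads, head_dim) for the decoder attention layers.

    Assumes the config exposes `d_model` and `decoder_attention_heads`,
    as the transformers WhisperConfig does.
    """
    num_heads = config.decoder_attention_heads
    head_dim = config.d_model // num_heads
    return num_heads, head_dim


def same_cross_attention_shape(student_config, teacher_config):
    """True if the student and teacher agree on both heads and head size."""
    return cross_attention_shape(student_config) == cross_attention_shape(teacher_config)


# Stand-in configs with example values (assumptions, not loaded from the Hub).
teacher = SimpleNamespace(d_model=1280, decoder_attention_heads=20)
student = SimpleNamespace(d_model=1280, decoder_attention_heads=20)
mismatched = SimpleNamespace(d_model=1024, decoder_attention_heads=16)

print(same_cross_attention_shape(student, teacher))
print(same_cross_attention_shape(mismatched, teacher))
```

With real checkpoints you would presumably pass `model.config` of the student and of the teacher instead of the stand-in objects.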
> @sanchit-gandhi Sincerely thank you for your reply. What I want to know is, how to deal with beamsize >1 in speculative decoding? When draft model generated 4 beams, for example, and...
Also see this issue: https://github.com/huggingface/distil-whisper/issues/11
@souvikqb, please open a new issue as this question is not related to `beamsize`
Wow amazing work here @isamu-isozaki! cc'ing @patil-suraj here as well
It should be RGB format, see example here: https://huggingface.co/lllyasviel/sd-controlnet-canny#example
Happy to review a PR!