Patrick von Platen comments

Results: 1228 comments by Patrick von Platen

[Speculative Decoding] How to run speculative decoding for batch_size > 1?

Please read: https://github.com/huggingface/distil-whisper/issues/26#issuecomment-1805643512

Hopefully the large-v3 version will be supported

Would be cool to start a new distillation run for Whisper-large-v3 indeed! Let's see if we can find some compute.

Hopefully the large-v3 version will be supported

We mainly trained on TPU v4s here. @sanchit-gandhi will know best what hardware is needed, I believe :-)

Compatibility with CTranslate2 / faster-whisper

The cross-attention head dimensions should be **exactly** the same as those of the corresponding teacher models (whisper-large-v2 for distil-whisper-32-2 and whisper-medium.en for distil-whisper-24-2).

beamsize > 1

> @sanchit-gandhi Sincerely thank you for your reply. What I want to know is how to deal with beamsize > 1 in speculative decoding. When the draft model generates 4 beams, for example, and...

beamsize > 1

Also see this issue: https://github.com/huggingface/distil-whisper/issues/11

beamsize > 1

@souvikqb, please open a new issue as this question is not related to `beamsize`

WIP: Adding training script

Wow amazing work here @isamu-isozaki! cc'ing @patil-suraj here as well

When using openpose, what is the format of the input image? RGB format, or BGR format?

It should be RGB format, see example here: https://huggingface.co/lllyasviel/sd-controlnet-canny#example
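
If the image comes from a loader that returns BGR channel order (OpenCV's `cv2.imread` does this), the channels need to be swapped before the image is passed on. Below is a minimal sketch of that swap on a raw interleaved 8-bit buffer, using only the standard library. The function name `bgr_to_rgb` is made up for illustration and is not part of any library. With a NumPy array of shape `(H, W, 3)`, the same reordering is `image[..., ::-1]`.

```python
def bgr_to_rgb(buf: bytes) -> bytes:
    """Swap the first and third channel of an interleaved 3-channel, 8-bit buffer.

    Hypothetical helper for illustration: assumes tightly packed pixels
    (B, G, R, B, G, R, ...) with no row padding and no alpha channel.
    """
    if len(buf) % 3 != 0:
        raise ValueError("buffer length must be a multiple of 3")
    out = bytearray(buf)
    out[0::3] = buf[2::3]  # R values move to the first slot of each pixel
    out[2::3] = buf[0::3]  # B values move to the third slot of each pixel
    return bytes(out)


# Two pixels in BGR order: pure blue, then (B=10, G=20, R=30)
bgr_pixels = bytes([255, 0, 0, 10, 20, 30])
rgb_pixels = bgr_to_rgb(bgr_pixels)
```

Images opened with PIL and converted via `.convert("RGB")` are already in RGB order and do not need this step.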

Add Face detector

Happy to review a PR!