
WhisperPositionalEmbedding

Open Macsim2 opened this issue 1 year ago • 1 comment

Hi there, I'm trying to fine-tune a Whisper model, but there is a problem: the decoder positional embedding (size [448, 768] in the small-model case) means the target sequence must not be longer than 448 (the first dim). I have two questions. Q1) When I use a wav file that is 10 seconds or longer, this problem stops training. Is it related to the file size?

The problematic code is below:

        # embed positions
        positions = self.embed_positions(input_ids, past_key_values_length=past_key_values_length)

        hidden_states = inputs_embeds + positions

Training stops at transformers/models/whisper/modeling_whisper.py:872. If I change max_target_positions, I end up with a randomly initialized embedding layer instead of Whisper's existing positional embedding layer. Q2) Is there any solution?

Macsim2 · May 09 '23 02:05

Hey @Macsim2! The best thing to do here is to filter out any transcriptions longer than Whisper's max target length before training, see https://discuss.huggingface.co/t/open-to-the-community-whisper-fine-tuning-event/26681/21?u=sanchit-gandhi
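
For reference, a minimal sketch of such a filter, assuming a 🤗 datasets dataset with a "sentence" column holding the transcriptions (as in the fine-tuning blog/event setup; the column and variable names here are illustrative):

    from transformers import WhisperTokenizer

    # Tokenizer matching the checkpoint being fine-tuned (whisper-small assumed here)
    tokenizer = WhisperTokenizer.from_pretrained(
        "openai/whisper-small", language="en", task="transcribe"
    )

    # Whisper's decoder max_target_positions
    MAX_LABEL_LENGTH = 448

    def is_label_in_range(transcription):
        # Keep only examples whose tokenized labels fit in the positional embedding table
        return len(tokenizer(transcription).input_ids) <= MAX_LABEL_LENGTH

    # "dataset" is assumed to be the prepared datasets.Dataset used for training
    dataset = dataset.filter(is_label_in_range, input_columns=["sentence"])

Any example removed this way would otherwise produce labels longer than 448 tokens and trigger the error at the embed_positions line quoted above.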

sanchit-gandhi · Dec 07 '23 13:12