ltu
Why does pad_or_trim use 1000 rather than 3000 in transcribe_audio?
mel = pad_or_trim(mel, 1000).to(model.device).to(dtype)
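
For context, here is a rough sketch of what the two lengths mean in seconds. It assumes Whisper's standard front end (16 kHz audio, a hop of 160 samples, 80 mel bins), which this thread does not confirm for this repository. `pad_or_trim_sketch` and `frames_to_seconds` are hypothetical NumPy stand-ins written for illustration, not the repository's code.

```python
import numpy as np

# Assumed Whisper front-end constants (not confirmed for this repo).
SAMPLE_RATE = 16000  # Hz
HOP_LENGTH = 160     # samples between mel frames, i.e. 10 ms per frame


def pad_or_trim_sketch(array, length, axis=-1):
    """Simplified stand-in for pad_or_trim: cut or zero-pad `axis` to `length`."""
    if array.shape[axis] > length:
        array = array.take(indices=np.arange(length), axis=axis)
    if array.shape[axis] < length:
        pad_widths = [(0, 0)] * array.ndim
        pad_widths[axis] = (0, length - array.shape[axis])
        array = np.pad(array, pad_widths)
    return array


def frames_to_seconds(n_frames):
    """Duration covered by n_frames mel frames under the assumed constants."""
    return n_frames * HOP_LENGTH / SAMPLE_RATE


mel = np.zeros((80, 3000))               # a 30 s spectrogram (80 mel bins)
trimmed = pad_or_trim_sketch(mel, 1000)  # keeps only the first 1000 frames
print(trimmed.shape)                     # (80, 1000)
print(frames_to_seconds(1000), frames_to_seconds(3000))  # 10.0 30.0
```

Under these assumptions, 1000 frames cover 10 seconds of audio and 3000 frames cover 30 seconds, so anything past the first 10 seconds is dropped before it reaches the model.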