Converting fine-tuned whisper model for faster-whisper, using safetensors

Open ErfolgreichCharismatisch opened this issue 8 months ago • 5 comments

How can I convert https://huggingface.co/primeline/whisper-large-v3-german to be used with faster-whisper?

Also, can faster-whisper use safetensors and can I convert the above to it?

EDIT: When using

ct2-transformers-converter --model primeline/whisper-large-v3-german --output_dir whisper-large-v3-german --copy_files tokenizer.json --quantization float16

I get

ValueError: Non-consecutive added token '<|0.02|>' found. Should have index 50365 but has index 50366 in saved vocabulary
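To make the error message easier to read, here is a minimal sketch of the kind of consistency check it describes: added tokens are expected to occupy consecutive indices, and the first one that leaves a gap is reported. This is not the library's actual code. The function name and the sample vocabulary are made up for illustration, and only the `<|0.02|>` entry is taken from the message above.

```python
def find_non_consecutive_tokens(added_tokens, first_index):
    """Return (token, expected_index, actual_index) for every added token
    whose index does not directly follow the previous one."""
    problems = []
    expected = first_index
    # Walk the tokens in index order and compare against the running counter.
    for token, index in sorted(added_tokens.items(), key=lambda kv: kv[1]):
        if index != expected:
            problems.append((token, expected, index))
        expected = index + 1
    return problems


# Hypothetical excerpt of an added-token table; index 50365 is missing.
added = {"<|placeholder|>": 50364, "<|0.02|>": 50366, "<|0.04|>": 50367}
problems = find_non_consecutive_tokens(added, first_index=50364)
print(problems)
```

A gap like this usually points to a mismatch between the saved tokenizer files and the library version that loads them, which fits the observation that upgrading resolved it.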

After upgrading ctranslate2 and transformers, the conversion works.

I got it to work with model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128)
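The one-liner above can be wrapped in a small helper. This is only a sketch: `patch_mel_filters` is a hypothetical name, and the stand-in object below exists solely so the snippet runs without a model. With a real model you would pass the `WhisperModel` instance instead. The value 128 matches the number of mel bins used by large-v3 (earlier Whisper models use 80).

```python
from types import SimpleNamespace


def patch_mel_filters(model, n_mels=128):
    """Rebuild the feature extractor's mel filter bank with n_mels bins.

    Uses the same call as the workaround in this thread:
    get_mel_filters(sampling_rate, n_fft, n_mels=...).
    """
    fe = model.feature_extractor
    fe.mel_filters = fe.get_mel_filters(fe.sampling_rate, fe.n_fft, n_mels=n_mels)
    return fe.mel_filters


# Stand-in for a loaded model (hypothetical, for demonstration only).
stub = SimpleNamespace(
    feature_extractor=SimpleNamespace(
        sampling_rate=16000,
        n_fft=400,
        mel_filters=None,
        get_mel_filters=lambda sr, n_fft, n_mels=80: ("filters", sr, n_fft, n_mels),
    )
)
patched = patch_mel_filters(stub)

# With faster-whisper installed, usage would look roughly like:
#   from faster_whisper import WhisperModel
#   model = WhisperModel("whisper-large-v3-german")  # converted output directory
#   patch_mel_filters(model)
```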

Yet the model stops after about 20 words, whereas large-v2 transcribes the whole file. There is no error message, but the tqdm progress bar that I wrapped around for segment in segments: freezes at step 1 and then skips to the end.
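One way to narrow this down: in faster-whisper, `segments` is a generator, so transcription only runs while you iterate, and tqdm cannot know the total in advance. The sketch below (the helper name `drain_segments` is made up) prints each segment's end time against the audio duration and returns how much of the file was covered, which shows whether decoding really stops early. The fake segments are there only so the snippet runs on its own.

```python
from collections import namedtuple


def drain_segments(segments, duration):
    """Consume a segment generator, printing progress against the audio length.

    Returns the collected texts and the fraction of the audio that was covered.
    """
    texts = []
    last_end = 0.0
    for segment in segments:
        texts.append(segment.text)
        last_end = max(last_end, segment.end)
        print(f"{segment.end:7.2f}s / {duration:.2f}s  {segment.text}")
    coverage = last_end / duration if duration else 0.0
    return texts, coverage


# Hypothetical stand-in data mimicking segments with start, end and text.
Segment = namedtuple("Segment", "start end text")
fake = (s for s in [Segment(0.0, 4.0, "Hallo"), Segment(4.0, 9.0, "Welt")])
texts, coverage = drain_segments(fake, duration=60.0)

# With a real model, roughly:
#   segments, info = model.transcribe("audio.wav")
#   texts, coverage = drain_segments(segments, info.duration)
```

A coverage well below 1.0 with no exception would confirm that the generator ends early rather than the progress bar merely being misleading.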