faster-whisper
Converting fine-tuned whisper model for faster-whisper, using safetensors
How can I convert https://huggingface.co/primeline/whisper-large-v3-german to be used with faster-whisper?
Also, can faster-whisper use safetensors and can I convert the above to it?
EDIT: When using
ct2-transformers-converter --model primeline/whisper-large-v3-german --output_dir whisper-large-v3-german --copy_files tokenizer.json --quantization float16
I get
ValueError: Non-consecutive added token '<|0.02|>' found. Should have index 50365 but has index 50366 in saved vocabulary
After upgrading ctranslate2 and transformers, the conversion works.
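For anyone scripting this, here is a minimal sketch that rebuilds the same converter call from Python. The helper names build_convert_command and run_conversion are made up for illustration; the flags are only the ones from the command above, and it assumes ct2-transformers-converter is on PATH after upgrading (e.g. pip install --upgrade ctranslate2 transformers).

```python
import subprocess


def build_convert_command(model, output_dir, quantization="float16",
                          copy_files=("tokenizer.json",)):
    """Build the argument list for the converter call shown in the post.

    Hypothetical helper: it only assembles the flags from the original
    command (--model, --output_dir, --copy_files, --quantization).
    """
    cmd = ["ct2-transformers-converter", "--model", model,
           "--output_dir", output_dir]
    if copy_files:
        # Extra files to copy next to the converted weights.
        cmd += ["--copy_files", *copy_files]
    if quantization:
        cmd += ["--quantization", quantization]
    return cmd


def run_conversion(model, output_dir):
    """Run the converter and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_convert_command(model, output_dir), check=True)
```

Calling run_conversion("primeline/whisper-large-v3-german", "whisper-large-v3-german") should then be equivalent to the command line above.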
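I got transcription with the converted model to run by overriding the mel filters, since large-v3 expects 128 mel bins instead of the 80 used by large-v2: model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128)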
Yet the model stops after about 20 words, whereas large-v2 transcribes the whole file. There is no error message; the tqdm progress bar that I wrapped around for segment in segments: just freezes.
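To narrow down where it stops, one option is to measure progress against the audio duration instead of wrapping tqdm directly around the lazy generator, which has no known length. This is only a diagnostic sketch, not a fix. collect_segments and transcribe_with_progress are hypothetical names, and the device, compute_type and language values are assumptions to adjust for your setup.

```python
def collect_segments(segments, total_seconds, report=None):
    """Drain the segment generator and track the audio position reached.

    `segments` is any iterable of objects with `.text` and `.end` (seconds).
    `report`, if given, is called with the number of seconds advanced.
    """
    texts = []
    position = 0.0
    for segment in segments:
        texts.append(segment.text.strip())
        # Clamp so a segment ending past the reported duration cannot overshoot.
        new_position = min(segment.end, total_seconds)
        if new_position > position:
            if report is not None:
                report(new_position - position)
            position = new_position
    return " ".join(texts), position


def transcribe_with_progress(model_dir, audio_path):
    # Imported lazily so collect_segments can be used without these packages.
    from faster_whisper import WhisperModel
    from tqdm import tqdm

    # Assumed settings; if your version needs the mel-filter override
    # from the post, apply it to `model` before calling transcribe().
    model = WhisperModel(model_dir, device="cuda", compute_type="float16")
    segments, info = model.transcribe(audio_path, language="de")
    with tqdm(total=round(info.duration, 2), unit="s") as bar:
        text, reached = collect_segments(segments, info.duration, bar.update)
    return text, reached, info.duration
```

If reached ends up far below the returned duration, the generator finished early; if the bar stalls without the function returning, the decoder is stuck on one segment.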