whisper.cpp
Port of OpenAI's Whisper model in C/C++
I tried Karaoke-style movie generation on my Chinese audio and got this: https://github.com/ggerganov/whisper.cpp/assets/125183026/b713a84a-86d6-4935-aeb0-76fe9142856a The output is full of 口 characters. Could you add Chinese font support to this feature?
Setting up a new MacBook Pro (M2) with Core ML added works great, except with the new tdrz feature. Running `./models/generate-coreml-model.sh small.en-tdrz` fails because `small.en-tdrz` is missing from the conversion script's list of options.
```
Traceback (most...
```
- [x] Basic functionality
- [x] Rewrite `whisper_wrap_segment`
- [x] Rewrite L5717-L5805
- [x] ~Remove `print_realtime`~ This is too tricky
- [x] Remove hallucination by using `token_nosp`
- [x] Heuristic...
Hello! In some experiments, I've noticed that for audio files with silence at the end (even ~1 s of silence), whisper.cpp sometimes transcribes "bullshit" text from nonexistent speech. This _does...
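One possible workaround for the trailing-silence case is to trim quiet samples from the end of the buffer before transcription. The following is only a minimal sketch under stated assumptions: the function name `length_without_trailing_silence` and the amplitude threshold are hypothetical and not part of whisper.cpp, and a plain amplitude check is a stand-in for proper voice activity detection.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a whisper.cpp API): returns how many leading
// samples to keep so that the trailing run of samples whose absolute
// amplitude is below `threshold` is dropped.
std::size_t length_without_trailing_silence(const std::vector<float> & pcm, float threshold) {
    std::size_t n = pcm.size();
    // Walk backwards from the end while the samples are quieter than the threshold.
    while (n > 0 && std::fabs(pcm[n - 1]) < threshold) {
        --n;
    }
    return n;
}
```

The returned length could then be passed as the sample count to the transcription call instead of the full buffer size. The threshold would need tuning per recording, since background noise can exceed a fixed value.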
[This commit](https://github.com/ggerganov/whisper.cpp/commit/2948c740a2bf43190b8e3badb6f1e147f11f96d1) breaks compatibility with older CUDA versions, presumably < 11.1. The culprit is the `cudaHostRegisterReadOnly` flag that [is used](https://github.com/ggerganov/whisper.cpp/blob/fc366b807a17dc05813a6fcc13c8c4dfd442fa6a/ggml-cuda.cu#L2800) in `ggml-cuda.cu`, but was only introduced in CUDA 11.1, [if...
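A version guard is one way such an incompatibility could be avoided. The sketch below illustrates the idea without requiring the CUDA toolkit: the flag values mirror CUDA's `cudaHostRegisterPortable` (0x01) and `cudaHostRegisterReadOnly` (0x08), and the version uses the `CUDART_VERSION` encoding (major * 1000 + minor * 10). The function `host_register_flags` is hypothetical, and combining the portable flag with the read-only flag is an assumption about the call site, not a confirmed description of `ggml-cuda.cu`.

```cpp
// Plain constants standing in for the CUDA runtime definitions so the sketch
// compiles without cuda_runtime.h; real code should use the CUDA names.
constexpr unsigned int kHostRegisterPortable = 0x01; // mirrors cudaHostRegisterPortable
constexpr unsigned int kHostRegisterReadOnly = 0x08; // mirrors cudaHostRegisterReadOnly

// CUDART_VERSION encodes CUDA 11.1 as 11 * 1000 + 1 * 10.
constexpr int kCudart_11_1 = 11010;

// Hypothetical helper: only request the read-only flag when the runtime is
// new enough to define it; otherwise fall back to the portable flag alone.
constexpr unsigned int host_register_flags(int cudart_version) {
    return cudart_version >= kCudart_11_1
        ? (kHostRegisterPortable | kHostRegisterReadOnly)
        : kHostRegisterPortable;
}
```

In a real CUDA source file the same decision would normally be made at compile time with `#if CUDART_VERSION >= 11010`, because the identifier `cudaHostRegisterReadOnly` does not exist at all in older headers.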
```
$ ./main -m models/ggml-large-v3-q5_0.bin -f output.wav -l auto
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-large-v3-q5_0.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head...
```
I've tried:
```
# build using Emscripten
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
mkdir build-em && cd build-em
emcmake cmake ..
make -j
# copy the produced page to your HTTP...
```
I've been using a script in the terminal to transcribe 1-3 minute .wav files. It's been really annoying to use, but perfectly accurate: every transcript is flawless. MacWhisper, using the same "large" model,...
@ggerganov, sorry to bother you. It seems there is no function that returns version information. Please see this commit: https://github.com/zhouwg/kantv/commit/f2cf0a96aa9ba2b7066e44ba32487d17655854df or Mozilla's DeepSpeech:...
Adds links to OpenVINO models. Closes #1893. The Hugging Face repository is now WIP.