
Custom Whisper Model

Faidros1 opened this issue 8 months ago.

I regularly write in a "small" language (Swedish), and until Whisper Large v3 I had no good way to do speech recognition. As revolutionary as Whisper has been, it still leaves a lot to be desired for smaller languages. For Swedish, our National Library, Kungliga Biblioteket, has released a Whisper model trained specifically on Swedish, and it reduces the WER (Word Error Rate) by an average of 47% compared to whisper-large-v3.
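For reference, WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer's output, divided by the number of words in the reference. The Python sketch below computes it and shows how a relative reduction is calculated. Reading the 47% figure as a relative (not absolute) reduction is an assumption here, and the example numbers are illustrative, not KBLab's results. Real benchmarks usually normalize text (case, punctuation) before scoring, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance. No text normalization is applied."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from output)
                curr[j - 1] + 1,     # insertion (extra word in output)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# One substituted word out of five reference words gives a WER of 0.2.
print(word_error_rate("jag talar svenska varje dag", "jag talar svenska var dag"))
```

With a hypothetical baseline WER of 10% and a new WER of 5.3%, relative_reduction(0.10, 0.053) is 0.47, i.e. a 47% relative reduction, even though the absolute drop is only 4.7 percentage points.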

Here is a link to get more information:

https://huggingface.co/KBLab/kb-whisper-large

Any chance this could be made available within SpeechNote?

Faidros1, Apr 09 '25

Hi. Thanks for letting me know about these models. I wish every national library would do something similar.

I don't speak Swedish, but I tested the models on a reference audio sample, and even the "Tiny" model is quite capable. Really impressive.

I added them as "KBLab" models, for both WhisperCpp and FasterWhisper, in commit f649df08139b56287b90d620c8f9503de31766b2.

If you don't want to wait for the next Speech Note release, you can enable the models manually by editing the models.json file. For the Flatpak version, open ~/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote/models.json and add the following entries to the list of models (e.g. after the last existing model):

        {
            "name": "Svenska (FasterWhisper KBLab Tiny)",
            "model_id": "sv_fasterwhisper_kblab_tiny",
            "engine": "stt_fasterwhisper",
            "lang_id": "sv",
            "checksum": "94b5299a",
            "checksum_quick": "51a9f986",
            "size": "80549811",
            "comp": "dir",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-tiny/resolve/fb77c9949fde44d50255f6462f70c6d67621af11/model.bin",
                "https://huggingface.co/KBLab/kb-whisper-tiny/resolve/fb77c9949fde44d50255f6462f70c6d67621af11/config.json",
                "https://huggingface.co/KBLab/kb-whisper-tiny/resolve/fb77c9949fde44d50255f6462f70c6d67621af11/tokenizer.json",
                "https://huggingface.co/KBLab/kb-whisper-tiny/resolve/fb77c9949fde44d50255f6462f70c6d67621af11/vocabulary.json",
                "https://huggingface.co/KBLab/kb-whisper-tiny/resolve/fb77c9949fde44d50255f6462f70c6d67621af11/preprocessor_config.json"
            ]
        },
        {
            "name": "Svenska (FasterWhisper KBLab Base)",
            "model_id": "sv_fasterwhisper_kblab_base",
            "engine": "stt_fasterwhisper",
            "lang_id": "sv",
            "checksum": "3569ac76",
            "checksum_quick": "d3683759",
            "size": "150229133",
            "comp": "dir",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-base/resolve/35e1b469c4241867835daf57254ade4bed1f1d4c/model.bin",
                "https://huggingface.co/KBLab/kb-whisper-base/resolve/35e1b469c4241867835daf57254ade4bed1f1d4c/config.json",
                "https://huggingface.co/KBLab/kb-whisper-base/resolve/35e1b469c4241867835daf57254ade4bed1f1d4c/tokenizer.json",
                "https://huggingface.co/KBLab/kb-whisper-base/resolve/35e1b469c4241867835daf57254ade4bed1f1d4c/vocabulary.json",
                "https://huggingface.co/KBLab/kb-whisper-base/resolve/35e1b469c4241867835daf57254ade4bed1f1d4c/preprocessor_config.json"
            ]
        },
        {
            "name": "Svenska (FasterWhisper KBLab Small)",
            "model_id": "sv_fasterwhisper_kblab_small",
            "engine": "stt_fasterwhisper",
            "lang_id": "sv",
            "checksum": "30e7cb70",
            "checksum_quick": "829002d0",
            "size": "488558571",
            "comp": "dir",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-small/resolve/f516f51f3cb3782e28d41a22ccd1cd7df17ee515/model.bin",
                "https://huggingface.co/KBLab/kb-whisper-small/resolve/f516f51f3cb3782e28d41a22ccd1cd7df17ee515/config.json",
                "https://huggingface.co/KBLab/kb-whisper-small/resolve/f516f51f3cb3782e28d41a22ccd1cd7df17ee515/tokenizer.json",
                "https://huggingface.co/KBLab/kb-whisper-small/resolve/f516f51f3cb3782e28d41a22ccd1cd7df17ee515/vocabulary.json",
                "https://huggingface.co/KBLab/kb-whisper-small/resolve/f516f51f3cb3782e28d41a22ccd1cd7df17ee515/preprocessor_config.json"
            ]
        },
        {
            "name": "Svenska (FasterWhisper KBLab Medium)",
            "model_id": "sv_fasterwhisper_kblab_medium",
            "engine": "stt_fasterwhisper",
            "lang_id": "sv",
            "checksum": "e9583557",
            "checksum_quick": "bd9bd507",
            "size": "1532917950",
            "comp": "dir",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-medium/resolve/1951aa1bb411016e15023815d039da9425f1ec5a/model.bin",
                "https://huggingface.co/KBLab/kb-whisper-medium/resolve/1951aa1bb411016e15023815d039da9425f1ec5a/config.json",
                "https://huggingface.co/KBLab/kb-whisper-medium/resolve/1951aa1bb411016e15023815d039da9425f1ec5a/tokenizer.json",
                "https://huggingface.co/KBLab/kb-whisper-medium/resolve/1951aa1bb411016e15023815d039da9425f1ec5a/vocabulary.json",
                "https://huggingface.co/KBLab/kb-whisper-medium/resolve/1951aa1bb411016e15023815d039da9425f1ec5a/preprocessor_config.json"
            ]
        },
        {
            "name": "Svenska (FasterWhisper KBLab Large)",
            "model_id": "sv_fasterwhisper_kblab_large",
            "engine": "stt_fasterwhisper",
            "lang_id": "sv",
            "checksum": "e3c88aeb",
            "checksum_quick": "faeebbe6",
            "size": "3092296021",
            "comp": "dir",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-large/resolve/33cee585905bd2f817274d5a88e65ce3e8fcedb0/model.bin",
                "https://huggingface.co/KBLab/kb-whisper-large/resolve/33cee585905bd2f817274d5a88e65ce3e8fcedb0/config.json",
                "https://huggingface.co/KBLab/kb-whisper-large/resolve/33cee585905bd2f817274d5a88e65ce3e8fcedb0/tokenizer.json",
                "https://huggingface.co/KBLab/kb-whisper-large/resolve/33cee585905bd2f817274d5a88e65ce3e8fcedb0/vocabulary.json",
                "https://huggingface.co/KBLab/kb-whisper-large/resolve/33cee585905bd2f817274d5a88e65ce3e8fcedb0/preprocessor_config.json"
            ]
        },
        {
            "name": "Svenska (WhisperCpp KBLab Tiny)",
            "model_id": "sv_whisper_kblab_tiny",
            "engine": "stt_whisper",
            "lang_id": "sv",
            "checksum": "4f3e9de4",
            "checksum_quick": "15b18cb2",
            "size": "29883930",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-tiny/resolve/fb77c9949fde44d50255f6462f70c6d67621af11/ggml-model-q5_0.bin"
            ]
        },
        {
            "name": "Svenska (WhisperCpp KBLab Base)",
            "model_id": "sv_whisper_kblab_base",
            "engine": "stt_whisper",
            "lang_id": "sv",
            "checksum": "daca9acb",
            "checksum_quick": "4e7bca22",
            "size": "55303642",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-base/resolve/35e1b469c4241867835daf57254ade4bed1f1d4c/ggml-model-q5_0.bin"
            ]
        },
        {
            "name": "Svenska (WhisperCpp KBLab Small)",
            "model_id": "sv_whisper_kblab_small",
            "engine": "stt_whisper",
            "lang_id": "sv",
            "checksum": "702785de",
            "checksum_quick": "4c18184c",
            "size": "175217872",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-small/resolve/f516f51f3cb3782e28d41a22ccd1cd7df17ee515/ggml-model-q5_0.bin"
            ]
        },
        {
            "name": "Svenska (WhisperCpp KBLab Medium)",
            "model_id": "sv_whisper_kblab_medium",
            "engine": "stt_whisper",
            "lang_id": "sv",
            "checksum": "616d248a",
            "checksum_quick": "6da49dbe",
            "size": "539220676",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-medium/resolve/1951aa1bb411016e15023815d039da9425f1ec5a/ggml-model-q5_0.bin"
            ]
        },
        {
            "name": "Svenska (WhisperCpp KBLab Large)",
            "model_id": "sv_whisper_kblab_large",
            "engine": "stt_whisper",
            "lang_id": "sv",
            "checksum": "34e3bd28",
            "checksum_quick": "e822f64d",
            "size": "1081148395",
            "urls": [
                "https://huggingface.co/KBLab/kb-whisper-large/resolve/33cee585905bd2f817274d5a88e65ce3e8fcedb0/ggml-model-q5_0.bin"
            ]
        }

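Make sure the JSON stays valid: every entry in the list must be separated from its neighbours by a comma, with no comma after the final entry.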
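Because a single missing or extra comma makes the whole file unreadable, it can be safer to let a script do the merge. The Python sketch below is not part of Speech Note. It assumes models.json is either a bare JSON array of model objects or an object with a "models" array (check your file before relying on this), and the script and file names (merge_models.py, kblab.json) are hypothetical. Parsing both files makes Python report the line and column of any JSON syntax error before anything is written; the script then skips model_ids that are already present, keeps a .bak copy of the original, and writes the result back.

```python
import json
import shutil
import sys
from pathlib import Path

# Flatpak path quoted in this thread; adjust it for other installation types.
DEFAULT_MODELS_JSON = (
    Path.home() / ".var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote/models.json"
)


def merge_models(existing, new_entries):
    """Append new_entries to the model list in `existing`, skipping duplicate model_ids.

    ASSUMPTION: models.json is either a bare JSON array of model objects or a
    JSON object whose "models" key holds that array.
    """
    models = existing if isinstance(existing, list) else existing.get("models")
    if not isinstance(models, list):
        raise ValueError('no list of models found (expected a top-level array or a "models" key)')
    known_ids = {m.get("model_id") for m in models if isinstance(m, dict)}
    for entry in new_entries:
        if entry["model_id"] in known_ids:
            continue  # skip ids that are already in the file
        models.append(entry)
        known_ids.add(entry["model_id"])
    return existing


def main(argv):
    # argv[1]: file with the new entries wrapped in [ ... ] (hypothetical, e.g. kblab.json)
    # argv[2]: optional path to models.json
    models_path = Path(argv[2]) if len(argv) > 2 else DEFAULT_MODELS_JSON
    # json.loads raises json.JSONDecodeError (with line and column) on malformed input.
    new_entries = json.loads(Path(argv[1]).read_text(encoding="utf-8"))
    existing = json.loads(models_path.read_text(encoding="utf-8"))
    merged = merge_models(existing, new_entries)
    # Keep a backup of the original file next to it before overwriting.
    shutil.copy2(models_path, models_path.with_name(models_path.name + ".bak"))
    models_path.write_text(
        json.dumps(merged, indent=4, ensure_ascii=False) + "\n", encoding="utf-8"
    )


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: merge_models.py NEW_ENTRIES.json [MODELS.json]", file=sys.stderr)
    else:
        main(sys.argv)
```

To use it, save the entries above as kblab.json, wrapped in [ and ] so that they form a JSON array, run python3 merge_models.py kblab.json, and restart the app. The file is rewritten with 4-space indentation, so whitespace elsewhere in models.json may change.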

After restarting the app, you should be able to download the "KBLab" models.

mkiol, Apr 12 '25

Wonderful! Thank you. I've changed my models.json file and downloaded the large model. Looking forward to testing!

Faidros1, Apr 13 '25

Version 4.8.0 is out, and it includes all the KBLab Whisper models (commit f649df08139b56287b90d620c8f9503de31766b2).

mkiol, Jun 21 '25