
whisper: CrisperWhisper results in grpc: error while marshaling: string field contains invalid UTF-8

markuman opened this issue 9 months ago • 4 comments

LocalAI version: localai/localai:v2.26.0-aio-gpu-nvidia-cuda-12

Environment, CPU architecture, OS, and Version:

uname -a
Linux gpu2 6.13.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 13 Mar 2025 18:12:00 +0000 x86_64 GNU/Linux

Describe the bug

Using https://huggingface.co/nyrahealth/CrisperWhisper with local-ai results in

Whisper-Error: 500 - {"error":{"code":500,"message":"rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8","type":""}}
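The message comes from protobuf's marshaling layer: proto3 `string` fields must contain valid UTF-8, so if the backend puts raw token bytes that are not valid UTF-8 into the response, gRPC refuses to encode it. A minimal sketch of the failure mode (illustrative Python, not LocalAI code):

```python
# proto3 "string" fields must be valid UTF-8; raw token bytes may not be.
raw = b"Hallo \xff Welt"  # \xff can never appear in valid UTF-8

try:
    raw.decode("utf-8")  # strict decode fails, like the gRPC marshaler does
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc)

# A lossy repair replaces the offending bytes with U+FFFD:
print(raw.decode("utf-8", errors="replace"))
```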

To Reproduce

  1. Create directories and install dependencies
mkdir CrisperWhisper
mkdir CrisperWhisper-out
pip install huggingface_hub torch numpy transformers
git clone https://github.com/openai/whisper
git clone https://github.com/ggerganov/whisper.cpp
  2. Download the model
from huggingface_hub import snapshot_download, login

HUGGINGFACE_TOKEN = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

login(token=HUGGINGFACE_TOKEN)

model_id = "nyrahealth/CrisperWhisper"  # Replace with the ID of the model you want to download
snapshot_download(repo_id=model_id, local_dir="CrisperWhisper")
  3. Convert the model to a single-file ggml
python whisper.cpp/models/convert-h5-to-ggml.py CrisperWhisper/ whisper/ CrisperWhisper-out/

Move the ggml file from CrisperWhisper-out/ to your local-ai models path.
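Since the marshaling error points at invalid UTF-8 in the response string, one way to narrow it down is to check whether the converted vocabulary itself contains offending tokens. A debugging sketch, with the file layout assumed from whisper.cpp's convert-h5-to-ggml.py (magic, eleven int32 hyperparameters, the mel filterbank, then a token count followed by length-prefixed token bytes); verify the offsets against your converter version before relying on it:

```python
# Debugging sketch (layout assumed from whisper.cpp's convert-h5-to-ggml.py;
# verify against your converter version): list vocab tokens that are not
# valid UTF-8, since those would trip the proto3 string marshaler.
import struct

def bad_utf8_tokens(path):
    with open(path, "rb") as f:
        f.read(4)                                 # magic
        f.read(44)                                # 11 int32 hyperparameters
        n_mel, n_fft = struct.unpack("<2i", f.read(8))
        f.seek(4 * n_mel * n_fft, 1)              # skip mel filterbank floats
        (n_tokens,) = struct.unpack("<i", f.read(4))
        bad = []
        for i in range(n_tokens):
            (length,) = struct.unpack("<i", f.read(4))
            token = f.read(length)
            try:
                token.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((i, token))
    return bad

# Example (output path assumed from the conversion step above):
# for idx, tok in bad_utf8_tokens("CrisperWhisper-out/ggml-model.bin"):
#     print(idx, tok)
```

If this reports offenders, the invalid bytes likely originate in CrisperWhisper's added tokens and would then surface verbatim in the transcription string.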

Expected behavior

Transcription succeeds without errors.

Logs

7:53AM INF BackendLoader starting backend=whisper modelID=CrisperWhisper.bin o.model=CrisperWhisper.bin
7:54AM INF Success ip=127.0.0.1 latency="28.449µs" method=GET status=200 url=/readyz
7:55AM INF Success ip=127.0.0.1 latency="14.826µs" method=GET status=200 url=/readyz
7:56AM ERR Server error error="rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8" ip=172.17.0.1 latency=2m39.930538614s method=POST status=500 url=/v1/audio/transcriptions

Additional context

markuman avatar Mar 19 '25 08:03 markuman

Using CPU-based local-ai results in the same error

quay.io/go-skynet/local-ai:v2.26.0-aio-cpu

Linux mb 6.8.0-55-generic #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

markuman avatar Mar 19 '25 09:03 markuman

Hi - can you please share logs with --debug ?

also, can you try to set a model name without the "."? I don't think it's a problem per se, but the UTF-8 error is unexpected. Can you also share how you are calling the API?

mudler avatar Mar 19 '25 10:03 mudler

Hi - can you please share logs with --debug ?

11:48AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
11:49AM DBG context local model name not found, setting to the first model first model name=whisper-1
11:49AM DBG guessDefaultsFromFile: not a GGUF file filePath=/build/models/CrisperWhisper.bin
11:49AM DBG Audio file copied to: /tmp/whisper4121787727/test.mp3
11:49AM INF BackendLoader starting backend=whisper modelID=CrisperWhisper.bin o.model=CrisperWhisper.bin
11:49AM DBG Loading model in memory from file: /build/models/CrisperWhisper.bin
11:49AM DBG Loading Model CrisperWhisper.bin with gRPC (file: /build/models/CrisperWhisper.bin) (backend: whisper): {backendString:whisper model:CrisperWhisper.bin modelID:CrisperWhisper.bin assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0005de008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
11:49AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
11:49AM DBG GRPC Service for CrisperWhisper.bin will be running at: '127.0.0.1:44969'
11:49AM DBG GRPC Service state dir: /tmp/go-processmanager771897440
11:49AM DBG GRPC Service Started
11:49AM DBG Wait for the service to start up
11:49AM DBG Options: ContextSize:512  Seed:1365369429  NBatch:512  MMap:true  NGPULayers:99999999  Threads:8
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr 2025/03/19 11:49:18 gRPC Server listening at 127.0.0.1:44969
11:49AM DBG GRPC Service Ready
11:49AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc000336e58} sizeCache:0 unknownFields:[] Model:CrisperWhisper.bin ContextSize:512 Seed:1365369429 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/CrisperWhisper.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_from_file_with_params_no_state: loading model from '/build/models/CrisperWhisper.bin'
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_with_params_no_state: use gpu    = 1
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_with_params_no_state: flash attn = 0
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_with_params_no_state: gpu_device = 0
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_with_params_no_state: dtw        = 0
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_with_params_no_state: backends   = 1
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: loading model
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_vocab       = 51866
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_audio_ctx   = 1500
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_audio_state = 1280
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_audio_head  = 20
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_audio_layer = 32
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_text_ctx    = 448
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_text_state  = 1280
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_text_head   = 20
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_text_layer  = 32
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_mels        = 128
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: ftype         = 1
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: qntvr         = 0
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: type          = 5 (large v3)
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: adding 6800 extra tokens
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: n_langs       = 100
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load:      CPU total size =  3094.36 MB
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_model_load: model size    = 3094.36 MB
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_state: kv self size  =   83.89 MB
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_state: kv cross size =  251.66 MB
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_state: kv pad  size  =    7.86 MB
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_state: compute buffer (conv)   =   36.13 MB
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_state: compute buffer (encode) =  212.29 MB
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_state: compute buffer (cross)  =    9.25 MB
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_init_state: compute buffer (decode) =   99.10 MB
11:49AM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr whisper_full_with_state: auto-detected language: de (p = 0.999483)
12:23PM DBG GRPC(CrisperWhisper.bin-127.0.0.1:44969): stderr 2025/03/19 12:23:14 ERROR: [core] [Server #1]grpc: server failed to encode response: rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8
12:23PM ERR Server error error="rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8" ip=10.0.2.100 latency=33m56.389020712s method=POST status=500 url=/v1/audio/transcriptions

also, can you try to set a model name without the "."?

hmm I don't understand this.

curl -s http://localhost:8080/models|jq|grep -i -A 2 -B 2 cris 
    },
    {
      "id": "CrisperWhisper.bin",
      "object": "model"
    },

That's also just the name in the filesystem

ls -lh localai/models/ |grep -i cr
-rw-rw---- 1 markuman markuman 2.9G Mar 17 07:58 CrisperWhisper.bin

Can you also share how you are calling the API?

import requests


baseurl = "http://127.0.0.1:8080"
transcription = '/v1/audio/transcriptions'

testfile = 'test.mp3'

# transcription with whisper
############################
with open(testfile, "rb") as audio_file:
    files = {"file": ("test.mp3", audio_file)}
    data = {"model": "CrisperWhisper.bin"}
    response = requests.post(baseurl + transcription, files=files, data=data)

print(response)

if response.status_code == 200:
    raw = response.json().get('text')
    print(raw)
else:
    print(f"Whisper-Error: {response.status_code} - {response.text}")

markuman avatar Mar 19 '25 12:03 markuman

I'm experiencing the very same issue. It happens only with the CrisperWhisper model; all other Whisper models I have tried so far work fine. Any further details I can provide to debug this? I'd love to get CrisperWhisper running...

local-ai Version: v2.28.0

For reproducing the issue faster (without model conversion), there is a ggml model here: https://huggingface.co/nyrahealth/CrisperWhisper/commit/0c039779bd37fc1fdd2bbaccaa02dbda7aac37d5#d2h-238772

mjess avatar Apr 25 '25 18:04 mjess

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Aug 14 '25 02:08 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Aug 20 '25 02:08 github-actions[bot]