failed to load model with internal loader: grpc service not ready
LocalAI version: v2.29.0
Environment, CPU architecture, OS, and Version: Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-142-generic x86_64)
Describe the bug
Occurs when I prompt the model to generate an image; text generation is also not working.
I have the same issue. A reboot helped, but it keeps popping up occasionally.
When it occurs, the following output loops until I stop the process:
11:48AM DBG Loading from the following backends (in order): [llama-cpp llama-cpp-fallback silero-vad whisper huggingface]
11:48AM INF Trying to load the model 'gemma-3-4b-it-qat' with the backend '[llama-cpp llama-cpp-fallback silero-vad whisper huggingface]'
11:48AM INF [llama-cpp] Attempting to load
11:48AM INF BackendLoader starting backend=llama-cpp modelID=gemma-3-4b-it-qat o.model=google_gemma-3-4b-it-qat-Q4_0.gguf
11:48AM DBG Loading model in memory from file: /users/xyz/localai/models/google_gemma-3-4b-it-qat-Q4_0.gguf
11:48AM DBG Loading Model gemma-3-4b-it-qat with gRPC (file: /users/xyz/localai/models/google_gemma-3-4b-it-qat-Q4_0.gguf) (backend: llama-cpp): {backendString:llama-cpp model:google_gemma-3-4b-it-qat-Q4_0.gguf modelID:gemma-3-4b-it-qat assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0x1400a108008 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
11:48AM DBG [llama-cpp-fallback] llama-cpp variant available
11:48AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
11:48AM DBG GRPC Service for gemma-3-4b-it-qat will be running at: '127.0.0.1:50660'
11:48AM DBG GRPC Service state dir: /tmp/go-processmanager1939193735
11:48AM DBG GRPC Service Started
11:48AM DBG Wait for the service to start up
11:48AM DBG Options: ContextSize:8192 Seed:1311559892 NBatch:512 F16Memory:true MMap:true NGPULayers:256 Threads:10 MMProj:"mmproj-google_gemma-3-4b-it-qat-f16.gguf"
11:49AM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:50660: connect: connection refused\""
11:49AM DBG GRPC Service NOT ready
11:49AM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: grpc service not ready
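The log shows LocalAI spawning the backend and then retrying a dial against its gRPC address (note `grpcAttempts:20 grpcAttemptsDelay:2` in the debug output) until it gives up with "connection refused". As a rough way to reproduce that wait loop outside LocalAI, here is a minimal sketch (a hypothetical helper, not part of LocalAI) that retries a TCP connection to the backend's port:

```python
# Hypothetical sketch mimicking LocalAI's startup wait: retry a TCP dial
# against the backend's gRPC address (attempts/delay mirror the
# grpcAttempts:20 / grpcAttemptsDelay:2 values from the log above).
import socket
import time

def wait_for_port(host: str, port: int, attempts: int = 20, delay: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after all attempts fail."""
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            # Same failure mode LocalAI logs as "connect: connection refused"
            time.sleep(delay)
    return False
```

Running something like `wait_for_port("127.0.0.1", 50660)` (the port from the log) while the loop is happening would confirm whether the backend process ever starts listening, or whether it dies immediately after being spawned.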
I have this issue, but only when loading some Mistral-based models, such as mistralai_magistral-small-2509 (mistralai_Magistral-Small-2509-Q4_K_M.gguf) or mistral-2x24b-moe-power-coder-magistral-devstral-reasoning-ultimate-neo-max-44b (Mistral-2x24B-MOE-Pwr-Magis-Devstl-Reason-Ult-44B-NEO-D_AU-Q4_K_M.gguf).
Running LocalAI in a Docker container under TrueNAS (version 1.0.2, app version 3.7.0) on an AMD CPU, with a 64 GB RAM limit on the container.
It seems to be a tool-calling issue, because that message appears to be generated by the model itself.
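Since "connection refused" means the spawned backend never started listening, one common cause is the backend process being killed while loading a large quantized model (e.g. OOM-killed inside a memory-limited container like the 64 GB one above). A rough sanity check, sketched as a hypothetical helper (the 1.2x headroom factor is an assumption, not a LocalAI value), is to compare the GGUF file size against available physical memory:

```python
# Hypothetical sanity check: a memory-mapped GGUF generally needs at least
# its file size in RAM (plus KV-cache and other overhead), so compare the
# model file against currently available physical memory (Linux only).
import os

def fits_in_memory(gguf_path: str, headroom: float = 1.2) -> bool:
    """Rough check: model size * assumed headroom vs. available physical memory."""
    model_bytes = os.path.getsize(gguf_path)
    avail_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_AVPHYS_PAGES")
    return model_bytes * headroom <= avail_bytes
```

For the ~44B MoE model mentioned above, a Q4_K_M file alone is in the tens of gigabytes, so a check like this against the container's limit could quickly rule memory in or out as the culprit.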