
LOCALAI_SINGLE_ACTIVE_BACKEND=true prevents any backends from running at all.

Open • ResourceHog opened this issue 2 months ago • 2 comments

LocalAI version:

localai/localai:v3.7.0-gpu-nvidia-cuda-12

Environment, CPU architecture, OS, and Version:

Linux command-center 6.14.0-33-generic #33~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 19 17:02:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

Loading a model into memory using the built-in chat interface appears to hang. No model ever gets loaded into memory, regardless of the backend used, when I set LOCALAI_SINGLE_ACTIVE_BACKEND=true.

To Reproduce

Start the docker container with the environment variable set to true.

$SUDO docker run -v local-ai-data:/models \
            --gpus all \
            --restart=always \
            -e API_KEY=$API_KEY \
            -e THREADS=$THREADS \
            -e LOCALAI_LOG_LEVEL=debug \
            -e LOCALAI_SINGLE_ACTIVE_BACKEND=true \
            $envs \
            -d -p $PORT:8080 --name local-ai localai/localai:$IMAGE_TAG $STARTCOMMAND

Expected behavior

Models should load into memory. The active backend should be shut down when a model requiring a different backend is requested and the active backend is idle.

Logs

7:14PM DBG guessDefaultsFromFile: template already set name=shuttleai_shuttle-3.5 7:14PM DBG Chat endpoint configuration read: &{modelConfigFile:/models/shuttleai_shuttle-3.5.yaml PredictionOptions:{BasicModelRequest:{Model:shuttleai_shuttle-3.5-Q4_K_M.gguf} Language: Translate:false N:0 TopP:0xc001394ed0 TopK:0xc001394ed8 Temperature:0xc001394ef0 Maxtokens:0xc001394f60 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc001394f48 TypicalP:0xc001394f40 Seed:0xc001394f90 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:shuttleai_shuttle-3.5 F16:0xc001394e70 Threads:0xc001394eb0 Debug:0xc001e710c0 Roles:map[] Embeddings:0xc001394f69 Backend:llama-cpp TemplateConfig:{Chat:{{.Input -}} <|im_start|>assistant ChatMessage:<|im_start|>{{ .RoleName }} {{ if .FunctionCall -}} {{ else if eq .RoleName "tool" -}} {{ end -}} {{ if .Content -}} {{.Content }} {{ end -}} {{ if .FunctionCall -}} {{toJson .FunctionCall}} {{ end -}}<|im_end|> Completion:{{.Input}} Edit: Functions:<|im_start|>system You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: {{range .Functions}} {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }} {{end}} For each function call return a json object with function name and arguments <|im_end|> {{.Input -}} <|im_start|>assistant UseTokenizerTemplate:false JoinChatMessagesByCharacter: Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_ANY FLAG_CHAT FLAG_COMPLETION] KnownUsecases: Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc001394f18 MirostatTAU:0xc001394f10 Mirostat:0xc001394ef8 NGPULayers:0xc001ddf590 MMap:0xc001394e71 MMlock:0xc001394f69 LowVRAM:0xc001394f69 Reranking:0xc001394f69 Grammar: StopWords:[<|im_end|> <|endoftext|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc001394e50 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention: NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: 
ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[gpu] Overrides:[] MCP:{Servers: Stdio:} Agent:{MaxAttempts:0 MaxIterations:0 EnableReasoning:false EnablePlanning:false EnableMCPPrompts:false EnablePlanReEvaluator:false}} 7:14PM DBG Parameters: &{modelConfigFile:/models/shuttleai_shuttle-3.5.yaml PredictionOptions:{BasicModelRequest:{Model:shuttleai_shuttle-3.5-Q4_K_M.gguf} Language: Translate:false N:0 TopP:0xc001394ed0 TopK:0xc001394ed8 Temperature:0xc001394ef0 Maxtokens:0xc001394f60 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc001394f48 TypicalP:0xc001394f40 Seed:0xc001394f90 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:shuttleai_shuttle-3.5 F16:0xc001394e70 Threads:0xc001394eb0 Debug:0xc001e710c0 Roles:map[] Embeddings:0xc001394f69 Backend:llama-cpp TemplateConfig:{Chat:{{.Input -}} <|im_start|>assistant ChatMessage:<|im_start|>{{ .RoleName }} {{ if .FunctionCall -}} {{ else if eq .RoleName "tool" -}} {{ end -}} {{ if .Content -}} {{.Content }} {{ end -}} {{ if .FunctionCall -}} {{toJson .FunctionCall}} {{ end -}}<|im_end|> Completion:{{.Input}} Edit: Functions:<|im_start|>system You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: {{range .Functions}} {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }} {{end}} For each function call return a json object with function name and arguments <|im_end|> {{.Input -}} <|im_start|>assistant UseTokenizerTemplate:false JoinChatMessagesByCharacter: Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_ANY FLAG_CHAT FLAG_COMPLETION] KnownUsecases: Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc001394f18 MirostatTAU:0xc001394f10 Mirostat:0xc001394ef8 NGPULayers:0xc001ddf590 MMap:0xc001394e71 MMlock:0xc001394f69 LowVRAM:0xc001394f69 Reranking:0xc001394f69 Grammar: StopWords:[<|im_end|> <|endoftext|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc001394e50 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention: NoKVOffloading:false 
CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[gpu] Overrides:[] MCP:{Servers: Stdio:} Agent:{MaxAttempts:0 MaxIterations:0 EnableReasoning:false EnablePlanning:false EnableMCPPrompts:false EnablePlanReEvaluator:false}} 7:14PM DBG templated message for chat: <|im_start|>system In the groves of the mountain homes lives a sageful old goat with a knowing spark in their eye. <|im_end|>

7:14PM DBG templated message for chat: <|im_start|>user test <|im_end|>

7:14PM DBG Prompt (before templating): <|im_start|>system In the groves of the mountain homes lives a sageful old goat with a knowing spark in their eye. <|im_end|>

<|im_start|>user test <|im_end|>

7:14PM DBG Template found, input modified to: <|im_start|>system In the groves of the mountain homes lives a sageful old goat with a knowing spark in their eye. <|im_end|>

<|im_start|>user test <|im_end|> <|im_start|>assistant

7:14PM DBG Prompt (after templating): <|im_start|>system In the groves of the mountain homes lives a sageful old goat with a knowing spark in their eye. <|im_end|>

<|im_start|>user test <|im_end|> <|im_start|>assistant

7:14PM DBG Stream request received
7:14PM INF Success ip=172.17.0.1 latency=746.245104ms method=POST status=200 url=/v1/chat/completions
7:14PM DBG Sending chunk: {"created":1762024453,"object":"chat.completion.chunk","id":"ad035c96-cbf4-42f3-be16-e8c142bbb276","model":"shuttleai_shuttle-3.5","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

Additional context

(screenshot attached)

ResourceHog • Nov 01 '25 19:11

Update: Using "SINGLE_ACTIVE_BACKEND" instead of "LOCALAI_SINGLE_ACTIVE_BACKEND" appears to resolve the issue. Suggesting the documentation here is outdated or flawed.

ResourceHog • Nov 01 '25 20:11

Update: Using "SINGLE_ACTIVE_BACKEND" instead of "LOCALAI_SINGLE_ACTIVE_BACKEND" appears to resolve the issue. Suggesting the documentation here is outdated or flawed.

SINGLE_ACTIVE_BACKEND and LOCALAI_SINGLE_ACTIVE_BACKEND are equivalent: they toggle the same setting internally (both names are kept for backward compatibility).
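
If the two names are interchangeable as stated above, either line in the run command should enable the same behavior (a sketch based on that comment, not independently verified here):

            -e LOCALAI_SINGLE_ACTIVE_BACKEND=true \
            # or, equivalently, the shorter legacy name:
            -e SINGLE_ACTIVE_BACKEND=true \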

If restarting solved your issue, the problem you had was probably a different one.

mudler • Nov 03 '25 13:11