
template loading failed for gpt-oss-20b due to unknown variable 'ReasoningEffort'

Open billy-sung opened this issue 3 months ago • 6 comments

LocalAI version: localai/localai:latest-gpu-nvidia-cuda-12@sha256:d8d84f023cde90564631e843b9b829c4ced50a4afeb21d0816d3abc18a5d194e

Environment, CPU architecture, OS, and Version:
Rocky Linux 9.5
uname -a: Linux dl002 5.14.0-503.31.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Mar 11 16:53:43 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
CPU: i9-10900X
GPU: 3080 Ti x4

Describe the bug
When attempting to run the gpt-oss-20b model, LocalAI fails to load the model due to a template loading error. The log indicates that the ReasoningEffort field, which is part of the chat template for this model, cannot be evaluated. This prevents the backend from starting correctly.

To Reproduce

  1. Start LocalAI using the Docker image: localai/localai:latest-gpu-nvidia-cuda-12.
  2. Install the gpt-oss-20b model.
  3. Send a chat completion request to the LocalAI API with any text input (see the example request after this list).
  4. The server will fail to process the request, and the error will be visible in the logs.
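
For step 3, a minimal sketch of such a request in Go. It assumes LocalAI is reachable at localhost:8080 (adjust host/port to your deployment); the model name and the /v1/chat/completions endpoint match the logs below, and any OpenAI-compatible client or a plain curl call works the same way:

package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    // Minimal OpenAI-compatible chat completion request body.
    body := []byte(`{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "introduce"}]}`)

    // Adjust host/port to wherever the LocalAI container is published.
    resp, err := http.Post("http://localhost:8080/v1/chat/completions",
        "application/json", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    out, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status)
    fmt.Println(string(out))
}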

Expected behavior
LocalAI should successfully load the template and the gpt-oss-20b model, or provide a clearer error message about how to configure the ReasoningEffort variable if it is required.

Logs

3:12AM INF BackendLoader starting backend=llama-cpp modelID=gpt-oss-20b o.model=gpt-oss-20b-mxfp4.gguf
3:12AM DBG Loading model in memory from file: /models/gpt-oss-20b-mxfp4.gguf
3:12AM DBG Loading Model gpt-oss-20b with gRPC (file: /models/gpt-oss-20b-mxfp4.gguf) (backend: llama-cpp): {backendString:llama-cpp model:gpt-oss-20b-mxfp4.gguf modelID:gpt-oss-20b context:{emptyCtx:{}} gRPCOptions:0xc0004002c8 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
3:13AM INF Success ip=127.0.0.1 latency="54.714µs" method=GET status=200 url=/readyz
3:13AM DBG context local model name not found, setting to the first model first model name=gpt-oss-20b
3:13AM DBG Chat endpoint configuration read: &{PredictionOptions:{BasicModelRequest:{Model:gpt-oss-20b-mxfp4.gguf} Language: Translate:false N:0 TopP:0xc00124a980 TopK:0xc00124a988 Temperature:0xc002911620 Maxtokens:0xc00124a9c0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc00124a9b8 TypicalP:0xc00124a9b0 Seed:0xc00124a9d0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:gpt-oss-20b F16:0xc00124a800 Threads:0xc00124a970 Debug:0xc002911990 Roles:map[] Embeddings:0xc00124a9c9 Backend:llama-cpp TemplateConfig:{Chat:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}
Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}
# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant ChatMessage:<|start|>{{ if .FunctionCall -}}functions.{{ .FunctionCall.Name }} to=assistant{{ else if eq .RoleName "assistant"}}assistant<|channel|>final<|message|>{{else}}{{ .RoleName }}{{end}}<|message|>
{{- if .Content -}}
{{- .Content -}}
{{- end -}}
{{- if .FunctionCall -}}
{{- toJson .FunctionCall -}}
{{- end -}}<|end|> Completion:{{.Input}}
 Edit: Functions:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}
Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}
# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant
# Tools

## functions
namespace functions {
{{-range .Functions}}
{{if .Description }}
// {{ .Description }}
{{- end }}
{{- if and .Parameters.Properties (gt (len .Parameters.Properties) 0) }}
type {{ .Name }} = (_: {
{{- range $name, $prop := .Parameters.Properties }}
{{- if $prop.Description }}
  // {{ $prop.Description }}
{{- end }}
  {{ $name }}: {{ if gt (len $prop.Type) 1 }}{{ range $i, $t := $prop.Type }}{{ if $i }} | {{ end }}{{ $t }}{{ end }}{{ else }}{{ index $prop.Type 0 }}{{ end }},
{{- end }}
}) => any;
{{- else }}
type {{ .Function.Name }} = () => any;
{{- end }}
{{- end }}{{/* end of range .Functions */}}
} // namespace functions
# Instructions
<|end|>{{.Input -}}<|start|>assistant UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_ANY FLAG_CHAT FLAG_COMPLETION] KnownUsecases:<nil> Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc00124a9a8 MirostatTAU:0xc00124a9a0 Mirostat:0xc00124a998 NGPULayers:<nil> MMap:0xc00124a801 MMlock:0xc00124a9c9 LowVRAM:0xc00124a9c9 Reranking:0xc00124a9c9 Grammar: StopWords:[<|im_end|> <dummy32000> </s> <|endoftext|> <|return|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc00124a7f0 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[] Overrides:[]}

3:13AM DBG Parameters: &{PredictionOptions:{BasicModelRequest:{Model:gpt-oss-20b-mxfp4.gguf} Language: Translate:false N:0 TopP:0xc00124a980 TopK:0xc00124a988 Temperature:0xc002911620 Maxtokens:0xc00124a9c0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc00124a9b8 TypicalP:0xc00124a9b0 Seed:0xc00124a9d0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:gpt-oss-20b F16:0xc00124a800 Threads:0xc00124a970 Debug:0xc002911990 Roles:map[] Embeddings:0xc00124a9c9 Backend:llama-cpp TemplateConfig:{Chat:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}
Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}
# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant ChatMessage:<|start|>{{ if .FunctionCall -}}functions.{{ .FunctionCall.Name }} to=assistant{{ else if eq .RoleName "assistant"}}assistant<|channel|>final<|message|>{{else}}{{ .RoleName }}{{end}}<|message|>
{{- if .Content -}}
{{- .Content -}}
{{- end -}}
{{- if .FunctionCall -}}
{{- toJson .FunctionCall -}}
{{- end -}}<|end|> Completion:{{.Input}}
 Edit: Functions:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}
Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}
# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant
# Tools

## functions
namespace functions {
{{-range .Functions}}
{{if .Description }}
// {{ .Description }}
{{- end }}
{{- if and .Parameters.Properties (gt (len .Parameters.Properties) 0) }}
type {{ .Name }} = (_: {
{{- range $name, $prop := .Parameters.Properties }}
{{- if $prop.Description }}
  // {{ $prop.Description }}
{{- end }}
  {{ $name }}: {{ if gt (len $prop.Type) 1 }}{{ range $i, $t := $prop.Type }}{{ if $i }} | {{ end }}{{ $t }}{{ end }}{{ else }}{{ index $prop.Type 0 }}{{ end }},
{{- end }}
}) => any;
{{- else }}
type {{ .Function.Name }} = () => any;
{{- end }}
{{- end }}{{/* end of range .Functions */}}
} // namespace functions
# Instructions
<|end|>{{.Input -}}<|start|>assistant UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_ANY FLAG_CHAT FLAG_COMPLETION] KnownUsecases:<nil> Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc00124a9a8 MirostatTAU:0xc00124a9a0 Mirostat:0xc00124a998 NGPULayers:<nil> MMap:0xc00124a801 MMlock:0xc00124a9c9 LowVRAM:0xc00124a9c9 Reranking:0xc00124a9c9 Grammar: StopWords:[<|im_end|> <dummy32000> </s> <|endoftext|> <|return|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc00124a7f0 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[] Overrides:[]}

3:13AM DBG templated message for chat: <|start|>system<|message|>Given the following conversation, relevant context, and a follow up question, reply with an answer to the current question the user is asking. Return only your response to the question given the above information following the users instructions as needed.<|end|>
3:13AM DBG templated message for chat: <|start|>user<|message|>introduce<|end|>
3:13AM DBG Prompt (before templating): <|start|>system<|message|>Given the following conversation, relevant context, and a follow up question, reply with an answer to the current question the user is asking. Return only your response to the question given the above information following the users instructions as needed.<|end|>
<|start|>user<|message|>introduce<|end|>
3:13AM DBG Template failed loading: template: prompt:5:19: executing "prompt" at <.ReasoningEffort>: can't evaluate field ReasoningEffort in type templates.PromptTemplateData
3:13AM DBG Prompt (after templating): <|start|>system<|message|>Given the following conversation, relevant context, and a follow up question, reply with an answer to the current question the user is asking. Return only your response to the question given the above information following the users instructions as needed.<|end|>
<|start|>user<|message|>introduce<|end|>
3:13AM DBG Stream request received
3:13AM INF Success ip=172.20.0.2 latency=958.328683ms method=POST status=200 url=/v1/chat/completions
3:13AM INF BackendLoader starting backend=llama-cpp modelID=gpt-oss-20b o.model=gpt-oss-20b-mxfp4.gguf
3:13AM DBG Loading model in memory from file: /models/gpt-oss-20b-mxfp4.gguf
3:13AM DBG Loading Model gpt-oss-20b with gRPC (file: /models/gpt-oss-20b-mxfp4.gguf) (backend: llama-cpp): {backendString:llama-cpp model:gpt-oss-20b-mxfp4.gguf modelID:gpt-oss-20b context:{emptyCtx:{}} gRPCOptions:0xc0000d4b08 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
3:13AM DBG Sending chunk: {"created":1755141207,"object":"chat.completion.chunk","id":"d6472935-ca20-4d05-b23f-227ab1ff9522","model":"gpt-oss-20b","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

Additional context
The issue appears to be related to a variable within the chat template that is not being passed to the templating engine. The template for gpt-oss-20b contains the line: Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}. The error occurs because the .ReasoningEffort field is missing from the template data.
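
The error message matches how Go's text/template behaves: referencing a field that does not exist on the data struct is a hard error, even inside an eq comparison, so the render aborts and LocalAI falls back to the untemplated prompt (visible in the "Prompt (after templating)" log line above). A minimal, self-contained sketch of that behavior; the struct names here are hypothetical stand-ins, not LocalAI's actual types:

package main

import (
    "fmt"
    "os"
    "text/template"
)

// Hypothetical stand-in for a prompt-data type without the field,
// mirroring the older templates.PromptTemplateData named in the error.
type oldPromptData struct {
    Input string
}

// Hypothetical stand-in for a prompt-data type that carries the field.
type newPromptData struct {
    Input           string
    ReasoningEffort string
}

func main() {
    tmpl := template.Must(template.New("prompt").Parse(
        `Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}`))

    // Fails with: can't evaluate field ReasoningEffort in type main.oldPromptData
    if err := tmpl.Execute(os.Stdout, oldPromptData{Input: "hi"}); err != nil {
        fmt.Println("\nold data type:", err)
    }

    // Renders "Reasoning: medium": the field exists but is empty, so the
    // template's own fallback applies.
    if err := tmpl.Execute(os.Stdout, newPromptData{Input: "hi"}); err != nil {
        fmt.Println("\nnew data type:", err)
    }
    fmt.Println()
}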

billy-sung avatar Aug 14 '25 03:08 billy-sung

@billy-sung here gpt-oss seems to work fine. What version of LocalAI are you running? It's visible in the footer of the webUI.

mudler avatar Aug 14 '25 07:08 mudler

@billy-sung here gpt-oss seems to work fine. What version of LocalAI are you running? It's visible in the footer of the webUI.

LocalAI Version v3.3.2 (d6274eaf4ab0bf10fb130ec5e762c73ae6ea3feb)

billy-sung avatar Aug 14 '25 07:08 billy-sung

@billy-sung is that a new install? Did you try pulling the image again? You need the latest LocalAI due to the new template format (we are at 3.4.0).

mudler avatar Aug 14 '25 07:08 mudler

I've recreated the container (3.4.0). It appears that llama.cpp is not being launched at all.

Here is the docker-compose configuration I used:

services:
  localai:
    image: localai/localai:latest-gpu-nvidia-cuda-12
    container_name: local-ai
    runtime: nvidia
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 18080:8080
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - DEBUG=true
    volumes:
      - /srv/localai:/models:cached

Even with this setup, the models are not working and there are no signs of llama.cpp in the logs.

logs:

CPU info:
model name	: Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req vnmi avx512_vnni md_clear flush_l1d arch_capabilities
CPU:    AVX    found OK
CPU:    AVX2   found OK
CPU:    AVX512 found OK
8:00AM DBG Setting logging to debug
8:00AM INF Starting LocalAI using 10 threads, with models path: //models
8:00AM INF LocalAI version: v3.4.0 (b2e8b6d1aa652b6a95828fe91271e5b686fffa7f)
8:00AM DBG CPU capabilities: [3dnowprefetch abm acpi adx aes aperfmperf apic arat arch_capabilities arch_perfmon art avx avx2 avx512_vnni avx512bw avx512cd avx512dq avx512f avx512vl bmi1 bmi2 bts cat_l3 cdp_l3 clflush clflushopt clwb cmov constant_tsc cpuid cpuid_fault cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc cx16 cx8 dca de ds_cpl dtes64 dtherm dts epb ept ept_ad erms est f16c flexpriority flush_l1d fma fpu fsgsbase fxsr ht hwp hwp_act_window hwp_epp hwp_pkg_req ibpb ibrs ibrs_enhanced intel_pt invpcid lahf_lm lm mba mca mce md_clear mmx monitor movbe mpx msr mtrr nonstop_tsc nopl nx pae pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge pln pni popcnt pse pse36 pts rdrand rdseed rdt_a rdtscp rep_good sdbg sep smap smep ss ssbd sse sse2 sse4_1 sse4_2 ssse3 stibp syscall tm tm2 tpr_shadow tsc tsc_adjust tsc_deadline_timer vme vmx vnmi vpid x2apic xgetbv1 xsave xsavec xsaveopt xsaves xtopology xtpr]
8:00AM DBG GPU count: 4
8:00AM DBG GPU: card #0  [affined to NUMA node 0]@0000:19:00.0 -> driver: 'nvidia' class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'GA102 [GeForce RTX 3080 Ti]'
8:00AM DBG GPU: card #1  [affined to NUMA node 0]@0000:1a:00.0 -> driver: 'nvidia' class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'GA102 [GeForce RTX 3080 Ti]'
8:00AM DBG GPU: card #2  [affined to NUMA node 0]@0000:67:00.0 -> driver: 'nvidia' class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'GA102 [GeForce RTX 3080 Ti]'
8:00AM DBG GPU: card #3  [affined to NUMA node 0]@0000:68:00.0 -> driver: 'nvidia' class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'GA102 [GeForce RTX 3080 Ti]'
8:00AM WRN guessDefaultsFromFile: full offload is recommended
8:00AM DBG guessDefaultsFromFile: 63 layers estimated
8:00AM DBG guessDefaultsFromFile: NGPULayers set NGPULayers=63
8:00AM DBG guessDefaultsFromFile: template already set name=gemma-3-27b-it
8:00AM WRN guessDefaultsFromFile: full offload is recommended
8:00AM DBG guessDefaultsFromFile: 37 layers estimated
8:00AM DBG guessDefaultsFromFile: NGPULayers set NGPULayers=37
8:00AM DBG guessDefaultsFromFile: template already set name=qwen3-embedding-8b
8:00AM INF Preloading models from //models
  Model name: gemma-3-27b-it                                                  
  Model name: gpt-oss-20b                                                     
  Model name: qwen3-embedding-8b                                              
8:00AM DBG Model: gemma-3-27b-it (config: {PredictionOptions:{BasicModelRequest:{Model:gemma-3-27b-it-Q4_K_M.gguf} Language: Translate:false N:0 TopP:0xc000cac008 TopK:0xc000cac010 Temperature:0xc000cac018 Maxtokens:0xc000cac048 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000cac040 TypicalP:0xc000cac038 Seed:0xc000cac058 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:gemma-3-27b-it F16:0xc000cac000 Threads:0xc000c15fe8 Debug:0xc000cac050 Roles:map[] Embeddings:0xc000cac051 Backend:llama-cpp TemplateConfig:{Chat:{{.Input }}
<start_of_turn>model
 ChatMessage:<start_of_turn>{{if eq .RoleName "assistant" }}model{{else}}{{ .RoleName }}{{end}}
{{ if .FunctionCall -}}
{{ else if eq .RoleName "tool" -}}
{{ end -}}
{{ if .Content -}}
{{.Content -}}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<end_of_turn> Completion:{{.Input}}
 Edit: Functions:<start_of_turn>system
You have access to functions. If you decide to invoke any of the function(s),
you MUST put it in the format of
{"name": function name, "parameters": dictionary of argument name and its value}
You SHOULD NOT include any other text in the response if you call a function
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
<end_of_turn>
      
{{.Input -}}
<start_of_turn>model
 UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_COMPLETION FLAG_ANY FLAG_CHAT] KnownUsecases:<nil> Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000cac030 MirostatTAU:0xc000cac028 Mirostat:0xc000cac020 NGPULayers:0xc0009fe6d8 MMap:0xc000c15fa8 MMlock:0xc000cac051 LowVRAM:0xc000cac051 Reranking:0xc000cac051 Grammar: StopWords:[<|im_end|> <end_of_turn> <start_of_turn>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000c15c28 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[gpu] Overrides:[]})
8:00AM DBG Model: gpt-oss-20b (config: {PredictionOptions:{BasicModelRequest:{Model:gpt-oss-20b-mxfp4.gguf} Language: Translate:false N:0 TopP:0xc001a3a5a0 TopK:0xc001a3a5a8 Temperature:0xc001a3a5b0 Maxtokens:0xc001a3a5e0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc001a3a5d8 TypicalP:0xc001a3a5d0 Seed:0xc001a3a5f0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:gpt-oss-20b F16:0xc001a3a578 Threads:0xc001a3a590 Debug:0xc001a3a5e8 Roles:map[] Embeddings:0xc001a3a5e9 Backend:llama-cpp TemplateConfig:{Chat:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}
Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}
# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant ChatMessage:<|start|>{{ if .FunctionCall -}}functions.{{ .FunctionCall.Name }} to=assistant{{ else if eq .RoleName "assistant"}}assistant<|channel|>final<|message|>{{else}}{{ .RoleName }}{{end}}<|message|>
{{- if .Content -}}
{{- .Content -}}
{{- end -}}
{{- if .FunctionCall -}}
{{- toJson .FunctionCall -}}
{{- end -}}<|end|> Completion:{{.Input}}
 Edit: Functions:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}
Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}
# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant
# Tools
      
## functions
namespace functions {
{{-range .Functions}}
{{if .Description }}
// {{ .Description }}
{{- end }}
{{- if and .Parameters.Properties (gt (len .Parameters.Properties) 0) }}
type {{ .Name }} = (_: {
{{- range $name, $prop := .Parameters.Properties }}
{{- if $prop.Description }}
  // {{ $prop.Description }}
{{- end }}
  {{ $name }}: {{ if gt (len $prop.Type) 1 }}{{ range $i, $t := $prop.Type }}{{ if $i }} | {{ end }}{{ $t }}{{ end }}{{ else }}{{ index $prop.Type 0 }}{{ end }},
{{- end }}
}) => any;
{{- else }}
type {{ .Function.Name }} = () => any;
{{- end }}
{{- end }}{{/* end of range .Functions */}}
} // namespace functions
# Instructions
<|end|>{{.Input -}}<|start|>assistant UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_COMPLETION FLAG_ANY FLAG_CHAT] KnownUsecases:<nil> Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc001a3a5c8 MirostatTAU:0xc001a3a5c0 Mirostat:0xc001a3a5b8 NGPULayers:<nil> MMap:0xc001a3a579 MMlock:0xc001a3a5e9 LowVRAM:0xc001a3a5e9 Reranking:0xc001a3a5e9 Grammar: StopWords:[<|im_end|> <dummy32000> </s> <|endoftext|> <|return|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc001a3a568 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[] Overrides:[]})
8:00AM DBG Model: qwen3-embedding-8b (config: {PredictionOptions:{BasicModelRequest:{Model:Qwen3-Embedding-8B-Q4_K_M.gguf} Language: Translate:false N:0 TopP:0xc0018cd950 TopK:0xc0018cd958 Temperature:0xc0018cd960 Maxtokens:0xc0018cd990 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc0018cd988 TypicalP:0xc0018cd980 Seed:0xc0018cd9a0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:qwen3-embedding-8b F16:0xc0018cd931 Threads:0xc0018cd940 Debug:0xc0018cd998 Roles:map[] Embeddings:0xc0018cd930 Backend:llama-cpp TemplateConfig:{Chat:{{.Input -}}
<|im_start|>assistant
 ChatMessage:<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
{{ else if eq .RoleName "tool" -}}
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
 Completion:{{.Input}}
 Edit: Functions:<|im_start|>system
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
<|im_end|>
      
{{.Input -}}
<|im_start|>assistant
 UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_CHAT FLAG_ANY FLAG_COMPLETION FLAG_EMBEDDINGS] KnownUsecases:<nil> Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc0018cd978 MirostatTAU:0xc0018cd970 Mirostat:0xc0018cd968 NGPULayers:0xc0009fe798 MMap:0xc0018cd932 MMlock:0xc0018cd999 LowVRAM:0xc0018cd999 Reranking:0xc0018cd999 Grammar: StopWords:[<|im_end|> <dummy32000> </s> <|endoftext|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc0018cd920 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[gpu] Overrides:[]})
8:00AM DBG processing api keys runtime update
8:00AM DBG processing external_backends.json
8:00AM DBG external backends loaded from external_backends.json
8:00AM INF core/startup process completed!
8:00AM DBG GPU vendor gpuVendor=nvidia
8:00AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
8:00AM INF Success ip=192.168.1.107 latency=10.84391ms method=GET status=200 url=/chat/
8:00AM INF Success ip=192.168.1.107 latency="29.089µs" method=GET status=200 url=/static/assets/highlightjs.css
8:00AM INF Success ip=192.168.1.107 latency="83.858µs" method=GET status=200 url=/static/assets/font1.css
8:00AM INF Success ip=192.168.1.107 latency="34.57µs" method=GET status=200 url=/static/assets/font2.css
8:00AM INF Success ip=192.168.1.107 latency="36.389µs" method=GET status=200 url=/static/assets/fontawesome/css/brands.css
8:00AM INF Success ip=192.168.1.107 latency="68.129µs" method=GET status=200 url=/static/general.css
8:00AM INF Success ip=192.168.1.107 latency="161.411µs" method=GET status=200 url=/static/assets/fontawesome/css/fontawesome.css
8:00AM INF Success ip=192.168.1.107 latency="82.34µs" method=GET status=200 url=/static/assets/tw-elements.css
8:00AM INF Success ip=192.168.1.107 latency="31.653µs" method=GET status=200 url=/static/assets/tailwindcss.js
8:00AM INF Success ip=192.168.1.107 latency="23.761µs" method=GET status=200 url=/static/assets/fontawesome/css/solid.css
8:00AM INF Success ip=192.168.1.107 latency="63.048µs" method=GET status=200 url=/static/assets/flowbite.min.js
8:00AM INF Success ip=192.168.1.107 latency="23.776µs" method=GET status=200 url=/static/assets/htmx.js
8:00AM INF Success ip=192.168.1.107 latency="25.144µs" method=GET status=200 url=/static/assets/highlightjs.js
8:00AM INF Success ip=192.168.1.107 latency="24.419µs" method=GET status=200 url=/static/logo_horizontal.png
8:00AM INF Success ip=192.168.1.107 latency="34.531µs" method=GET status=200 url=/static/assets/alpine.js
8:00AM INF Success ip=192.168.1.107 latency="14.897µs" method=GET status=200 url=/static/assets/purify.js
8:00AM INF Success ip=192.168.1.107 latency="23.502µs" method=GET status=200 url=/static/assets/marked.js
8:00AM INF Success ip=192.168.1.107 latency="24.686µs" method=GET status=200 url=/static/chat.js
8:00AM INF Success ip=192.168.1.107 latency="24.451µs" method=GET status=200 url=/static/assets/fontawesome/webfonts/fa-solid-900.woff2
8:00AM INF Success ip=192.168.1.107 latency=4.026278ms method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuLyfMZg.ttf
8:00AM INF Success ip=192.168.1.107 latency="27.453µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuGKYMZg.ttf
8:00AM INF Success ip=192.168.1.107 latency="37.512µs" method=GET status=200 url=/static/favicon.svg
8:00AM INF Success ip=192.168.1.107 latency=2.086203ms method=GET status=200 url=/
8:00AM INF Success ip=192.168.1.107 latency="30.383µs" method=GET status=200 url=/static/assets/highlightjs.css
8:00AM INF Success ip=192.168.1.107 latency="25.693µs" method=GET status=200 url=/static/general.css
8:00AM INF Success ip=192.168.1.107 latency="38.004µs" method=GET status=200 url=/static/assets/highlightjs.js
8:00AM INF Success ip=192.168.1.107 latency="23.725µs" method=GET status=200 url=/static/assets/font2.css
8:00AM INF Success ip=192.168.1.107 latency="57.53µs" method=GET status=200 url=/static/assets/tw-elements.css
8:00AM INF Success ip=192.168.1.107 latency="25.178µs" method=GET status=200 url=/static/assets/font1.css
8:00AM INF Success ip=192.168.1.107 latency="25.28µs" method=GET status=200 url=/static/assets/fontawesome/css/fontawesome.css
8:00AM INF Success ip=192.168.1.107 latency="45.303µs" method=GET status=200 url=/static/assets/fontawesome/css/brands.css
8:00AM INF Success ip=192.168.1.107 latency="23.583µs" method=GET status=200 url=/static/assets/fontawesome/css/solid.css
8:00AM INF Success ip=192.168.1.107 latency="24.721µs" method=GET status=200 url=/static/assets/tailwindcss.js
8:00AM INF Success ip=192.168.1.107 latency="23.227µs" method=GET status=200 url=/static/assets/flowbite.min.js
8:00AM INF Success ip=192.168.1.107 latency="23.237µs" method=GET status=200 url=/static/assets/htmx.js
8:00AM INF Success ip=192.168.1.107 latency="22.768µs" method=GET status=200 url=/static/logo_horizontal.png
8:00AM INF Success ip=192.168.1.107 latency="25.833µs" method=GET status=200 url=/static/assets/tw-elements.js
8:00AM INF Success ip=192.168.1.107 latency="17.212µs" method=GET status=200 url=/static/assets/purify.js
8:00AM INF Success ip=192.168.1.107 latency="24.525µs" method=GET status=200 url=/static/assets/alpine.js
8:00AM INF Success ip=192.168.1.107 latency="38.243µs" method=GET status=200 url=/static/assets/marked.js
8:00AM INF Success ip=192.168.1.107 latency="26.478µs" method=GET status=200 url=/static/assets/fontawesome/webfonts/fa-brands-400.woff2
8:00AM INF Success ip=192.168.1.107 latency="29.49µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuLyfMZg.ttf
8:00AM INF Success ip=192.168.1.107 latency="24.516µs" method=GET status=200 url=/static/assets/fontawesome/webfonts/fa-solid-900.woff2
8:00AM INF Success ip=192.168.1.107 latency="27.497µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuFuYMZg.ttf
8:00AM INF Success ip=192.168.1.107 latency=1.351852ms method=GET status=200 url=/chat/
8:00AM INF Success ip=192.168.1.107 latency="34.427µs" method=GET status=200 url=/static/assets/highlightjs.css
8:00AM INF Success ip=192.168.1.107 latency="25.866µs" method=GET status=200 url=/static/assets/font2.css
8:00AM INF Success ip=192.168.1.107 latency="42.492µs" method=GET status=200 url=/static/general.css
8:00AM INF Success ip=192.168.1.107 latency="123.754µs" method=GET status=200 url=/static/assets/highlightjs.js
8:00AM INF Success ip=192.168.1.107 latency="11.596µs" method=GET status=200 url=/static/assets/tw-elements.css
8:00AM INF Success ip=192.168.1.107 latency="234.039µs" method=GET status=200 url=/static/assets/font1.css
8:00AM INF Success ip=192.168.1.107 latency="24.511µs" method=GET status=200 url=/static/assets/fontawesome/css/fontawesome.css
8:00AM INF Success ip=192.168.1.107 latency="62.371µs" method=GET status=200 url=/static/assets/fontawesome/css/brands.css
8:00AM INF Success ip=192.168.1.107 latency="11.02µs" method=GET status=200 url=/static/assets/fontawesome/css/solid.css
8:00AM INF Success ip=192.168.1.107 latency="25.445µs" method=GET status=200 url=/static/assets/tailwindcss.js
8:00AM INF Success ip=192.168.1.107 latency="32.904µs" method=GET status=200 url=/static/assets/flowbite.min.js
8:00AM INF Success ip=192.168.1.107 latency="23.898µs" method=GET status=200 url=/static/assets/htmx.js
8:00AM INF Success ip=192.168.1.107 latency="21.13µs" method=GET status=200 url=/static/logo_horizontal.png
8:00AM INF Success ip=192.168.1.107 latency="33.854µs" method=GET status=200 url=/static/assets/alpine.js
8:00AM INF Success ip=192.168.1.107 latency="38.061µs" method=GET status=200 url=/static/assets/purify.js
8:00AM INF Success ip=192.168.1.107 latency="119.877µs" method=GET status=200 url=/static/chat.js
8:00AM INF Success ip=192.168.1.107 latency="194.994µs" method=GET status=200 url=/static/assets/marked.js
8:00AM INF Success ip=192.168.1.107 latency="38.487µs" method=GET status=200 url=/static/assets/fontawesome/webfonts/fa-solid-900.woff2
8:00AM INF Success ip=192.168.1.107 latency="25.162µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuLyfMZg.ttf
8:00AM INF Success ip=192.168.1.107 latency="16.392µs" method=GET status=200 url=/static/assets/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuGKYMZg.ttf
8:00AM DBG context local model name not found, setting to the first model first model name=gemma-3-27b-it
8:01AM WRN guessDefaultsFromFile: full offload is recommended
8:01AM DBG guessDefaultsFromFile: NGPULayers set NGPULayers=63
8:01AM DBG guessDefaultsFromFile: template already set name=gemma-3-27b-it
8:01AM DBG Chat endpoint configuration read: &{PredictionOptions:{BasicModelRequest:{Model:gemma-3-27b-it-Q4_K_M.gguf} Language: Translate:false N:0 TopP:0xc000cac008 TopK:0xc000cac010 Temperature:0xc000cac018 Maxtokens:0xc000cac048 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000cac040 TypicalP:0xc000cac038 Seed:0xc000cac058 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:gemma-3-27b-it F16:0xc000cac000 Threads:0xc000c15fe8 Debug:0xc002eb99d8 Roles:map[] Embeddings:0xc000cac051 Backend:llama-cpp TemplateConfig:{Chat:{{.Input }}
<start_of_turn>model
 ChatMessage:<start_of_turn>{{if eq .RoleName "assistant" }}model{{else}}{{ .RoleName }}{{end}}
{{ if .FunctionCall -}}
{{ else if eq .RoleName "tool" -}}
{{ end -}}
{{ if .Content -}}
{{.Content -}}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<end_of_turn> Completion:{{.Input}}
 Edit: Functions:<start_of_turn>system
You have access to functions. If you decide to invoke any of the function(s),
you MUST put it in the format of
{"name": function name, "parameters": dictionary of argument name and its value}
You SHOULD NOT include any other text in the response if you call a function
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
<end_of_turn>
      
{{.Input -}}
<start_of_turn>model
 UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_COMPLETION FLAG_ANY FLAG_CHAT] KnownUsecases:<nil> Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000cac030 MirostatTAU:0xc000cac028 Mirostat:0xc000cac020 NGPULayers:0xc0009fe6d8 MMap:0xc000c15fa8 MMlock:0xc000cac051 LowVRAM:0xc000cac051 Reranking:0xc000cac051 Grammar: StopWords:[<|im_end|> <end_of_turn> <start_of_turn>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000c15c28 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[gpu] Overrides:[]}
8:01AM DBG Parameters: &{PredictionOptions:{BasicModelRequest:{Model:gemma-3-27b-it-Q4_K_M.gguf} Language: Translate:false N:0 TopP:0xc000cac008 TopK:0xc000cac010 Temperature:0xc000cac018 Maxtokens:0xc000cac048 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000cac040 TypicalP:0xc000cac038 Seed:0xc000cac058 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:gemma-3-27b-it F16:0xc000cac000 Threads:0xc000c15fe8 Debug:0xc002eb99d8 Roles:map[] Embeddings:0xc000cac051 Backend:llama-cpp TemplateConfig:{Chat:{{.Input }}
<start_of_turn>model
 ChatMessage:<start_of_turn>{{if eq .RoleName "assistant" }}model{{else}}{{ .RoleName }}{{end}}
{{ if .FunctionCall -}}
{{ else if eq .RoleName "tool" -}}
{{ end -}}
{{ if .Content -}}
{{.Content -}}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<end_of_turn> Completion:{{.Input}}
 Edit: Functions:<start_of_turn>system
You have access to functions. If you decide to invoke any of the function(s),
you MUST put it in the format of
{"name": function name, "parameters": dictionary of argument name and its value}
You SHOULD NOT include any other text in the response if you call a function
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
<end_of_turn>
      
{{.Input -}}
<start_of_turn>model
 UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_COMPLETION FLAG_ANY FLAG_CHAT] KnownUsecases:<nil> Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000cac030 MirostatTAU:0xc000cac028 Mirostat:0xc000cac020 NGPULayers:0xc0009fe6d8 MMap:0xc000c15fa8 MMlock:0xc000cac051 LowVRAM:0xc000cac051 Reranking:0xc000cac051 Grammar: StopWords:[<|im_end|> <end_of_turn> <start_of_turn>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000c15c28 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[gpu] Overrides:[]}
8:01AM DBG templated message for chat: <start_of_turn>user
hi<end_of_turn>
8:01AM DBG Prompt (before templating): <start_of_turn>user
hi<end_of_turn>
8:01AM DBG Template found, input modified to: <start_of_turn>user
hi<end_of_turn>
<start_of_turn>model
8:01AM DBG Prompt (after templating): <start_of_turn>user
hi<end_of_turn>
<start_of_turn>model
8:01AM DBG Stream request received
8:01AM INF Success ip=192.168.1.107 latency=626.3677ms method=POST status=200 url=/v1/chat/completions
8:01AM DBG Sending chunk: {"created":1755158460,"object":"chat.completion.chunk","id":"d79f7532-f066-483e-94f9-ff966acbce48","model":"gemma-3-27b-it","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
8:01AM INF BackendLoader starting backend=llama-cpp modelID=gemma-3-27b-it o.model=gemma-3-27b-it-Q4_K_M.gguf
8:01AM DBG Loading model in memory from file: /models/gemma-3-27b-it-Q4_K_M.gguf
8:01AM DBG Loading Model gemma-3-27b-it with gRPC (file: /models/gemma-3-27b-it-Q4_K_M.gguf) (backend: llama-cpp): {backendString:llama-cpp model:gemma-3-27b-it-Q4_K_M.gguf modelID:gemma-3-27b-it context:{emptyCtx:{}} gRPCOptions:0xc0007918c8 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false}
8:01AM INF Success ip=127.0.0.1 latency="77.063µs" method=GET status=200 url=/readyz
8:02AM INF Success ip=127.0.0.1 latency="44.559µs" method=GET status=200 url=/readyz

billy-sung avatar Aug 14 '25 08:08 billy-sung

A quick read of your logs shows the renderer fails on the ReasoningEffort variable before the backend even spins up, so this is a template and binary version skew, not the model itself. On LocalAI v3.3.2 that field is not in the request context yet, which is why the template compile stops and you never see llama.cpp start.

What usually fixes it:

  1. Upgrade the image to a build that includes ReasoningEffort in the context. Pull a newer tag and restart clean so old templates are not shadowing the update.
  2. Or hot-patch the template with a guard. Wrap the lines that use it, for example:
{{ if .ReasoningEffort }}
  {{/* use the value here */}}
{{ end }}

Or remove the placeholder entirely for a quick unblock.
  3. Confirm by checking for the first successful backend init line in the logs. If you still do not see a load line after the template compiles, it is a different startup issue.
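
Whether the {{ if .ReasoningEffort }} guard is enough depends on how the template data is built. A minimal sketch of the distinction in Go's text/template, assuming nothing about LocalAI's internals: a missing key on map data evaluates as empty and the guard simply skips the block, while a missing field on struct data (like the templates.PromptTemplateData named in the error) is still a hard error, guard or not.

package main

import (
    "fmt"
    "os"
    "text/template"
)

// Hypothetical stand-in for struct-backed template data that lacks the field.
type promptData struct{ Input string }

func main() {
    tmpl := template.Must(template.New("guarded").Parse(
        `{{ if .ReasoningEffort }}Reasoning: {{ .ReasoningEffort }}
{{ end }}rest of prompt`))

    // Map-backed data: the missing key evaluates as empty, the guard skips
    // the block, and execution continues without an error.
    if err := tmpl.Execute(os.Stdout, map[string]any{"Input": "hi"}); err != nil {
        fmt.Println("map data:", err)
    }
    fmt.Println()

    // Struct-backed data: the missing field is still a hard error, which
    // matches the "can't evaluate field" message in the logs above.
    if err := tmpl.Execute(os.Stdout, promptData{Input: "hi"}); err != nil {
        fmt.Println("struct data:", err)
    }
}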

If you want a short checklist that maps this to our failure catalog, this is Problem Map No. 16: version-skew first-call crash. Happy to share the doc if you want it.

onestardao avatar Aug 24 '25 07:08 onestardao

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Nov 23 '25 02:11 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Nov 29 '25 02:11 github-actions[bot]