
Response mangled when using tool calls with specific models like gpt-oss-20b

johndev168 opened this issue 3 months ago

LocalAI version: 3.6.0 (latest commit: #6409)

Environment, CPU architecture, OS, and Version: Mac (T8132), macOS 15.5 (24F74), no VM

Describe the bug The model answers with gibberish, truncated content, or otherwise nonsensical output. This happens whenever I add tools to the request; leaving the tools out lets the model produce proper responses. It also only happens with certain models, for example gpt-oss-20b (there are more; I encountered the bug earlier, but now I can clearly pinpoint it).

This is the response whenever I add tool definitions to the request:

{"created":1759993265,"object":"chat.completion","id":"4a7b5c66-10a6-4679-a1ac-e5f694ebe19d","model":"gpt-oss-20b","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I want a … ….. ?"}}],"usage":{"prompt_tokens":9,"completion_tokens":27,"total_tokens":36}}

If it does emit tool calls, it produces them with empty arguments, like this:

{"created":1759993249,"object":"chat.completion","id":"1539410d-7abd-420e-b854-15865dc6af9f","model":"gpt-oss-20b","choices":[{"index":0,"finish_reason":"tool_calls","message":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"1539410d-7abd-420e-b854-15865dc6af9f","type":"function","function":{"arguments":"{}"}}]}}],"usage":{"prompt_tokens":9,"completion_tokens":13,"total_tokens":22}}

To Reproduce Load gpt-oss-20b. Default model settings are fine; they make no difference. The backend does not matter either: I use metal-llama-cpp, but switching to llama-cpp changes nothing.

Then send the following example:

curl -X POST "SERVERIP:8080/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"messages": [ {"role": "user", "content": "How are you today?"}], "tools": [{"type": "function", "name": "Websearch", "description": "This function triggers an external web search to get additional data if necessary.", "parameters": { "type": "object", "properties": { "prompt": { "description": "Build a search prompt that can be used for searching the web.", "type": "string" } }, "required": [ "SearchPrompt" ], "additionalProperties": false }}], "response_format": { "type": "text"}, "model": "gpt-oss-20b"}'

This triggers the bug, and the response comes back mangled.

Expected behavior There should be a proper, coherent response. Stripping the tool definitions from the request makes responses work again immediately.
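For reference, the stripped request is identical except for the missing tools array:

curl -X POST "SERVERIP:8080/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"messages": [ {"role": "user", "content": "How are you today?"}], "response_format": { "type": "text"}, "model": "gpt-oss-20b"}'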

Logs

9:09AM DBG context local model name not found, setting to the first model first model name=eurollm-9b-instruct
9:09AM DBG Chat endpoint configuration read: &{PredictionOptions:{BasicModelRequest:{Model:gpt-oss-20b-mxfp4.gguf} Language: Translate:false N:0 TopP:0x140009e9af0 TopK:0x140009e9af8 Temperature:0x140009e9b00 Maxtokens:0x140009e9b08 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0x140009e9b10 TypicalP:0x140009e9b18 Seed:0x140009e9b20 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:gpt-oss-20b F16:0x140009e9b30 Threads:0x140009e9b38 Debug:0x14001090a28 Roles:map[] Embeddings:0x140009e9b41 Backend:metal-llama-cpp-development TemplateConfig:{Chat:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}

Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}

# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant ChatMessage:<|start|>{{ if .FunctionCall -}}functions.{{ .FunctionCall.Name }} to=assistant{{ else if eq .RoleName "assistant"}}assistant<|channel|>final<|message|>{{else}}{{ .RoleName }}{{end}}<|message|>
{{- if .Content -}}
{{- .Content -}}
{{- end -}}
{{- if .FunctionCall -}}
{{- toJson .FunctionCall -}}
{{- end -}}<|end|> Completion:{{.Input}}
 Edit: Functions:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}

Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}

# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant

# Tools

## functions

namespace functions {
{{-range .Functions}}
{{if .Description }}
// {{ .Description }}
{{- end }}
{{- if and .Parameters.Properties (gt (len .Parameters.Properties) 0) }}
type {{ .Name }} = (_: {
{{- range $name, $prop := .Parameters.Properties }}
{{- if $prop.Description }}
  // {{ $prop.Description }}
{{- end }}
  {{ $name }}: {{ if gt (len $prop.Type) 1 }}{{ range $i, $t := $prop.Type }}{{ if $i }} | {{ end }}{{ $t }}{{ end }}{{ else }}{{ index $prop.Type 0 }}{{ end }},
{{- end }}
}) => any;
{{- else }}
type {{ .Function.Name }} = () => any;
{{- end }}
{{- end }}{{/* end of range .Functions */}}
} // namespace functions

# Instructions

<|end|>{{.Input -}}<|start|>assistant UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_CHAT FLAG_ANY FLAG_COMPLETION] KnownUsecases:0x140009e9ba0 Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[type:text] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0x140009e9b48 MirostatTAU:0x140009e9b68 Mirostat:0x140009e9b70 NGPULayers:0x140009e9b78 MMap:0x140009e9b88 MMlock:0x140009e9b89 LowVRAM:0x140009e9b8a Reranking:0x140009e9b8b Grammar: StopWords:[<|im_end|> <dummy32000> </s> <|endoftext|> <|return|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0x140009e9b90 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:<nil> NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[] Overrides:[] MCP:{Servers: Stdio:} Agent:{MaxAttempts:0 MaxIterations:0 EnableReasoning:false EnableReEvaluation:false}}
9:09AM DBG Response needs to process functions
9:09AM DBG Parameters: &{PredictionOptions:{BasicModelRequest:{Model:gpt-oss-20b-mxfp4.gguf} Language: Translate:false N:0 TopP:0x140009e9af0 TopK:0x140009e9af8 Temperature:0x140009e9b00 Maxtokens:0x140009e9b08 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0x140009e9b10 TypicalP:0x140009e9b18 Seed:0x140009e9b20 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 ClipSkip:0 Tokenizer:} Name:gpt-oss-20b F16:0x140009e9b30 Threads:0x140009e9b38 Debug:0x14001090a28 Roles:map[] Embeddings:0x140009e9b41 Backend:metal-llama-cpp-development TemplateConfig:{Chat:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}

Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}

# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant ChatMessage:<|start|>{{ if .FunctionCall -}}functions.{{ .FunctionCall.Name }} to=assistant{{ else if eq .RoleName "assistant"}}assistant<|channel|>final<|message|>{{else}}{{ .RoleName }}{{end}}<|message|>
{{- if .Content -}}
{{- .Content -}}
{{- end -}}
{{- if .FunctionCall -}}
{{- toJson .FunctionCall -}}
{{- end -}}<|end|> Completion:{{.Input}}
 Edit: Functions:<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ now | date "Mon Jan 2 15:04:05 MST 2006" }}

Reasoning: {{if eq .ReasoningEffort ""}}medium{{else}}{{.ReasoningEffort}}{{end}}

# {{with .Metadata}}{{ if ne .system_prompt "" }}{{ .system_prompt }}{{ end }}{{else}}You are a friendly and helpful assistant.{{ end }}<|end|>{{- .Input -}}<|start|>assistant

# Tools

## functions

namespace functions {
{{-range .Functions}}
{{if .Description }}
// {{ .Description }}
{{- end }}
{{- if and .Parameters.Properties (gt (len .Parameters.Properties) 0) }}
type {{ .Name }} = (_: {
{{- range $name, $prop := .Parameters.Properties }}
{{- if $prop.Description }}
  // {{ $prop.Description }}
{{- end }}
  {{ $name }}: {{ if gt (len $prop.Type) 1 }}{{ range $i, $t := $prop.Type }}{{ if $i }} | {{ end }}{{ $t }}{{ end }}{{ else }}{{ index $prop.Type 0 }}{{ end }},
{{- end }}
}) => any;
{{- else }}
type {{ .Function.Name }} = () => any;
{{- end }}
{{- end }}{{/* end of range .Functions */}}
} // namespace functions

# Instructions

<|end|>{{.Input -}}<|start|>assistant UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_CHAT FLAG_ANY FLAG_COMPLETION] KnownUsecases:0x140009e9ba0 Pipeline:{TTS: LLM: Transcription: VAD:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[type:text] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0x140009e9b48 MirostatTAU:0x140009e9b68 Mirostat:0x140009e9b70 NGPULayers:0x140009e9b78 MMap:0x140009e9b88 MMlock:0x140009e9b89 LowVRAM:0x140009e9b8a Reranking:0x140009e9b8b Grammar:root ::= root-0 | root-1
freestring ::= (
                        [^\x00] |
                        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
                  )* space
root-0-name ::= "\"\""
string ::= "\"" (
                        [^"\\] |
                        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
                  )* "\"" space
root-1-arguments ::= "{" space "\"message\"" space ":" space string "}" space
root-1-name ::= "\"answer\""
space ::= " "?
root-0-arguments ::= "{" space "}" space
root-0 ::= "{" space "\"arguments\"" space ":" space root-0-arguments "," space "\"name\"" space ":" space root-0-name "}" space
root-1 ::= "{" space "\"arguments\"" space ":" space root-1-arguments "," space "\"name\"" space ":" space root-1-name "}" space StopWords:[<|im_end|> <dummy32000> </s> <|endoftext|> <|return|>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0x140009e9b90 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:<nil> NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[] Overrides:[] MCP:{Servers: Stdio:} Agent:{MaxAttempts:0 MaxIterations:0 EnableReasoning:false EnableReEvaluation:false}}
9:09AM DBG templated message for chat: <|start|>user<|message|>How are you today?<|end|>
9:09AM DBG Prompt (before templating): <|start|>user<|message|>How are you today?<|end|>
9:09AM DBG Template failed loading: template: prompt:14: bad number syntax: "-r"
9:09AM DBG Prompt (after templating): <|start|>user<|message|>How are you today?<|end|>
9:09AM DBG Grammar: root ::= root-0 | root-1
freestring ::= (
                        [^\x00] |
                        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
                  )* space
root-0-name ::= "\"\""
string ::= "\"" (
                        [^"\\] |
                        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
                  )* "\"" space
root-1-arguments ::= "{" space "\"message\"" space ":" space string "}" space
root-1-name ::= "\"answer\""
space ::= " "?
root-0-arguments ::= "{" space "}" space
root-0 ::= "{" space "\"arguments\"" space ":" space root-0-arguments "," space "\"name\"" space ":" space root-0-name "}" space
root-1 ::= "{" space "\"arguments\"" space ":" space root-1-arguments "," space "\"name\"" space ":" space root-1-name "}" space
9:09AM DBG Model already loaded in memory: gpt-oss-20b
9:09AM DBG Checking model availability (gpt-oss-20b)
9:09AM DBG Model 'gpt-oss-20b' already loaded
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout [PREDICT] Received result: {
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "stream": false,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "cache_prompt": false,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "n_predict": -1,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "top_k": 100,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "top_p": 1.0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "typical_p": 1.0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "temperature": 1.0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "repeat_last_n": 0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "repeat_penalty": 0.0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "frequency_penalty": 0.0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "presence_penalty": 0.0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "mirostat": 0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "mirostat_tau": 5.0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "mirostat_eta": 0.10000000149011612,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "n_keep": 0,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "seed": 1330009116,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "grammar": "root ::= root-0 | root-1\nfreestring ::= (\n\t\t\t[^\\x00] |\n\t\t\t\"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])\n\t\t  )* space\nroot-0-name ::= \"\\\"\\\"\"\nstring ::= \"\\\"\" (\n\t\t\t[^\"\\\\] |\n\t\t\t\"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])\n\t\t  )* \"\\\"\" space\nroot-1-arguments ::= \"{\" space \"\\\"message\\\"\" space \":\" space string \"}\" space\nroot-1-name ::= \"\\\"answer\\\"\"\nspace ::= \" \"?\nroot-0-arguments ::= \"{\" space \"}\" space\nroot-0 ::= \"{\" space \"\\\"arguments\\\"\" space \":\" space root-0-arguments \",\" space \"\\\"name\\\"\" space \":\" space root-0-name \"}\" space\nroot-1 ::= \"{\" space \"\\\"arguments\\\"\" space \":\" space root-1-arguments \",\" space \"\\\"name\\\"\" space \":\" space root-1-name \"}\" space",
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "prompt": "<|start|>user<|message|>How are you today?<|end|>",
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "ignore_eos": false,
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "embeddings": "",
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "correlation_id": "",
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   "stop": [
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout     "<|im_end|>",
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout     "<dummy32000>",
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout     "</s>",
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout     "<|endoftext|>",
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout     "<|return|>"
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout   ]
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout }
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout [DEBUG] Waiting for results...
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr slot launch_slot_: id  0 | task 111 | processing task
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr slot update_slots: id  0 | task 111 | new prompt, n_ctx_slot = 16384, n_keep = 0, n_prompt_tokens = 9
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr slot update_slots: id  0 | task 111 | kv cache rm [0, end)
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr slot update_slots: id  0 | task 111 | prompt processing progress, n_past = 9, n_tokens = 9, progress = 1.000000
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr slot update_slots: id  0 | task 111 | prompt done, n_past = 9, n_tokens = 9
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr slot update_slots: id  0 | task 111 | SWA checkpoint erase, pos_min = 0, pos_max = 8, size = 0.211 MiB
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr slot update_slots: id  0 | task 111 | SWA checkpoint create, pos_min = 0, pos_max = 8, size = 0.211 MiB, total = 3/3 (0.634 MiB)
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout [DEBUG] Received 1 results
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr slot      release: id  0 | task 111 | stop processing: n_past = 31, truncated = 0
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr slot print_timing: id  0 | task 111 |
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr prompt eval time =     334.68 ms /     9 tokens (   37.19 ms per token,    26.89 tokens per second)
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr        eval time =     730.04 ms /    23 tokens (   31.74 ms per token,    31.51 tokens per second)
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr       total time =    1064.72 ms /    32 tokens
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stderr srv  update_slots: all slots are idle
9:09AM DBG GRPC(gpt-oss-20b-127.0.0.1:62524): stdout [DEBUG] Predict request completed successfully
9:09AM DBG ParseTextContent: {"arguments": {"message": "How are you today?"}, "name":"answer"}
9:09AM DBG CaptureLLMResult: []
9:09AM DBG LLM result: {"arguments": {"message": "How are you today?"}, "name":"answer"}
9:09AM DBG LLM result(processed): {"arguments": {"message": "How are you today?"}, "name":"answer"}
9:09AM DBG LLM result: {"arguments": {"message": "How are you today?"}, "name":"answer"}
9:09AM DBG LLM result(function cleanup): {"arguments": {"message": "How are you today?"}, "name":"answer"}
9:09AM DBG Function return: {"arguments": {"message": "How are you today?"}, "name":"answer"} [map[arguments:map[message:How are you today?] name:answer]]
9:09AM DBG Text content to return:
9:09AM DBG nothing to do, computing a reply
9:09AM DBG Reply received from LLM: How are you today?
9:09AM DBG Reply received from LLM(finetuned): How are you today?
9:09AM DBG Response: {"created":1759993786,"object":"chat.completion","id":"aaf0f65c-409c-42b1-a61c-ffaa30a22c62","model":"gpt-oss-20b","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"How are you today?"}}],"usage":{"prompt_tokens":9,"completion_tokens":23,"total_tokens":32}}
9:09AM INF Success ip=127.0.0.1 latency=1.659789667s method=POST status=200 url=/v1/chat/completions

Additional context The tool definition by itself cannot be the problem: it works with other models such as gemma. Additionally, running in debug mode produces results that are at least not gibberish, but they are still nonsensical compared to responses produced when the tool definitions are stripped entirely. It may be related to specific models, or to the way the request gets parsed for those models. A reliable way to trigger the bug is gpt-oss-20b, but more models share the same issue; I'll check whether I can put together a list of models that are bugged this way. When the bug is encountered, the workarounds are either switching to a model that doesn't exhibit it (gemma models seem to work fine) or removing the tool definitions entirely, though the latter means losing tool calling.
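One possible lead in the logs above: the Functions template fails to compile (9:09AM DBG Template failed loading: template: prompt:14: bad number syntax: "-r"), so the prompt is sent without any tool section while the grammar still constrains the output. Assuming the template is rendered with Go's text/template, the {{- trim marker must be followed by whitespace; without a space, the lexer reads -range as a malformed number literal, which matches the "-r" in the error. Line 14 of the Functions template in the config dump is exactly such a case; a minimal sketch of the fix:

Broken (template line 14): {{-range .Functions}}
Fixed: {{- range .Functions}}

If the template never loads, the model is never shown the tool definitions at all, which would be consistent with the empty arguments and the mangled, grammar-constrained replies.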

johndev168 · Oct 09 '25

I am having the same issue here. I am running GPT-OSS on AMD AI 395+. After doing some digging I was able to get the model to consistently respond with tool calls, but the argument text is random. I added the grammar section to the model configuration with the following settings:

function:
  disable_no_action: true
  grammar:
#    mixed_mode: true
    disable: false
    disable_parallel_new_lines: true
    parallel_calls: true
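For reference, a minimal sketch of where this block sits in the model's YAML config (surrounding keys mirrored from the config dump in the original report; exact names and file layout may differ per setup):

name: gpt-oss-20b
backend: metal-llama-cpp
parameters:
  model: gpt-oss-20b-mxfp4.gguf
function:
  disable_no_action: true
  grammar:
    disable: false
    disable_parallel_new_lines: true
    parallel_calls: true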

This is an example response with the grammar settings above, using an API-calling tool. The two arguments are curl_input and url_input:

{
    "created": 1760495980,
    "object": "chat.completion",
    "id": "ce2e6b06-8077-42b6-bdc9-25d0d71d1435",
    "model": "localai-gpt-oss-20b",
    "choices": [
        {
            "index": 0,
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "index": 0,
                        "id": "ce2e6b06-8077-42b6-bdc9-25d0d71d1435",
                        "type": "function",
                        "function": {
                            "name": "make_api_get_request",
                            "arguments": "{\"curl_input\":\":/a 10.1.3?..????? ...???????\",\"url_input\":\"?..?????????????..??\"}"
                        }
                    }
                ]
            }
        }
    ],
    "usage": {
        "prompt_tokens": 3012,
        "completion_tokens": 70,
        "total_tokens": 3082
    }
}

Debug Logs:

api-1  | 2:39AM DBG GRPC(localai-gpt-oss-20b-127.0.0.1:37259): stdout [DEBUG] Predict request completed successfully
api-1  | 2:39AM DBG ParseTextContent: { "arguments": { "curl_input": ":/a 10.1.3?..????? ...???????" , "url_input":"?..?????????????..??" } , "name":"make_api_get_request"}
api-1  | 2:39AM DBG CaptureLLMResult: []
api-1  | 2:39AM DBG LLM result: { "arguments": { "curl_input": ":/a 10.1.3?..????? ...???????" , "url_input":"?..?????????????..??" } , "name":"make_api_get_request"}
api-1  | 2:39AM DBG LLM result(processed): { "arguments": { "curl_input": ":/a 10.1.3?..????? ...???????" , "url_input":"?..?????????????..??" } , "name":"make_api_get_request"}
api-1  | 2:39AM DBG LLM result: { "arguments": { "curl_input": ":/a 10.1.3?..????? ...???????" , "url_input":"?..?????????????..??" } , "name":"make_api_get_request"}
api-1  | 2:39AM DBG LLM result(function cleanup): { "arguments": { "curl_input": ":/a 10.1.3?..????? ...???????" , "url_input":"?..?????????????..??" } , "name":"make_api_get_request"}
api-1  | 2:39AM DBG Function return: { "arguments": { "curl_input": ":/a 10.1.3?..????? ...???????" , "url_input":"?..?????????????..??" } , "name":"make_api_get_request"} [map[arguments:map[curl_input::/a 10.1.3?..????? ...??????? url_input:?..?????????????..??] name:make_api_get_request]]
api-1  | 2:39AM DBG Text content to return:
api-1  | 2:39AM DBG Response: {"created":1760495980,"object":"chat.completion","id":"ce2e6b06-8077-42b6-bdc9-25d0d71d1435","model":"localai-gpt-oss-20b","choices":[{"index":0,"finish_reason":"tool_calls","message":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"ce2e6b06-8077-42b6-bdc9-25d0d71d1435","type":"function","function":{"name":"make_api_get_request","arguments":"{\"curl_input\":\":/a 10.1.3?..????? ...???????\",\"url_input\":\"?..?????????????..??\"}"}}]}}],"usage":{"prompt_tokens":3012,"completion_tokens":70,"total_tokens":3082}}

Is this a grammar processing issue in the template?

jobongo · Oct 15 '25