On GPU, GPT-4o rambles on and on and on... (no stopwords)
LocalAI version:
v2.25.0 (07655c0c2e0e5fe2bca86339a12237b69d258636)
Environment, CPU architecture, OS, and Version:
Linux ai-server 5.10.102.1-dxgrknl #1 SMP Sat Apr 23 13:33:19 +07 2022 x86_64 x86_64 x86_64 GNU/Linux
It's a VM with 2 vCPUs, using GPU-np partitioning on an RTX 3090. (Somehow managed to get that working...)
Describe the bug
Ask the GPT-4o model anything with the default config and it rambles on and on and on...
Not sure if this happens on CPU; I am using the Nvidia GPU image.
To Reproduce
Open /chat/gpt-4o and say "Hello?".
Expected behavior
The response stops at the first USER: marker.
Logs
Not needed.
Additional context
I added two sections (taken from the GPT-4 config) to get it working properly:
stopwords:
- "<|im_end|>"
- "<|eot_id|>"
- "</tool_call>"
- "<|end_of_text|>"
- "<dummy32000>"
- "<|im_start|>"
- "\nUSER:"
- "\nASSISTANT:"
(not sure which were necessary)
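To clarify what the stopwords accomplish: the server cuts generated text at the first occurrence of any stop sequence, which is what prevents the model from continuing into a fake USER: turn. A minimal illustrative sketch of that truncation logic (not LocalAI's actual implementation):

```python
def truncate_at_stop(text: str, stopwords: list[str]) -> str:
    """Cut generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stopwords:
        idx = text.find(stop)
        if idx != -1 and idx < cut:
            cut = idx
    return text[:cut]

stopwords = ["<|im_end|>", "\nUSER:", "\nASSISTANT:"]
generated = "Hello! How can I help?<|im_end|>\nUSER: tell me more"
print(truncate_at_stop(generated, stopwords))  # -> Hello! How can I help?
```

Without the stopwords configured, nothing matches and the raw rambling output is returned unchanged.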
Also, I'm not sure whether the function-call handling should be there:
template:
  chat: |
    {{.Input -}}
    <|im_start|>assistant
  chat_message: |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
    {{- if .FunctionCall }}
    <tool_call>
    {{- else if eq .RoleName "tool" }}
    <tool_response>
    {{- end }}
    {{- if .Content}}
    {{.Content }}
    {{- end }}
    {{- if .FunctionCall}}
    {{toJson .FunctionCall}}
    {{- end }}
    {{- if .FunctionCall }}
    </tool_call>
    {{- else if eq .RoleName "tool" }}
    </tool_response>
    {{- end }}<|im_end|>
  completion: |
    {{.Input}}
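For a plain user message with no function calls, the templates above should render standard ChatML. A rough Python sketch of the prompt they produce (illustrative only, covering just the simple user/assistant case and ignoring the tool branches):

```python
def chatml_message(role: str, content: str) -> str:
    # Mirrors the chat_message template for the no-FunctionCall case:
    # role header, content, then the <|im_end|> terminator.
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

def build_prompt(messages: list[dict]) -> str:
    # Mirrors the chat template: rendered messages followed by the
    # assistant header that cues the model to respond.
    rendered = "".join(chatml_message(m["role"], m["content"]) for m in messages)
    return rendered + "<|im_start|>assistant\n"

print(build_prompt([{"role": "user", "content": "Hello?"}]))
# -> <|im_start|>user
#    Hello?<|im_end|>
#    <|im_start|>assistant
```

The stopwords then catch the model's own <|im_end|> (or a hallucinated USER:/ASSISTANT: turn), which is why both sections are needed together.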
I am happy to open a PR if you agree. Or, if this works on CPU, it would be worth understanding why the GPU image doesn't.
Sorry for the late reply. It would be great to have a PR from you. I'm open to testing edits.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.