I'm trying to run DeepSeek-Coder-V2-Lite-Instruct-GGUF but it doesn't work.
Is this a llama.cpp version issue?
maybe https://github.com/ggerganov/llama.cpp/issues/7979 ?
I disabled flash attention and tried changing the batch size. Now I can load the model, but I get this output: end_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_id_i
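For what it's worth, here is a minimal sketch of how those two workarounds map onto a LocalAI model YAML. This assumes a LocalAI version recent enough to expose the flash_attention and batch fields; if your version ignores them, check its model-config reference:

# Minimal sketch; field names are an assumption based on recent LocalAI configs.
name: deepseek-coder-v2-lite-instruct
parameters:
  model: DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf
flash_attention: false # work around the attention bug from the linked llama.cpp issue
batch: 512 # reduced prompt batch size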
Probably related to the template you are using for this model. Check your model YAML file; it should look something like this:
name: code-13b
context_size: 4096
f16: false # true for GPU acceleration
cuda: false # true for GPU acceleration
gpu_layers: 0 # this model has at most 40 layers; 15-20 is recommended for a half-load on an NVIDIA RTX 4060 Ti (more layers -> more VRAM required); 0 means no GPU offload
parameters:
  model: code-13b.Q5_K_M.gguf
stopwords:
  - "</s>"
template:
  chat: &template |
    Below is an instruction that describes a task. Write a response that appropriately completes the request.
    Instruction: {{.Input}}
    Response:
  # Modify the prompt template here ^^^ as per your requirements
  completion: *template
In stopwords you should put your model's stop words; you can find them in the llama.cpp logs when it loads the model for the first time.
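For DeepSeek-Coder-V2 specifically, the EOS token llama.cpp prints is <｜end▁of▁sentence｜> (full-width bars, U+FF5C, and U+2581 separators, not ASCII pipes and spaces), so the stopwords block would be:

stopwords:
  - <｜end▁of▁sentence｜>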
Also cannot get this to work. I downloaded the model from the gallery using the GUI (/browse). This downloaded the model and a deepseek-coder-v2-lite-instruct.yaml with the contents below.
context_size: 8192
mmap: true
name: deepseek-coder-v2-lite-instruct
parameters:
  model: DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf
stopwords:
  - <｜end▁of▁sentence｜>
template:
  chat: |
    {{.Input -}}
    Assistant: # Space is preserved for templating reasons, but line does not end with one for the linter.
  chat_message: |-
    {{if eq .RoleName "user" -}}User: {{.Content }}
    {{ end -}}
    {{if eq .RoleName "assistant" -}}Assistant: {{.Content}}<｜end▁of▁sentence｜>{{end}}
    {{if eq .RoleName "system" -}}{{.Content}}
    {{end -}}
  completion: |
    {{.Input}}
The model loads fine; however, the output looks like this:
ating linter linterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterlinterl...
So I am guessing it's a template problem. Any ideas on how to run this model?
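One thing worth checking: inside a YAML | block scalar, # does not start a comment, so if the line "Assistant: # Space is preserved for templating reasons..." sits inside the chat scalar, that whole sentence is sent to the model as prompt text. That would match the output above, which loops on "linter". A sketch of the template with that trailing text removed (same variables as the gallery file; that this alone fixes generation is an assumption):

template:
  chat: |
    {{.Input -}}
    Assistant:
  chat_message: |-
    {{if eq .RoleName "user" -}}User: {{.Content}}
    {{ end -}}
    {{if eq .RoleName "assistant" -}}Assistant: {{.Content}}<｜end▁of▁sentence｜>{{end}}
    {{if eq .RoleName "system" -}}{{.Content}}
    {{end -}}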
Same issue here; 'deepseek-coder-v2-lite-instruct' from the models repository is not usable.
BTW, does anyone have a guide on how to 'translate' an Ollama template for use in LocalAI?
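I'm not aware of an official guide, but the mapping is mostly mechanical. A rough sketch, assuming the legacy Ollama template variables (.System / .Prompt / .Response) rather than the newer .Messages form:

# Ollama Modelfile TEMPLATE (legacy variables):
#
#   {{ if .System }}{{ .System }}
#
#   {{ end }}User: {{ .Prompt }}
#
#   Assistant: {{ .Response }}
#
# Approximate LocalAI equivalent: the per-role formatting moves into
# chat_message (keyed on .RoleName / .Content), and chat appends the
# final "Assistant:" cue to the already-joined messages in {{.Input}}.
template:
  chat: |
    {{.Input -}}
    Assistant:
  chat_message: |-
    {{if eq .RoleName "system" -}}{{.Content}}
    {{ end -}}
    {{if eq .RoleName "user" -}}User: {{.Content}}
    {{ end -}}
    {{if eq .RoleName "assistant" -}}Assistant: {{.Content}}{{end}}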
Same, I can't get this to work. It would be great to open up the configs so that we can edit them in the UI instead of having to create a new gallery. @mudler, what you've made here is super powerful, but more customization would be greatly appreciated and would likely reduce the number of issues like this one. In my opinion, at least part of the reason for a UI like this is so that users don't have to worry about or write code. But if we want to run different models, we have to go into the code, which kind of defeats the purpose.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.