Model Repeats Nonsensical Output
Hello,
I need to run a GGUF model on an embedded device with limited resources, and I am using the qwen2.5-0.5b-instruct-q4_0.gguf model.
After testing a bunch of combinations, this is the llama-cli command I use:
./llama.cpp/build/bin/llama-cli -m "$MODEL" -sys "$SYS_PROMPT" -p "$PROMPT" -co -c 700
And this is the full shell script I run:
MODEL="gguf_qwen/qwen2.5-0.5b-instruct-q4_0.gguf"
CONTEXT="$(cat ../data/input_data.txt)"
# Only now build the full prompt
SYS_PROMPT="You are a helpful assistant. Be polite with the user. Use the following context to answer the question. If you can't answer based on the context, say 'Sorry, I am not able to provide this information.'
Context:
$CONTEXT"
./llama.cpp/build/bin/llama-cli -m "$MODEL" -sys "$SYS_PROMPT" -p "Greet the user." -co -c 700
while true; do
    printf "> "
    read -r QUESTION
    [ "$QUESTION" = "exit" ] && break
    PROMPT="Question: $QUESTION"
    ./llama.cpp/build/bin/llama-cli \
        -m "$MODEL" \
        -sys "$SYS_PROMPT" \
        -p "$PROMPT" \
        -co \
        -c 700 \
        -n 100 \
        --repeat-penalty 1.3 \
        --repeat-last-n 256 \
        --temp 0.6 \
        --top-k 40 \
        --top-p 0.85
done
Setting the limit at -c 700 is what works for my device.
This is the issue I have: as soon as the context limit is exceeded, the model loops on the same token or set of tokens. Here is an example:
Certainly, here are the details I have on your network:
1. Wi-Fi password: xxxxx
2. Gateway type: xxxx
3. Average RSSI:xxxx
4. Most used band: 2.4.4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444
I tried a few sampling settings to cut this kind of output:
-m "$MODEL" \
-sys "$SYS_PROMPT" \
-p "$PROMPT" \
-co \
-c 700 \
-n 100 \
--repeat-penalty 1.3 \
--repeat-last-n 256 \
--temp 0.6 \
--top-k 40 \
--top-p 0.85
But none of them manage to control the model's behavior.
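One workaround I can sketch (not tested on the device yet): since $CONTEXT is pasted verbatim into the system prompt, capping its length keeps the prompt itself inside the window. The 4-characters-per-token ratio and the 200-token reserve below are rough guesses to tune, not llama.cpp parameters:

```shell
# Rough token budget: ~4 characters per token is a rule-of-thumb
# assumption, and 200 tokens are reserved out of the 700-token window
# for the instructions, the question, and the model's reply.
CTX_TOKENS=700
RESERVED=200
MAX_CHARS=$(( (CTX_TOKENS - RESERVED) * 4 ))

# Cap the context before it is pasted into the system prompt, so the
# prompt alone can never overflow -c 700.
INPUT="../data/input_data.txt"   # same file as in the script above
if [ -f "$INPUT" ]; then
    CONTEXT="$(head -c "$MAX_CHARS" "$INPUT")"
fi
```

The downside is that the model then only sees the start of the file, so I would rather find a real fix.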
Is there something I didn't think of?
Thank you in advance.
Probably your -p "Greet the user." is too generic; it is not a question.
Also check the hints at https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF for the prompt tags, or just change the model.
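For illustration, here is a hypothetical helper that wraps the text in the ChatML tags Qwen2.5 uses (<|im_start|> / <|im_end|>, as shown on the model card) before passing it with -p. Whether you need to do this by hand depends on your llama.cpp build, since conversation mode applies the model's embedded chat template for you:

```shell
# Hypothetical helper: build a ChatML-formatted prompt by hand.
# Qwen2.5's chat format wraps each turn in <|im_start|>role ... <|im_end|>,
# ending with an open assistant turn for the model to complete.
chatml_prompt() {
    sys="$1"
    user="$2"
    printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
        "$sys" "$user"
}

# Usage: PROMPT="$(chatml_prompt "$SYS_PROMPT" "Question: $QUESTION")"
```

With the tags in place, the model gets a clear end-of-turn marker, which tends to matter more for a 0.5B model than the sampling flags do.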
This issue was closed because it has been inactive for 14 days since being marked as stale.