Model Repeats Nonsensical Output
Hello,
I need to run a GGUF model on an embedded device with limited resources, and I am using the qwen2.5-0.5b-instruct-q4_0.gguf model.
After testing a bunch of combinations, this is the llama-cli command I use:
./llama.cpp/build/bin/llama-cli -m "$MODEL" -sys "$SYS_PROMPT" -p "$PROMPT" -co -c 700
And this is the full shell script I run:
MODEL="gguf_qwen/qwen2.5-0.5b-instruct-q4_0.gguf"
CONTEXT="$(cat ../data/input_data.txt)"
# Only now build the full prompt
SYS_PROMPT="You are a helpful assistant. Be polite with the user. Use the following context to answer the question. If you can't answer based on the context, say 'Sorry, I am not able to provide this information.'
Context:
$CONTEXT"
./llama.cpp/build/bin/llama-cli -m "$MODEL" -sys "$SYS_PROMPT" -p "Greet the user." -co -c 700
while true; do
    printf "> "
    read -r QUESTION
    [ "$QUESTION" = "exit" ] && break
    PROMPT="Question: $QUESTION"
    ./llama.cpp/build/bin/llama-cli \
        -m "$MODEL" \
        -sys "$SYS_PROMPT" \
        -p "$PROMPT" \
        -co \
        -c 700 \
        -n 100 \
        --repeat-penalty 1.3 \
        --repeat-last-n 256 \
        --temp 0.6 \
        --top-k 40 \
        --top-p 0.85
done
Setting the limit at -c 700 is what works for my device.
This is the issue I have: as soon as the context limit is exceeded, the model loops on the same token or set of tokens. Here is an example:
Certainly, here are the details I have on your network:
1. Wi-Fi password: xxxxx
2. Gateway type: xxxx
3. Average RSSI:xxxx
4. Most used band: 2.4.4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444
I tried a few sampling settings to cut this kind of output:
-m "$MODEL" \
-sys "$SYS_PROMPT" \
-p "$PROMPT" \
-co \
-c 700 \
-n 100 \
--repeat-penalty 1.3 \
--repeat-last-n 256 \
--temp 0.6 \
--top-k 40 \
--top-p 0.85
But none of them manage to control the model's behavior.
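One workaround I can sketch (not tested on the device yet): since $CONTEXT is pasted verbatim into the system prompt, capping its length keeps the prompt itself inside the window. The 4-characters-per-token ratio and the 200-token reserve below are rough guesses to tune, not llama.cpp parameters:

```shell
# Rough token budget: ~4 characters per token is a rule-of-thumb
# assumption, and 200 tokens are reserved out of the 700-token window
# for the instructions, the question, and the model's reply.
CTX_TOKENS=700
RESERVED=200
MAX_CHARS=$(( (CTX_TOKENS - RESERVED) * 4 ))

# Cap the context before it is pasted into the system prompt, so the
# prompt alone can never overflow -c 700.
INPUT="../data/input_data.txt"   # same file as in the script above
if [ -f "$INPUT" ]; then
    CONTEXT="$(head -c "$MAX_CHARS" "$INPUT")"
fi
```

The downside is that the model then only sees the start of the file, so I would rather find a real fix.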
Is there something I didn't think of?
Thank you in advance.
Probably your -p "Greet the user." is too generic; it is not a question.
Also check the hints at https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF for the prompt tags, or just change the model.
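For illustration, here is a hypothetical helper that wraps the text in the ChatML tags Qwen2.5 uses (<|im_start|> / <|im_end|>, as shown on the model card) before passing it with -p. Whether you need to do this by hand depends on your llama.cpp build, since conversation mode applies the model's embedded chat template for you:

```shell
# Hypothetical helper: build a ChatML-formatted prompt by hand.
# Qwen2.5's chat format wraps each turn in <|im_start|>role ... <|im_end|>,
# ending with an open assistant turn for the model to complete.
chatml_prompt() {
    sys="$1"
    user="$2"
    printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
        "$sys" "$user"
}

# Usage: PROMPT="$(chatml_prompt "$SYS_PROMPT" "Question: $QUESTION")"
```

With the tags in place, the model gets a clear end-of-turn marker, which tends to matter more for a 0.5B model than the sampling flags do.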
This issue was closed because it has been inactive for 14 days since being marked as stale.