llama.cpp
Model's reply is incomplete when nlen is set to 64
Why is nlen set to 64 in Llm.kt in the llama.android project? This parameter limits the length of the model's reply, so the current reply is cut off. When I change the nlen value, I find it hard to pick a good one. Why doesn't generation stop at the EOS token instead of depending only on nlen? Thanks for your help.
You can easily modify the example to check for the EOS token and stop.
@ggerganov Yep, I checked for the EOS token, but it did not work. I am digging into it now.
This issue was closed because it has been inactive for 14 days since being marked as stale.