llama.cpp
Model's reply is incomplete when nlen is set to 64
Why is nlen set to 64 in Llm.kt in the llama.android project? This parameter limits the length of the model's reply, so the current reply is cut off. When I change the nlen value, I find it hard to pick a good one. Why doesn't generation stop at the EOS token instead of depending only on nlen? Thanks for your help.
You can easily modify the example to check for the EOS token and stop.
@ggerganov Yep, I checked for the EOS token, but it did not work. I am digging into it now.
This issue was closed because it has been inactive for 14 days since being marked as stale.