llama.cpp
Add chatLLaMa script
I'm not sure if this has a place in the repository. I did a bit of prompt engineering to get a conversation going with LLaMa; this is the script I use, which can serve as an example and "quickstart" for new users.
@j3k0 Nice script for a new user like me. Just curious: with your chat history context, will LLaMa continue the conversation?
In very rare instances, LLaMa will decide to end the conversation. It was occurring way more often before I told it that it's a "never ending conversation" and a "10000 pages long dialog".
I'm quite happy, even impressed with the result. Note that the examples I gave it (cat, Moscow) are excerpts from Wikipedia; I found that this reinforced LLaMa to use similar content as a source and tone for the answers, instead of the reddit/forums tone it had a tendency to produce.
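To make the idea concrete, here is a minimal sketch of that kind of wrapper, not the actual chatLLaMa script: the model path, the USER_NAME/AI_NAME placeholders, and the sampling flags are illustrative values you would adapt.

```bash
#!/bin/bash
# Illustrative sketch only (not the script from this PR): build a "never ending
# dialog" prompt and start llama.cpp's ./main in interactive mode.
MODEL="${MODEL:-./models/13B/ggml-model-q4_0.bin}"   # placeholder path
USER_NAME="${USER_NAME:-User}"
AI_NAME="${AI_NAME:-ChatLLaMa}"

# Insisting that the dialog is very long and never ends makes the model less
# likely to emit [end of text]; Wikipedia-style example answers steer it toward
# an encyclopedic tone instead of a reddit/forums tone.
PROMPT="Text transcript of a never ending dialog, where ${USER_NAME} talks with an AI assistant named ${AI_NAME}.
The dialog is 10000 pages long and the entirety of it is shared below.
${USER_NAME}: What is a cat?
${AI_NAME}: The cat is a domestic species of small carnivorous mammal.
${USER_NAME}: What is the capital of Russia?
${AI_NAME}: Moscow is the capital and largest city of Russia.
${USER_NAME}:"

./main -m "$MODEL" -c 2048 -n 1024 --repeat_penalty 1.2 --color \
  -i -r "${USER_NAME}:" -p "$PROMPT"
```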
Very nice. I am using your script to keep it from auto-ending the conversation.
This script could benefit from static analysis; please use https://www.shellcheck.net/ to make it more robust.
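For example, a generic workflow (not findings specific to this script; SC2086 is just a common class of warning):

```bash
# Run shellcheck on the script and review the warnings.
shellcheck chatLLaMa

# Typical fix for SC2086-style warnings: quote variable expansions so that
# paths containing spaces are not word-split.
./main -m "$MODEL" -f "$PROMPT_FILE"   # rather than: ./main -m $MODEL -f $PROMPT_FILE
```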
> the entirety of it is shared below.

Is there trickery here? As a human, I'd see this and expect not to reply or talk at all anymore, since I'm reading a past experience?
I assume you had to add this to get it to keep staying on or something?
@G2G2G2G The language model just tries to continue the text and make it self-consistent. If you start with "here are 10 steps to do XYZ. Step 1, do X. Step 2", then it will auto-complete it until it has generated those 10 steps.
In this script, insisting in different ways that what comes below is a very long dialog that never ends (etc.) reduces the likelihood that the auto-completion decides that the dialog is over ([end of text]). This still happens, but way less.
At that point it's just dark magic; I didn't do any statistical analysis or anything to find the best prompt, just solved the issues I experienced through trial and error.
Well, I think I was clear that I did understand that. But the specific text I quoted seems to suggest the dialog is already over and should end lol
Anyway, I thought issue 71 is the main reason stuff ends early. (Your script doesn't help there; it still exits at the same amount of text because of the token count max =[ )
@D0han done
@G2G2G2G You can increase --n_predict and get longer output, but it will end when out of state memory (which can be increased with ctx_size). However, I think issue 71 refers to what I'm trying to prevent here: the interaction is often ended by the model before reaching any limits (with this special [end of text] token). I played around in main.cpp to prevent it and force-insert reverse_prompt tokens when end of text is generated by the model, but the model's internal state becomes inconsistent at that stage (it forgot what it was doing), so there is no point.
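As an illustration of the flags mentioned here (values are only examples, and prompt.txt is a placeholder):

```bash
# -n/--n_predict caps how many tokens are generated, -c/--ctx_size sets the
# state (context) size, and -i with -r hands control back at the reverse
# prompt. None of this stops the model itself from emitting [end of text].
./main -m ./models/13B/ggml-model-q4_0.bin \
  -c 2048 -n 2048 \
  -i -r "User:" \
  -f prompt.txt
```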
Great script, thank you! I had been trying to build my own chat script and your prompt seems to be the key!
@j3k0 ah I see, thanks for the input.
> inconsistent at that stage (it forgot what it was doing), so there is no point.
wow maybe it has dementia =[
How about making a directory named contrib and putting the script there? It could be a place to put other related and useful tools.
> How about making a directory named contrib and put the script there? It could be a place to put other related and useful tools.
Good idea, but let's keep it consistent with whisper.cpp. There is the examples directory: https://github.com/ggerganov/whisper.cpp/tree/master/examples
May I suggest increasing n_predict to 2048? From my understanding, having -n 1024 more or less limits the context size to that number, and chat sessions will basically cut off sooner. https://github.com/ggerganov/llama.cpp/issues/266#issuecomment-1475249035 Again, I may be wrong, but I do know that setting -n 2048 allows for effectively twice the session length before it reaches a hard stop. I'll try to do more testing on this. If I lower it to -n 96, I can only get 3 dog facts listed off before it comes to a hard stop, regardless of how high -c is.
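A rough way to see the effect described above (purely illustrative; $MODEL and prompt.txt are placeholders):

```bash
# Same prompt, different n_predict caps: the small cap hits the hard stop after
# only a few replies no matter how large -c is, while the larger cap gives
# roughly twice the usable session length compared to -n 1024.
./main -m "$MODEL" -c 2048 -n 96   -i -r "User:" -f prompt.txt
./main -m "$MODEL" -c 2048 -n 2048 -i -r "User:" -f prompt.txt
```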
> May I suggest increasing n_predict to 2048?
Done and rebased my branch onto master.