llama.cpp
Add chatLLaMa script
I'm not sure if this has a place in the repository. I did a bit of prompt engineering to get a conversation going with LLaMa; this is the script I use, which can serve as an example and "quickstart" for new users.
@j3k0 Nice script for a new user like me. Just curious: with your chat history context, will LLaMa continue the conversation?
In very rare instances, LLaMa will decide to end the conversation. It was occurring way more often before I told it that it's a "never ending conversation" and a "10000 pages long dialog".
I'm quite happy, even impressed with the result. Note that the examples I gave it (cat, Moscow) are excerpts from Wikipedia; I found that this reinforced LLaMa to use similar content as a source and tone for the answers, instead of the reddit/forums tone it had a tendency to produce.
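To make the idea concrete, here is a minimal sketch of that kind of wrapper, not the actual chatLLaMa script: the model path, the USER_NAME/AI_NAME placeholders, and the sampling flags are illustrative values you would adapt.

```bash
#!/bin/bash
# Illustrative sketch only (not the script from this PR): build a "never ending
# dialog" prompt and start llama.cpp's ./main in interactive mode.
MODEL="${MODEL:-./models/13B/ggml-model-q4_0.bin}"   # placeholder path
USER_NAME="${USER_NAME:-User}"
AI_NAME="${AI_NAME:-ChatLLaMa}"

# Insisting that the dialog is very long and never ends makes the model less
# likely to emit [end of text]; Wikipedia-style example answers steer it toward
# an encyclopedic tone instead of a reddit/forums tone.
PROMPT="Text transcript of a never ending dialog, where ${USER_NAME} talks with an AI assistant named ${AI_NAME}.
The dialog is 10000 pages long and the entirety of it is shared below.
${USER_NAME}: What is a cat?
${AI_NAME}: The cat is a domestic species of small carnivorous mammal.
${USER_NAME}: What is the capital of Russia?
${AI_NAME}: Moscow is the capital and largest city of Russia.
${USER_NAME}:"

./main -m "$MODEL" -c 2048 -n 1024 --repeat_penalty 1.2 --color \
  -i -r "${USER_NAME}:" -p "$PROMPT"
```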
Very nice. I am using your script to keep it from auto-ending the conversation.
This script could benefit from static analysis; please use https://www.shellcheck.net/ to make it more robust.
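For example, a generic workflow (not findings specific to this script; SC2086 is just a common class of warning):

```bash
# Run shellcheck on the script and review the warnings.
shellcheck chatLLaMa

# Typical fix for SC2086-style warnings: quote variable expansions so that
# paths containing spaces are not word-split.
./main -m "$MODEL" -f "$PROMPT_FILE"   # rather than: ./main -m $MODEL -f $PROMPT_FILE
```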
> the entirety of it is shared below.

Is there trickery here? As a human, I'd see this and expect not to reply or talk at all anymore, since I'm reading a past experience?
I assume you had to add this to get it to keep staying on or something?
@G2G2G2G The language model just tries to continue the text and make it self-consistent. If you start with "here are 10 steps to do XYZ. Step 1, do X. Step 2", then it will auto-complete it until it has generated those 10 steps.
In this script, insisting in different ways that what comes below is a very long dialog that never ends (etc.) reduces the likelihood that the auto-completion decides that the dialog is over ([end of text]). This still happens, but way less.
At that point it's just dark magic; I didn't do any statistical analysis or anything to find the best prompt, just solved the issues I experienced through trial and error.
Well, I think I was clear that I did understand that. But the specific text I quoted seems to suggest the dialog is already over and should end lol
Anyway, I thought issue 71 is the main reason stuff ends early. (Your script doesn't help there; it still exits at the same amount of text because of the token count max =[ )
@D0han done
@G2G2G2G You can increase --n_predict and get longer output, but it will end when out of state memory (which can be increased with ctx_size). However, I think issue 71 refers to what I'm trying to prevent here: the interaction is often ended by the model before reaching any limits (with this special [end of text] token). I played around in main.cpp to prevent it and force-insert reverse_prompt tokens when end of text is generated by the model, but the model's internal state becomes inconsistent at that stage (it forgot what it was doing), so there is no point.
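As an illustration of the flags mentioned here (values are only examples, and prompt.txt is a placeholder):

```bash
# -n/--n_predict caps how many tokens are generated, -c/--ctx_size sets the
# state (context) size, and -i with -r hands control back at the reverse
# prompt. None of this stops the model itself from emitting [end of text].
./main -m ./models/13B/ggml-model-q4_0.bin \
  -c 2048 -n 2048 \
  -i -r "User:" \
  -f prompt.txt
```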
Great script, thank you! I had been trying to build my own chat script and your prompt seems to be the key!
@j3k0 ah I see, thanks for the input.
> inconsistent at that stage (it forgot what it was doing), so there is no point.
wow maybe it has dementia =[
How about making a directory named contrib and putting the script there? It could be a place to put other related and useful tools.
> How about making a directory named contrib and put the script there? It could be a place to put other related and useful tools.
Good idea, but let's keep it consistent with whisper.cpp. There is the examples directory: https://github.com/ggerganov/whisper.cpp/tree/master/examples
May I suggest increasing n_predict to 2048? From my understanding, having -n 1024 more or less limits the context size to that number, and chat sessions will basically cut off sooner. https://github.com/ggerganov/llama.cpp/issues/266#issuecomment-1475249035 Again, I may be wrong, but I do know that setting -n 2048 allows for effectively twice the session length before it reaches a hard stop. I'll try to do more testing on this. If I lower it to -n 96, I can only get 3 dog facts listed off before it comes to a hard stop, regardless of how high -c is.
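A rough way to see the effect described above (purely illustrative; $MODEL and prompt.txt are placeholders):

```bash
# Same prompt, different n_predict caps: the small cap hits the hard stop after
# only a few replies no matter how large -c is, while the larger cap gives
# roughly twice the usable session length compared to -n 1024.
./main -m "$MODEL" -c 2048 -n 96   -i -r "User:" -f prompt.txt
./main -m "$MODEL" -c 2048 -n 2048 -i -r "User:" -f prompt.txt
```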
> May I suggest increasing n_predict to 2048?
Done and rebased my branch onto master.