alpaca.cpp
                        The model can't seem to keep track of a conversation.
The program doesn't seem to "remember" what was said previously, so it's difficult to maintain conversational flow. This example was generated with the 13B model, but the same happens with the 7B one as well.
(Running on Windows 11 with WSL2 Ubuntu, weights downloaded from the provided magnet links)

If you're willing to manually retype the conversation history, then you can get your question answered, like so:
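Roughly, that means restating the earlier exchange yourself at the start of the next prompt. The exact wording below is just an illustration, not a required format:

```
Earlier I asked: "What is the capital of France?" and you answered: "Paris."
Given that, roughly how many people live in that city?
```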

Thanks! I guess that'll do for now. Hoping this gets integrated into the program itself... I don't think the original llama.cpp repo has this issue.
After playing around with it some more, I'm somewhat more confused, but I no longer think the model lacks 'conversational memory'.
Also, the chat.cpp file is identical in this repo and the one it was forked from, which suggests the chat logic is the same.

Yet even if it can sometimes 'remember' previous conversation, it does so only very intermittently, so imo your original report is basically correct: there is a lot of engineering work we can do here to improve the model's conversational memory.
I am working on a version that more explicitly conveys the idea to Llama that there is a single-threaded conversation and its job is only to respond to the user. Curious whether anybody else has made any kind of significant progress with this.
I have also seen a few cases of indisputable conversational memory across 2 or 3 separate questions, but it's been very rare. No time to work on this myself, unfortunately, but I look forward to seeing what folks come up with to make it a properly conversational tool.
I guess the biggest problem will be that the "emulated" conversational memory, i.e. adding your whole previous conversation (or just a summary of it) as part of the prompt, will quickly hit the limit on the number of tokens this model can take as input.
This video explains it quite nicely: https://www.youtube.com/watch?v=VW5LBavIfY4&feature=youtu.be
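As a rough sketch of the kind of bookkeeping I mean (purely illustrative, not how chat.cpp is actually structured, and count_tokens() here is just a crude word-count stand-in for the real tokenizer): keep the past turns in a list and drop the oldest ones until the re-fed history fits under the context limit.

```cpp
#include <deque>
#include <iterator>
#include <sstream>
#include <string>

// Crude stand-in for the real tokenizer: counts whitespace-separated words.
// The actual limit is measured in model tokens, not words.
size_t count_tokens(const std::string &text) {
    std::istringstream ss(text);
    return static_cast<size_t>(
        std::distance(std::istream_iterator<std::string>(ss),
                      std::istream_iterator<std::string>()));
}

// Build the prompt from past turns, dropping the oldest ones so the
// "emulated" memory stays under the model's context limit.
std::string build_context(std::deque<std::string> &turns,
                          const std::string &new_input,
                          size_t max_tokens) {
    turns.push_back(new_input);

    auto total = [&turns]() {
        size_t n = 0;
        for (const auto &t : turns) n += count_tokens(t);
        return n;
    };
    while (turns.size() > 1 && total() > max_tokens)
        turns.pop_front();  // forget the oldest turn first

    std::string prompt;
    for (const auto &t : turns)
        prompt += t + "\n";
    return prompt;
}
```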
https://github.com/deep-diver/Alpaca-LoRA-Serve
It implements a functional context system and has a demo running on a cloud instance that shows promise. My local testing shows that alpaca.cpp looks like it doesn't remember history, which makes me confused about the -c and --ctx_size params for alpaca.cpp, because they clearly don't work. Their (LoRA-Serve) implementation is targeted towards GPUs with the VRAM capacity to run these models, unlike the CPU-based alpaca.cpp. Seeing it refactored for CPU applications would be nice.
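For what it's worth, the loop I'd expect a CPU-side context system to boil down to is something like the sketch below. This is purely illustrative and not how alpaca.cpp currently works; run_model() is a hypothetical placeholder for the actual inference call. Each turn, an instruction preamble plus the accumulated history is re-fed as the prompt, and the model's reply is appended back onto the history.

```cpp
#include <iostream>
#include <string>

// Placeholder for actual inference; a real version would feed the prompt
// to the model and return the generated text.
std::string run_model(const std::string &prompt) {
    return "(model output for a prompt of " +
           std::to_string(prompt.size()) + " characters)";
}

int main() {
    const std::string preamble =
        "Below is a single ongoing conversation. Respond only to the user's "
        "latest message, taking the earlier exchange into account.\n\n";
    std::string history;

    std::string user_input;
    while (std::getline(std::cin, user_input)) {
        history += "User: " + user_input + "\n";
        std::string reply = run_model(preamble + history + "Assistant:");
        history += "Assistant: " + reply + "\n";
        std::cout << reply << "\n";
    }
    return 0;
}
```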