ggllm.cpp
Multi-turn conversation is not possible, even in interactive mode.
Thanks for the wonderful work!
I am running the falcon-7b-instruct model with falcon_main. I generated the model with the conversion script, and from the warning messages I can tell it is in the old format. Anyway, it runs perfectly fine for the given prompt, but I cannot continue the chat after the model generates its output, even in interactive mode. Since GPU offloading adds a significant time overhead every time falcon_main runs, I would like to have multi-turn conversations in a single run. Is there a way to achieve that?
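For context, a sketch of the kind of invocation in question. This assumes falcon_main inherits the interactive flags of llama.cpp's main (`-i` for interactive mode, `-r` for a reverse prompt that hands control back to the user); the flag names, model path, and layer count below are assumptions from my setup and should be verified against `falcon_main --help`:

```shell
# Hypothetical invocation (flag names assumed to match llama.cpp's main;
# model path and -ngl value are placeholders for this setup).
./falcon_main \
  -m ./models/falcon-7b-instruct.ggml.bin \  # converted model (old format)
  -ngl 32 \                                  # GPU offload, paid once at startup
  -i \                                       # interactive mode
  -r "User:" \                               # return control when this appears
  -p "User: Hello\nAssistant:"
```

If the reverse prompt works as in llama.cpp, generation should pause whenever the model emits "User:", letting the next turn be typed into the same process without reloading the model onto the GPU.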