ggllm.cpp
Multi-turn conversation is not possible, even in interactive mode.
Thanks for the wonderful work!
I am running the falcon-7b-instruct model with falcon_main. I generated the model with the conversion script, and from the warning messages I can tell it is in the old format. Anyway, it runs perfectly fine for the given prompt, but I cannot continue the chat after the model generates its output, even in interactive mode. Since GPU offloading adds a significant time overhead every time falcon_main runs, I would like to have multi-turn conversations in a single run. Is there a way to achieve that?
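For context, a sketch of the kind of invocation in question. This assumes falcon_main inherits the interactive flags of llama.cpp's main (`-i` for interactive mode, `-r` for a reverse prompt that hands control back to the user); the flag names, model path, and layer count below are assumptions from my setup and should be verified against `falcon_main --help`:

```shell
# Hypothetical invocation (flag names assumed to match llama.cpp's main;
# model path and -ngl value are placeholders for this setup).
./falcon_main \
  -m ./models/falcon-7b-instruct.ggml.bin \  # converted model (old format)
  -ngl 32 \                                  # GPU offload, paid once at startup
  -i \                                       # interactive mode
  -r "User:" \                               # return control when this appears
  -p "User: Hello\nAssistant:"
```

If the reverse prompt works as in llama.cpp, generation should pause whenever the model emits "User:", letting the next turn be typed into the same process without reloading the model onto the GPU.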