Feature request: Add built-in support for conversational API calls with context
Some people expect LLM APIs to behave like ChatGPT and keep the conversation in memory between API calls in a session, and to their surprise, this is often not built into the API at all. The usual way to implement conversational API calls, as in a chatbot, is to include the whole chat history in each prompt, or to retrieve only the relevant past messages via RAG.
I think it would be helpful to add a model-agnostic interface that serves as a persistence layer and RAG processor. It would save a lot of wheel reinventing, and make it easier to blend in support for new models as they appear.
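To illustrate the idea, here is a minimal sketch of such an interface, assuming the simplest strategy of replaying the full history on every call. All names (`Conversation`, `ask`, the `complete` callback) are hypothetical, not part of any existing library; a real implementation would add persistence and a RAG-based retrieval step instead of always sending everything.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Conversation:
    """Hypothetical model-agnostic conversation wrapper (illustrative only)."""
    # Any backend that takes an OpenAI-style message list and returns a reply.
    complete: Callable[[List[Dict[str, str]]], str]
    history: List[Dict[str, str]] = field(default_factory=list)

    def ask(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        # Simplest strategy: send the full history with every call.
        # A RAG processor would instead select only relevant past messages here.
        reply = self.complete(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

# Usage with a stub backend (swap in a real API client for any model):
echo = lambda msgs: f"seen {len(msgs)} message(s)"
chat = Conversation(complete=echo)
chat.ask("hello")
chat.ask("and again")  # history now holds 4 messages
```

Because the backend is just a callable over a message list, the same wrapper works for any model, which is the "easier to blend in model support" part of the proposal.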
Related issue: https://github.com/theodo-group/LLPhant/issues/45

More related issues in other projects:
- https://github.com/ollama/ollama/issues/4374
- https://community.openai.com/t/how-do-you-maintain-historical-context-in-repeat-api-calls/34395