Switch to `llm` for LLM abstraction layer
Currently, microllama uses `openai` and `langchain` directly for interacting with language models and managing embeddings/vector stores. We should investigate switching to Simon Willison's `llm` library (https://llm.datasette.io/) as a more general abstraction layer. This could simplify the codebase, support more models, and leverage the features `llm` already provides.
(comment authored by Gemini 2.5 Pro)
Okay, here's a revised plan focusing on abstracting the chat model provider using `llm` while keeping LangChain for embeddings/indexing for now:
- **Dependencies:**
  - Add `llm` to `pyproject.toml`.
  - Add an `llm` plugin if chat models beyond the OpenAI models bundled with `llm` are wanted, e.g., `llm-gemini`. (We can add others later or instruct the user.)
  - Keep `langchain`, `openai`, `faiss-cpu` (or similar), and `tiktoken`, as they are still needed for the embedding/RAG pipeline.
  - Install/update dependencies.
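The dependency change above might look like the following `pyproject.toml` fragment (illustrative only; the exact table and any version pins depend on microllama's actual build configuration):

```toml
[project]
dependencies = [
    "llm",        # chat-model abstraction layer (OpenAI models built in)
    "langchain",  # still used for the embedding/RAG pipeline
    "openai",     # used by langchain's OpenAIEmbeddings
    "faiss-cpu",  # vector store backend
    "tiktoken",   # token counting for text chunking
]
```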
- **Refactor Chat Functions (`answer`, `streaming_answer`):**
  - Import `llm`.
  - Get the desired chat model instance using `llm.get_model(MODEL)`. `MODEL` (from an env var) would now refer to an `llm` model ID/alias (e.g., "gpt-3.5-turbo", "gpt-4", potentially "gemini-pro" if configured).
  - Replace the `openai.ChatCompletion.create(...)` call with the `llm` model's `prompt()` method, which accepts a `system=` keyword, so the existing system/user roles map onto it directly.
  - Adapt the `prompt_messages` structure slightly if needed, splitting it into the system string and user prompt that `prompt()` expects.
  - Update the streaming logic in `streaming_answer` to iterate over the response object returned by the `llm` model, which yields chunks as it is iterated.
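The refactored chat path could be sketched as below. `answer`/`streaming_answer` and `prompt_messages` are the names from the plan; `split_messages` is a hypothetical helper (not in microllama today) that maps the existing OpenAI-style message list onto the `(system, prompt)` pair that `llm`'s documented Python API (`llm.get_model()`, `model.prompt(..., system=...)`, iterating the response for streaming) works with:

```python
def split_messages(prompt_messages):
    """Split an OpenAI-style [{"role": ..., "content": ...}] list into the
    (system, prompt) pair that llm's model.prompt() expects."""
    system_parts = [m["content"] for m in prompt_messages if m["role"] == "system"]
    user_parts = [m["content"] for m in prompt_messages if m["role"] != "system"]
    return ("\n".join(system_parts) or None, "\n".join(user_parts))


def streaming_answer(prompt_messages, model_id="gpt-3.5-turbo"):
    """Yield response chunks from an llm chat model as they arrive."""
    import llm  # deferred import: split_messages stays usable without llm installed

    system, prompt = split_messages(prompt_messages)
    model = llm.get_model(model_id)  # model_id is an llm model ID/alias
    response = model.prompt(prompt, system=system)
    for chunk in response:  # llm responses stream chunks when iterated
        yield chunk
```

The non-streaming `answer` would be the same apart from returning `response.text()` instead of iterating.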
- **Configuration:**
  - Instruct users on how to set API keys using `llm keys set` (e.g., `llm keys set openai ...`). The `OPENAI_API_KEY` env var will still be needed by `langchain` for embeddings.
  - The `MODEL` environment variable remains relevant but now specifies the model for `llm` to use.
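For the docs, the resulting user-facing setup might look like this (a sketch; the `MODEL` variable name comes from the plan above, and key storage uses `llm`'s documented `keys set` command):

```shell
# Store the key llm uses for OpenAI chat models (prompts for the key):
llm keys set openai

# langchain still reads OPENAI_API_KEY directly for embeddings:
export OPENAI_API_KEY="..."

# MODEL now names an llm model ID/alias:
export MODEL=gpt-4
```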
- **No Changes to Indexing/Embeddings:** Leave `create_documents_from_texts`, `get_text_chunks`, `get_index`, and `find_similar_docs` as they are. They will continue to use `langchain` and `OpenAIEmbeddings`.
- **Documentation/Instructions:** Update `README`, `make_dockerfile`, and `deploy_instructions` to mention the need for `llm`, how to set API keys via `llm keys set ...`, and that the `OPENAI_API_KEY` environment variable is still required for embeddings.
This approach isolates the change to the chat completion logic, achieving the goal of provider flexibility there, while deferring the more complex change of replacing the embedding/vector store pipeline.
(comment authored by Gemini 2.5 Pro)